Data Formats
Omic supports a wide range of standard bioinformatics and cheminformatics data formats.
Molecular Data
SMILES
Simplified Molecular Input Line Entry System for representing chemical structures.
# Example SMILES strings
CC(=O)Oc1ccccc1C(=O)O  # Aspirin
CN1C=NC2=C1C(=O)N(C(=O)N2C)C  # Caffeine
CC(C)Cc1ccc(cc1)C(C)C(=O)O  # Ibuprofen
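A file in this layout can be read with standard-library Python. The sketch below assumes one SMILES string per line, optionally followed by whitespace and a `#` comment that is treated as the compound name. The function name `read_smiles_lines` and the acetonitrile line are illustrative and not part of Omic. Because `#` is also the SMILES triple-bond symbol, the sketch splits on whitespace and never on `#` alone.

```python
def read_smiles_lines(text):
    """Return (smiles, name) pairs from lines like 'CCO  # Ethanol'."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        # A SMILES string contains no whitespace, so the first token is the
        # structure and anything after it is the trailing comment.
        parts = line.split(None, 1)
        smiles = parts[0]
        name = parts[1].lstrip("#").strip() if len(parts) > 1 else ""
        records.append((smiles, name))
    return records


example = """\
# Example SMILES strings
CC(=O)Oc1ccccc1C(=O)O  # Aspirin
CN1C=NC2=C1C(=O)N(C(=O)N2C)C  # Caffeine
CC#N  # Acetonitrile (added here to show '#' as a triple bond)
"""

records = read_smiles_lines(example)
for smiles, name in records:
    print(name, smiles)
```

This only separates structure from label. It does not validate the chemistry, which needs a cheminformatics toolkit.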
SDF/MOL Files
Structure Data Files containing 2D/3D coordinates and molecular properties.
compounds.sdf
├── Molecule 1
│   ├── Atoms and coordinates
│   ├── Bonds
│   └── Properties (MW, logP, etc.)
├── Molecule 2
└── ...
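The layout above maps onto the text of an SDF file as follows: each molecule is a molfile block ending in `M  END`, followed by property items introduced by `> <NAME>` lines, and a `$$$$` line closes the record. A minimal standard-library sketch of that split is shown below. The names `parse_sdf` and `parse_record` and the one-atom example record are hypothetical, and the sketch does not interpret the atom or bond blocks.

```python
def parse_record(lines):
    """Turn the lines of one SDF record into a dict."""
    name = lines[0].strip() if lines else ""
    # The molfile part runs up to and including the 'M  END' line.
    end = next(
        (i for i, line in enumerate(lines) if line.startswith("M  END")),
        len(lines) - 1,
    )
    molblock = "\n".join(lines[: end + 1])

    properties, key = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            # Data header such as '> <MW>': the field name sits in <...>.
            start, stop = line.find("<"), line.rfind(">")
            key = line[start + 1:stop] if start != -1 and stop > start else None
            if key is not None:
                properties[key] = []
        elif not line.strip():
            key = None  # a blank line ends the current property value
        elif key is not None:
            properties[key].append(line)
    return {
        "name": name,
        "molblock": molblock,
        "properties": {k: "\n".join(v) for k, v in properties.items()},
    }


def parse_sdf(text):
    """Split SDF text on '$$$$' lines; an unterminated tail is ignored."""
    molecules, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            molecules.append(parse_record(current))
            current = []
        else:
            current.append(line)
    return molecules


example_sdf = """\
methane
  example

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <MW>
16.04

$$$$
"""

molecules = parse_sdf(example_sdf)
print(molecules[0]["name"], molecules[0]["properties"])
```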
Sequence Data
FASTA
Standard format for nucleotide and protein sequences.
>sp|P04637|P53_HUMAN Cellular tumor antigen p53
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE
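A FASTA record is a header line starting with `>` followed by one or more sequence lines that are concatenated. The sketch below reads records with the standard library only and then splits the UniProt-style header from the example into its parts. The function name `read_fasta` is an assumption for illustration, not an Omic API.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example_fasta = """\
>sp|P04637|P53_HUMAN Cellular tumor antigen p53
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE
"""

fasta_records = list(read_fasta(example_fasta))
header, sequence = fasta_records[0]

# UniProt-style headers look like 'db|accession|entry_name description'.
identifier, _, description = header.partition(" ")
database, accession, entry_name = identifier.split("|")
print(accession, entry_name, len(sequence))
```

The header split at the end applies only to UniProt-style identifiers. Other sources use different header conventions.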
Expression Data
Count Matrix (CSV/TSV)
Gene expression count matrix with genes as rows and samples as columns.
gene_id,sample_1,sample_2,sample_3,sample_4
ENSG00000141510,1523,1821,892,1102
ENSG00000171862,4521,4102,5821,5012
ENSG00000134086,892,1021,723,812
Sample Metadata
Sample annotations including condition labels and covariates.
sample_id,condition,age,sex,batch
sample_1,disease,45,M,batch1
sample_2,disease,52,F,batch1
sample_3,control,48,M,batch2
sample_4,control,51,F,batch2
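The sample columns of the count matrix and the `sample_id` values of the metadata refer to the same samples, so it can help to confirm that the two files agree before uploading. The sketch below does this with the standard `csv` module. The function name `check_samples` is hypothetical, and whether Omic itself enforces this match is not stated here.

```python
import csv
import io


def check_samples(counts_text, metadata_text):
    """Return (samples lacking metadata, samples lacking counts)."""
    # The first header cell is gene_id; the rest are sample columns.
    header = next(csv.reader(io.StringIO(counts_text)))
    count_samples = header[1:]

    meta_rows = csv.DictReader(io.StringIO(metadata_text))
    meta_samples = [row["sample_id"] for row in meta_rows]

    meta_set, count_set = set(meta_samples), set(count_samples)
    no_metadata = [s for s in count_samples if s not in meta_set]
    no_counts = [s for s in meta_samples if s not in count_set]
    return no_metadata, no_counts


counts_text = """\
gene_id,sample_1,sample_2,sample_3,sample_4
ENSG00000141510,1523,1821,892,1102
ENSG00000171862,4521,4102,5821,5012
ENSG00000134086,892,1021,723,812
"""

metadata_text = """\
sample_id,condition,age,sex,batch
sample_1,disease,45,M,batch1
sample_2,disease,52,F,batch1
sample_3,control,48,M,batch2
sample_4,control,51,F,batch2
"""

no_metadata, no_counts = check_samples(counts_text, metadata_text)
print("missing metadata:", no_metadata)
print("missing counts:", no_counts)
```

For TSV input, passing `delimiter="\t"` to `csv.reader` and `csv.DictReader` is the only change needed.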
Output Formats
Target Discovery Results (JSON)
Ranked target list with scores and supporting evidence.
{
  "targets": [
    {
      "gene_symbol": "EGFR",
      "ensembl_id": "ENSG00000146648",
      "druggability_score": 0.92,
      "expression_fc": 3.21,
      "network_centrality": 0.85,
      "literature_evidence": 127,
      "existing_drugs": ["Erlotinib", "Gefitinib"]
    }
  ],
  "patient_clusters": [...],
  "pathways": [...]
}
File Size Limits
| File Type | Max Size | Notes |
|---|---|---|
| Expression Matrix | 500 MB | Up to 100,000 genes × 10,000 samples |
| SDF File | 1 GB | Up to 10M compounds |
| FASTA | 100 MB | Protein or nucleotide |

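The limits in the table can be checked locally before an upload is attempted. The sketch below is one way to do that, under two labeled assumptions: 1 MB is taken as 10**6 bytes (the table does not say whether decimal or binary units are meant), and the keys `expression_matrix`, `sdf`, and `fasta` are illustrative labels, not Omic identifiers.

```python
import os

# Assumption: decimal megabytes. Swap in 2**20 if the limits are binary.
MB = 10**6

# Limits copied from the table; the dictionary keys are hypothetical labels.
MAX_BYTES = {
    "expression_matrix": 500 * MB,
    "sdf": 1000 * MB,  # 1 GB
    "fasta": 100 * MB,
}


def within_limit(file_type, size_bytes):
    """Return True if a file of this size fits the limit for its type."""
    return size_bytes <= MAX_BYTES[file_type]


def check_file(path, file_type):
    """Return the file size in bytes, or raise if it exceeds the limit."""
    size = os.path.getsize(path)
    if not within_limit(file_type, size):
        raise ValueError(
            f"{path} is {size} bytes; limit for {file_type} "
            f"is {MAX_BYTES[file_type]} bytes"
        )
    return size
```

A call such as `check_file("compounds.sdf", "sdf")` would then fail early on an oversized file instead of during upload.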