Logical model
List of entities
Entity details
Entity namePrimary key attributes# AttributesDescription
AnalysisAnalysis_ID13This table captures information about each Analysis that is performed in one Study. A Study may perform one or more Analyses within that Study. An Analysis is the comparison of the Study's Dataset to a Reference.
CohortCohort_ID6This table contains information about a given Cohort . A Cohort could be part of one Study or a Cohort could be examined in more than one Study. For example, Samples from the HapMap Project or the Human Genome Diversity Panel are Cohorts that are examined in several Studies.
Copy_NumberCopy_Number_ID11This table captures descriptions of each Copy of a Copy Number Variant (CNV). At present, this information is not provided in a Study (primarily due to existing technological constraints). When these data become available, they can be captured in this table.
CoverageID7This stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently.
Cytoband_OverlapID5This stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently.
DatasetDataset_ID8This table represents an instance of output, or Dataset, generated by processing one or more Samples on a given Platform (employing a given Method) within a Study. One Study can have many Datasets.
Dataset_AnalysisDataset_Analysis_ID9This table describes each Dataset that is compared to one Reference during one Analysis within one Study. One or more Analyses may be performed on the same Dataset in a Study. One or more Datasets may be Analyzed in one Study. An Analysis is the comparison of the Study's Dataset to a Reference using an Analysis Tool. The output of the Analysis is the Variant set (described primarily in the 'Variant' table and secondarily in the 'Variant_Type', 'Copy_Number', and 'Variant_Mapping' tables). For example, if one Analyses examined three Datasets, there would be three records in this table, each having the same 'Analysis_ID' but different 'Dataset_ID'.
Dataset_XrefDataset_Xref_ID8This table captures any external identifers used to reference a Dataset that exists in the DGV. This information is currently not being captured by DGVa or dbVar. For example, DGVa Dataset accession numbers would be recorded here. Those identifiers would cross-referecnce Datasets in the DGVa.
Feature_OverlapID9This stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently.
FilterFilter_ID9This table stores all records that are filtered out during the DGV Filtering steps, which is part of the automated DGV-curation pipeline. Each record that is filtered out is recorded with its Primary Key (unique table ID), table name, reason for being filtered out, and at what step the record was filtered out. Timestamps on each record help keep track of when, specifically, a record was filtered out.
Gene_OverlapID5This stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently.
Merged_AnalysisMerged_Analysis_ID7Data from multiple Analyses can be merged and further Analyzed within one Study. This table captures the relationshjps between these multiple Analyses. For example, Analysis 1, Analysis 2, and Analysis 3 could then be merged into Analysis 4. This table captures the relationship between Analyses 1, 2, and 3 and Analysis 4. Analyses 1, 2, 3 are called "Sub Analyses" and Analysis 4 is called a "Super Analysis".
Method_Platform_SampleMethod_Platform_Sample_ID9This table captures information about an instance of a Method and of a Platform applied to a given Sample within a Study. Also captured is the Dataset to which the Sample-Method-Platform combination belongs. A Study can employ one or more Methods and one or more Platforms, in examining one or more Samples. Within one Study, the same Sample or Samples can be examined by one or more Methods or Platforms. This table captures the complex relationship between Method, Platform, and Sample.
Method_StudyMethod_Study_ID12This table captures each instance of a Method employed within one Study. One Study may employ one or more Methods, and/or one or more instances of the same Method.
Platform_StudyPlatform_Study_ID12This table captures each instance of a Platform being employed in a Study. One Study may employ one or more Platforms.
ReferenceReference_ID11This table describes the Reference applied to one Dataset in one Study. One or more Datasets are compared against one or more References. Each instance of this is captured in this table. A Reference is that to which the Dataset set in a given Study is compared by Analysis.
Reference_AssemblyReference_Assembly_ID3This table is used as a back-end to serve a drop down list on the new DGV website for a user to select a given Reference Assembly in which to view the data.
SampleSample_ID17This table contains all of the Samples that have been examined in any Study that is part of the DGV. Some Studies do not report Sample IDs but instead only a Variant ID. In these cases, it will not be possible to associate a given Variant to a Sample. This would typically be due to the need to protect the privacy of the individual who provided the Sample e.g. in cases where there is non-consent to share Sample information
Sample_PooledSample_Pooled_ID7This table captures the mapping between a Pooled Sample and the individual Samples that comprise the Pooled Sample - when this information is provided for a given a Study.
Sample_StudySample_Study_ID8This table resolves the many-to-many relationship between the Study and Sample tables. One Study may examine one or more Samples. One Sample may be examined in one or more Studies.
Sample_XrefSample_Xref_ID10This table captures external identifiers used to refer to Samples stored in the DGV. For example, a Sample may be assigned an (external) identifier when that Sample is analyzed in a Study. When that Sample is loaded to the DGV, it will receive a DGV database identifier. This table will capture the cross-reference between the external sample identifiers/accession numbers and the DGV identifiers. There could be external identifiers from other systems or projects, such as the HapMap project, the Coriell repository, or the Human Genome Diversity Panel. This table resolves the cross-referencing of these identifiers/accession numbers and the DGV Sample identifiers.
StudyStudy_ID20This table captures information about the Study in which structural variants were discovered and validated (not all discovered variants are validated in a Study). The 'universal identifier' of the study, usually the 'PubMed ID' (PMID) is captured in this table; the intention of this ID is to uniquely identify the Study across other databases. The 1000 Genomes Project Studies generate multiple input files per Study, or a "Study group". As each of those input files are loaded to the DGV, each will be assigned its own Study_ID. In these cases, all files from a Study group will have the same 'Study_Accession' value (e.g. "estd59" represents many input files from the 1000 Genomes project; one Study Accession number per multiple files). Sometimes an external database will be referenced in a record in the database. Below is a translation of the abbreviations one might find in a record in this database, and the full names of those abbreviations/external databases: AE = EBI Array Express CLONE = NCBI CLONE dbSNP = NCBI dbSNP dbGaP = NCBI dbGAP GENBANK = NCBI GENBANK GENE = NCBI GENE GEO = NCBI GEO HPO = Human Phenotype Ontology MeSH = Medical Subject Headings OMIM = Online Mendelian Inheritance in Man PROBE = NCBI PROBE SRA = NCBI Sequence Read Archive (to be discontinued in 2011-2012) ERA = EBI SRA TRACE = NCBI TRACE Archive (to be discontinued in 2011-2012)
Study_File_PrefixStudy_File_Prefix_ID6
Supporting_Merged_VariantSupporting_Merged_Variant_ID7This table describes which Supporting Variants comprise a given Merged Variant. Each record in this table will include the Merged Variant ID and one of its corresponding Supporting Variant IDs; there will be 'n' number of records for every 'n' number of Supporting Variants that comprise a Merged Variant. Note: The Variant table stores both Supporting and Merged Variant records. One Merged Variant is comprised of one or more Supporting Variants; Supporting Variants are not always reported in a Study (i.e. sometimes only the Merged, or "asserted" Variants are reported). This, the 'Sample_Merged_Variant' table resolves the one-to-many relationship of Supporting to Merged Variants that exist in the 'Variant' table. Both dbVar and DGVa describe what DGV calls a "Merged Variant" as an "asserted Variant" (SV; e.g. esv12345) and all three systems call the Variants the comprise a Merged Variant, "Supporting Variants" (SSV; e.g. nssv12324).
Translocation_MappingTranslocation_Mapping_ID29This table captures the mapping coordinates of a Translocation Variant or any other Variant that may have mappings across two Chromosomes. e.g. translocation, transposon A translocation can span more than one Chromosome, creating a one-to-many relationship between a Variant and its respective mapping records. Note: For other Variant Types, their respective mappings are stored in the 'Variant_Mapping' table. Translocation Mapping information is not likely to be provided in a Study until technology is refined enough to detect such events.
VariantVariant_ID23This table captures all Variants reported in Studies that have been loaded to the DGV. This table stores both Supporting and Merged Variants. For Studies that were accessioned by DGVa, the Variants are assigned a DGVa accession number that starts with "e"; for Studies that were accessioned by dbVar, the Variants are assigned a dbVar accession number that starts with "n". For example, a Supporting Variant, in DGVa will be assigned an accession number like "essv12345"; merged Variants, or what DGVa and dbVar call "asserted" Variants, would be assigned an accession numer like "esv45678". For dbVar, these examples, instead, would be, respectively, "nssv12345" and "nsv45678".
Variant_AnalysisVariant_Analysis_ID7One or more Variants can be related to one or more 'Dataset_Analysis' records. This table resolves the M:M relationship between the 'Dataset_Analysis' and 'Variant' tables. This is in order to support the relationships defined in the dbVar/DGVa XSD. For each 'Dataset_Analysis' record to which a Variant is associated, this table will store a record for each instance of this. For example, Variant_1 is associated to Dataset_Analysis 1, 2, and 3. So in the 'Variant_Analysis' table there would be three records, each containing the same 'Variant_ID' and each with a different 'Dataset_Analysis_ID', i.e. 1, 2, or 3.
Variant_MappingVariant_Mapping_ID29This table captures the mapping coordinates of a Variant. The purpose of capturing these data in a separate table is to afford mappings to any of the human genome reference assemblies (e.g. hg17, hg18) for a given Variant
Variant_TypeVariant_Type_ID21This table contains a list of all possible Variant Types that can be attributed to a given Variant. A Variant could have more than one type, e.g. Inversion + Deletion. As technologies develop further, more refined Variant Type descriptions could arise. This table will allow for multiple Variant Types to be ascribed to one Variant. NB The ICGC nomenclature already describes Variant Types with sub-categories. This is an example of how a Variant could be described with multiple Variant Types.
Variant_XrefVariant_Xref_ID9This table captures any external identifers used to reference a Variant that exists in the DGV. For example, DGVa 'esv' (asserted variant) and 'essv' (supporting variant) accession numbers would be recorded here. Those identifiers would cross-referecnce Variants in the DGVa. This table can also capture IDs from external databases for records that relate to a given Variant in this table. For example, if a Variant's sequence has been deposited to NCBI's Trace Archive (formally discontinued in February, 2011), that sequence will receive one or more TEMPLATE_IDs, e.g. TEMPLATE_ID = G248P82926C7. In this example, the 'External_ID_Source' = "Trace Archive" and the 'External_Variant_ID' = "G248P82926C7", with the corresponding 'Variant_ID' for that Variant from the 'Variant' table.