Logical model
List of entities
Entity details
Entity: Analysis
Entity details:
DescriptionThis table captures information about each Analysis that is performed in one Study. A Study may perform one or more Analyses within that Study. An Analysis is the comparison of the Study's Dataset to a Reference.
Primary key constraint namePK_Analysis
Attributes:
KeyAttribute nameData typeNot nullDescription
PKAnalysis_IDSERIALYesThis is the unique database identifer for one Analysis performed within a Study.
FKStudy_IDINTEGERYesThis is the unique identifier from the 'Study' table and represents the Study in which an instance of an Analysis was employed.
Analysis_DescriptionCHARACTER VARYINGNoThis field captures a brief description of the Analysis employed in a Study. This information is not always provided in a Study. Note: specific criteria used during the Analysis should be captured in the 'Analysis_Criteria' field and a general description captured in this field.
Analysis_TypeCHARACTER VARYINGNoIf this information is provided, the Analysis Type will be captured in this field; this information is not always provided in a Study. An example of an Analysis Type is "split-read mapping".
Analysis_ToolCHARACTER VARYINGNoThis field captures the name of the Analysis Tool employed in a given Analysis. Examples of Analysis Tools are "Birdsuite", "CNAG", "Genemapper".
Analysis_Tool_VersionCHARACTER VARYINGNoThis is a version number for the Analysis Tool that was used in a given Study. This information is not always provided for an Analysis Tool.
Analysis_Tool_DescriptionCHARACTER VARYINGNoThis is a description of the Analysis Tool that was used in a given Study. This information is not always provided for an Analysis Tool.
Analysis_Tool_TypeCHARACTER VARYINGNoThis field captures whether an Analysis Tool is an Algorithm or a Software Suite. This information is not always provided for a given Analysis Tool in a Study; DGVa and dbVar currently do not capture this information.
Analysis_NumberINTEGERNoThis is a sequential number assigned to each Analysis employed in a Study. This value is not unique in this table or across the database. For example, if five Analyses were employed in a Study, each Analysis (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
Analysis_CriteriaCHARACTER VARYINGNoIf the criteria to perform an Analysis are described in a Study, they are captured here. E.g. parameters set in an Analysis Tool.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given 'Analysis_Study', etc.
Relationships:
Relationship nameTypeParentChildCardinality
Analysis_Dataset_AnalysisNon IdentifyingAnalysisDataset_AnalysisZero Or More
Analysis_Sub_AnalysisNon IdentifyingAnalysisMerged_AnalysisZero Or More
Analysis_Super_AnalysisNon IdentifyingAnalysisMerged_AnalysisZero Or More
Study_AnalysisNon IdentifyingStudyAnalysisZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Analysis_1Analysis_IDUNIQUE
IDX_Analysis_2Study_ID
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullAnalysis_ID
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
PK_AnalysisTable ConstraintPrimary KeyPRIMARY KEY (Analysis_ID)
Study_AnalysisTable ConstraintForeign KeyFOREIGN KEY (Study_ID) REFERENCES Study(Study_ID)
Entity: Cohort
Entity details:
DescriptionThis table contains information about a given Cohort . A Cohort could be part of one Study or a Cohort could be examined in more than one Study. For example, Samples from the HapMap Project or the Human Genome Diversity Panel are Cohorts that are examined in several Studies.
Primary key constraint namePK_Cohort
Attributes:
KeyAttribute nameData typeNot nullDescription
PKCohort_IDSERIALYesThis is the unique identifier for one Cohort that is in the DGV.
Cohort_NameCHARACTER VARYINGNoThis is the name given to a Cohort within a given Study. Examples of Cohort Names are: "Human Genome Diversity Panel" or "HapMap".
Cohort_DescriptionCHARACTER VARYINGNoThis is a short description of the Cohort (within a Study). For example, the description can include details about what is common among the individuals within the Cohort, such as details about gender, age, etc.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally archived. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Cohort, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Cohort_SampleNon IdentifyingCohortSampleZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Cohort_1Cohort_IDUNIQUE
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullCohort_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
PK_CohortTable ConstraintPrimary KeyPRIMARY KEY (Cohort_ID)
TUC_Cohort_1Table ConstraintUniqueUNIQUE (Cohort_Name)
Entity: Copy_Number
Entity details:
DescriptionThis table captures descriptions of each Copy of a Copy Number Variant (CNV). At present, this information is not provided in a Study (primarily due to existing technological constraints). When these data become available, they can be captured in this table.
Primary key constraint namePK_Copy_Number
Attributes:
KeyAttribute nameData typeNot nullDescription
PKCopy_Number_IDSERIALYesThis uniquely identifies one Copy of a Sample Level Copy Number Variant (CNV) in the DGV.
FKVariant_Type_IDINTEGERYesThis value represents one record from the 'Variant_Type' table and can be used to cross-reference a record in this table to its Variant (via 'Variant_ID') and associated Variant Type.
Copy_NumberCHARACTER VARYINGYesThis is the sequence number of the Copy in a given CNV. For example, if a CNV has 5 copies, there will be five records in this table, each representing one of the Copies of that CNV. For the five records in this example , 'Copy_Number' will have the values "1", "2", "3", "4", and "5". To find the total number of copies in one CNV, perform a query on the 'Variant' table (can use the 'Variant_ID' from this table) and get the 'Copy_Number_Count'.
Copy_DescriptionCHARACTER VARYINGNoThis is a desription of the structure, or composition of a given Copy of a CNV. An example of a description is "inverted copy" or "copy with SNP".
Copy_ChromosomeCHARACTER VARYINGNoThis is the Chromosome on which the Copy of the CNV is found.
Copy_StartCHARACTER VARYINGNoThis is the start coordinate for the Copy.
Copy_EndCHARACTER VARYINGNoThis is the end coordinate for the Copy.
Study_IDINTEGERYesThis is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Copy_Number, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Variant_Type_Copy_NumberNon IdentifyingVariant_TypeCopy_NumberZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Copy_Number_1Variant_Type_ID
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullCopy_Number_ID
Column ConstraintNot NullVariant_Type_ID
Column ConstraintNot NullCopy_Number
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
PK_Copy_NumberTable ConstraintPrimary KeyPRIMARY KEY (Copy_Number_ID)
TCC_Copy_Number_1Table ConstraintCheckCopy_Start <= Copy_End
Variant_Type_Copy_NumberTable ConstraintForeign KeyFOREIGN KEY (Variant_Type_ID) REFERENCES Variant_Type(Variant_Type_ID)
Entity: Coverage
Entity details:
DescriptionThis stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently.
Primary key constraint namePK_Coverage
Attributes:
KeyAttribute nameData typeNot nullDescription
PKIDSERIALYesThis is the primary key (PK) for this table; each record in this table will be assigned a unique PK.
ChromosomeCHARACTER VARYINGNo
Nucleotides_CoveredBIGINTNo
Chromosome_LengthBIGINTNo
Percent_CoverageDOUBLE PRECISIONNo
Reference_AssemblyCHARACTER VARYINGNo
Load_DateTIMESTAMPNo
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefault'NOW()'
Column ConstraintNot NullID
PK_CoverageTable ConstraintPrimary KeyPRIMARY KEY (ID)
Entity: Cytoband_Overlap
Entity details:
DescriptionThis stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently.
Primary key constraint namePK_Cytoband_Overlap
Attributes:
KeyAttribute nameData typeNot nullDescription
PKIDSERIALYesThis is the primary key (PK) for this table; each record in this table will be assigned a unique PK.
Variant_IDINTEGERNo
CytobandCHARACTER VARYINGNo
Reference_AssemblyCHARACTER VARYINGNo
Load_DateTIMESTAMPNo
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefault'NOW()'
Column ConstraintNot NullID
PK_Cytoband_OverlapTable ConstraintPrimary KeyPRIMARY KEY (ID)
TUC_Cytoband_Overlap_1Table ConstraintUniqueUNIQUE (Variant_ID, Cytoband, Reference_Assembly)
Entity: Dataset
Entity details:
DescriptionThis table represents an instance of output, or Dataset, generated by processing one or more Samples on a given Platform (employing a given Method) within a Study. One Study can have many Datasets.
Primary key constraint namePK_Dataset
Attributes:
KeyAttribute nameData typeNot nullDescription
PKDataset_IDSERIALYesThis field uniquely represents one Dataset that was generated in one Study. A Dataset is the output of examining one or more Samples on a Platform.
Dataset_NameCHARACTER VARYINGNoIf a Name has been assigned to a Dataset within a Study, that Name would be captured here. Currently DGVa and dbVar do not capture this information.
Dataset_NumberINTEGERNoThis is a sequential number assigned to each Dataset generated in a Study. This value is not unique in this table or across the database. For example, if five Datasets were generated in a Study, each Dataset (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
Dataset_DescriptionCHARACTER VARYINGNoThis field captures a brief description of the Dataset analyzed in a Study. This information is currently not captured by DGVa or dbVar. For example, if a Study is performing a re-analysis of a Dataset generated in a previous Strudy, such details would be captured in this Description field.
FKStudy_IDINTEGERNoThis is the unique identifier for a Study from the 'Study' table. This refers to the Study in which the Dataset was generated.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Dataset, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Dataset_Dataset_AnalysisNon IdentifyingDatasetDataset_AnalysisZero Or More
Dataset_Dataset_XrefNon IdentifyingDatasetDataset_XrefZero Or More
Dataset_Method_Platform_SampleNon IdentifyingDatasetMethod_Platform_SampleZero Or More
Dataset_ReferenceNon IdentifyingDatasetReferenceZero Or More
Study_DatasetNon IdentifyingStudyDatasetZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Dataset_1Study_ID
IDX_Dataset_2Dataset_IDUNIQUE
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullDataset_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
PK_DatasetTable ConstraintPrimary KeyPRIMARY KEY (Dataset_ID)
Study_DatasetTable ConstraintForeign KeyFOREIGN KEY (Study_ID) REFERENCES Study(Study_ID)
TUC_Dataset_1Table ConstraintUniqueUNIQUE (Dataset_Number, Study_ID)
Entity: Dataset_Analysis
Entity details:
DescriptionThis table describes each Dataset that is compared to one Reference during one Analysis within one Study. One or more Analyses may be performed on the same Dataset in a Study. One or more Datasets may be Analyzed in one Study. An Analysis is the comparison of the Study's Dataset to a Reference using an Analysis Tool. The output of the Analysis is the Variant set (described primarily in the 'Variant' table and secondarily in the 'Variant_Type', 'Copy_Number', and 'Variant_Mapping' tables). For example, if one Analyses examined three Datasets, there would be three records in this table, each having the same 'Analysis_ID' but different 'Dataset_ID'.
Primary key constraint namePK_Dataset_Analysis
Attributes:
KeyAttribute nameData typeNot nullDescription
PKDataset_Analysis_IDSERIALYesThis uniquely identifies one Dataset that is compared to one Reference within a Study by one Analysis.
Dataset_Analysis_NumberINTEGERYesThis is a sequential number assigned to each Dataset-Analysis employed in a Study. This value is not unique in this table or across the database. For example, if five Dataset-Analysis were employed in a Study, each Dataset-Analysis (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
FKAnalysis_IDINTEGERYesThis is the unique identifer for one Analysis performed within a Study. This can be used to cross-reference one record in the 'Analysis' table.
FKReference_IDINTEGERYesThis uniquely identifies a given Reference against which a Dataset is compared during an Analysis within one Study. This can be used to cross-reference one record in the 'Reference' table.
FKDataset_IDINTEGERYesThis field uniquely represents one Dataset that was generated in one Study. This can be used to cross-reference one record in the 'Dataset' table.
FKStudy_IDINTEGERYesThis is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Dataset_Analysis_Reference, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Analysis_Dataset_AnalysisNon IdentifyingAnalysisDataset_AnalysisZero Or More
Dataset_Analysis_Variant_AnalysisNon IdentifyingDataset_AnalysisVariant_AnalysisZero Or More
Dataset_Dataset_AnalysisNon IdentifyingDatasetDataset_AnalysisZero Or More
Reference_Dataset_AnalysisNon IdentifyingReferenceDataset_AnalysisZero Or More
Study_Dataset_AnalysisNon IdentifyingStudyDataset_AnalysisZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Dataset_Analysis_1Analysis_ID
IDX_Dataset_Analysis_2Reference_ID
IDX_Dataset_Analysis_3Dataset_ID
IDX_Dataset_Analysis_4Study_ID
IDX_Dataset_Analysis_5Dataset_Analysis_IDUNIQUE
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullDataset_Analysis_ID
Column ConstraintNot NullDataset_Analysis_Number
Column ConstraintNot NullAnalysis_ID
Column ConstraintNot NullReference_ID
Column ConstraintNot NullDataset_ID
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
Analysis_Dataset_AnalysisTable ConstraintForeign KeyFOREIGN KEY (Analysis_ID) REFERENCES Analysis(Analysis_ID)
Dataset_Dataset_AnalysisTable ConstraintForeign KeyFOREIGN KEY (Dataset_ID) REFERENCES Dataset(Dataset_ID)
PK_Dataset_AnalysisTable ConstraintPrimary KeyPRIMARY KEY (Dataset_Analysis_ID)
Reference_Dataset_AnalysisTable ConstraintForeign KeyFOREIGN KEY (Reference_ID) REFERENCES Reference(Reference_ID)
Study_Dataset_AnalysisTable ConstraintForeign KeyFOREIGN KEY (Study_ID) REFERENCES Study(Study_ID)
Entity: Dataset_Xref
Entity details:
DescriptionThis table captures any external identifers used to reference a Dataset that exists in the DGV. This information is currently not being captured by DGVa or dbVar. For example, DGVa Dataset accession numbers would be recorded here. Those identifiers would cross-referecnce Datasets in the DGVa.
Primary key constraint namePK_Dataset_Xref
Attributes:
KeyAttribute nameData typeNot nullDescription
PKDataset_Xref_IDSERIALYesThis is the unique identifer for one mapping of a Dataset in the DGV to an external identifier.
FKDataset_IDINTEGERYesThis is the unique identifer from the 'Dataset' table that represents one Dataset in the DGV. This can be used to cross-reference one record in the 'Dataset' table.
External_Dataset_IDCHARACTER VARYINGYesA Study may give a Dataset a Name or Identifier-code; this could also be the identifier assigned by an external informaiton system (e.g. DGVa). This field will capture either one of these that has been assigned to a given Dataset. This information is currently not being captured by DGVa or dbVar. For example, this could be an accession number, e.g. DGVa Dataset accession number, or the Dataset ID assigned in a Study, e.g. "A01".
External_ID_SourceCHARACTER VARYINGYesThis is the original source system from which the 'External_Dataset_ID' comes. This information is currently not being captured by DGVa or dbVar. For example, this could be "DGVa" or "dbVar", "Smith et al 2009" or a PubMed ID.
Study_IDINTEGERYesThis is the unique database identifier that represents one Study in the DGV.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Dataset_Xref, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Dataset_Dataset_XrefNon IdentifyingDatasetDataset_XrefZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Dataset_Xref_1Dataset_ID
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullDataset_Xref_ID
Column ConstraintNot NullDataset_ID
Column ConstraintNot NullExternal_Dataset_ID
Column ConstraintNot NullExternal_ID_Source
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
Dataset_Dataset_XrefTable ConstraintForeign KeyFOREIGN KEY (Dataset_ID) REFERENCES Dataset(Dataset_ID)
PK_Dataset_XrefTable ConstraintPrimary KeyPRIMARY KEY (Dataset_Xref_ID)
TUC_Dataset_Xref_1Table ConstraintUniqueUNIQUE (External_Dataset_ID, External_ID_Source)
Entity: Feature_Overlap
Entity details:
DescriptionThis stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently.
Primary key constraint namePK_Feature_Overlap
Attributes:
KeyAttribute nameData typeNot nullDescription
PKIDSERIALYesThis is the primary key (PK) for this table; each record in this table will be assigned a unique PK.
FeatureCHARACTER VARYINGNo
Number_of_FeaturesINTEGERNo
Number_of_CNVs_OverlappedINTEGERNo
Percent_of_CNVs_OverlappedDOUBLE PRECISIONNo
Number_of_Features_OverlappedINTEGERNo
Percent_of_Features_OverlappedDOUBLE PRECISIONNo
Reference_AssemblyCHARACTER VARYINGNo
Load_DateTIMESTAMPNo
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefault'NOW()'
Column ConstraintNot NullID
PK_Feature_OverlapTable ConstraintPrimary KeyPRIMARY KEY (ID)
Entity: Filter
Entity details:
DescriptionThis table stores all records that are filtered out during the DGV Filtering steps, which is part of the automated DGV-curation pipeline. Each record that is filtered out is recorded with its Primary Key (unique table ID), table name, reason for being filtered out, and at what step the record was filtered out. Timestamps on each record help keep track of when, specifically, a record was filtered out.
Primary key constraint namePK_Filter
Attributes:
KeyAttribute nameData typeNot nullDescription
PKFilter_IDSERIALYesThis is the primary key of the Filter table; it uniquely represents a record in this table.
Run_IDINTEGERYesThis identifier uniquely represents one run of the Filtering steps during automated DGV data curation.
Study_IDINTEGERYesThis value represents the Study to which a given record in this table is associated. For example, if "Variant 123" from "Study 1" was Filtered out because it did not meet the DGV minimum size threshold, the value in this field would be "1".
Table_NameCHARACTER VARYINGYesThis is the table from which the Filtered record originated. For example, if a record from the 'Variant' table was filtered out, the value in this field would be "Variant".
Table_IDINTEGERYesThis is the primary key for the record as it is represented in its original table. For example, if "Variant 123" was filtered out, the value in this field would be "123". This value in conjunction with the 'Table_Name' uniquely represents a record in the database. (I.e. 'table_name.primary_key')
Filter_StepINTEGERYesThe value in this field describes the step during the automated DGV curation process on which the record was filtered out. For example, if the "Variant 123" record was filtered out during step 2 of the curation process, the value in this field would be "2".
Filter_ReasonCHARACTER VARYINGYesThe value in this field represents the reason for which a record was filtered out during the automated DGV curation process. For example, if "Variant 123" was filtered out becuase it did not meet the DGV minimum size threshold, the value in this field would be "too small".
Load_DateTIMESTAMPYesThis is the timestamp that describes when a given record was loaded to this table. This is used for auditing and maintenance purposes.
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultY
Column ConstraintNot NullFilter_ID
Column ConstraintNot NullRun_ID
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullTable_Name
Column ConstraintNot NullTable_ID
Column ConstraintNot NullFilter_Step
Column ConstraintNot NullFilter_Reason
Column ConstraintNot NullLoad_Date
Column ConstraintDefault'now()'
PK_FilterTable ConstraintPrimary KeyPRIMARY KEY (Filter_ID)
Entity: Gene_Overlap
Entity details:
DescriptionThis stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently.
Primary key constraint namePK_Gene_Overlap
Attributes:
KeyAttribute nameData typeNot nullDescription
PKIDSERIALYesThis is the primary key (PK) for this table; each record in this table will be assigned a unique PK.
Variant_IDINTEGERNo
GeneCHARACTER VARYINGNo
Reference_AssemblyCHARACTER VARYINGNo
Load_DateTIMESTAMPNo
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefault'NOW()'
Column ConstraintNot NullID
PK_Gene_OverlapTable ConstraintPrimary KeyPRIMARY KEY (ID)
TUC_Gene_Overlap_1Table ConstraintUniqueUNIQUE (Variant_ID, Gene, Reference_Assembly)
Entity: Merged_Analysis
Entity details:
DescriptionData from multiple Analyses can be merged and further Analyzed within one Study. This table captures the relationshjps between these multiple Analyses. For example, Analysis 1, Analysis 2, and Analysis 3 could then be merged into Analysis 4. This table captures the relationship between Analyses 1, 2, and 3 and Analysis 4. Analyses 1, 2, 3 are called "Sub Analyses" and Analysis 4 is called a "Super Analysis".
Primary key constraint namePK_Merged_Analysis
Attributes:
KeyAttribute nameData typeNot nullDescription
PKMerged_Analysis_IDSERIALYesThe value in this field uniquely identifies one record in this table. This value represents one relationship between a "Super Analysis" and one of the "Sub Analyses" that comprise the "Super Analysis".
FKSub_Analysis_IDINTEGERYesThis represents the lower-level Analysis that was merged with another Analysis to create a higher-level Analysis, called the "Super Analysis".
FKSuper_Analysis_IDINTEGERYesThis represents the higher-level Analysis that is comprised of two or more lower-level Analyses, called "Sub Analyses".
Study_IDINTEGERYesThis is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated (or was generated in the case of a join table composed of only database identifiers).
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Sample_Merged_Variant, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Analysis_Sub_AnalysisNon IdentifyingAnalysisMerged_AnalysisZero Or More
Analysis_Super_AnalysisNon IdentifyingAnalysisMerged_AnalysisZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Merged_Analysis_1Sub_Analysis_ID
IDX_Merged_Analysis_2Super_Analysis_ID
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullMerged_Analysis_ID
Column ConstraintNot NullSub_Analysis_ID
Column ConstraintNot NullSuper_Analysis_ID
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
Analysis_Sub_AnalysisTable ConstraintForeign KeyFOREIGN KEY (Sub_Analysis_ID) REFERENCES Analysis(Analysis_ID)
Analysis_Super_AnalysisTable ConstraintForeign KeyFOREIGN KEY (Super_Analysis_ID) REFERENCES Analysis(Analysis_ID)
PK_Merged_AnalysisTable ConstraintPrimary KeyPRIMARY KEY (Merged_Analysis_ID)
Entity: Method_Platform_Sample
Entity details:
DescriptionThis table captures information about an instance of a Method and of a Platform applied to a given Sample within a Study. Also captured is the Dataset to which the Sample-Method-Platform combination belongs. A Study can employ one or more Methods and one or more Platforms, in examining one or more Samples. Within one Study, the same Sample or Samples can be examined by one or more Methods or Platforms. This table captures the complex relationship between Method, Platform, and Sample.
Primary key constraint namePK_Method_Platform_Sample
Attributes:
KeyAttribute nameData typeNot nullDescription
PKMethod_Platform_Sample_IDSERIALYesThis is the unique identifer to repsesent one instance of a Sample that is examined using one Method and one Platform in a given Study.
FKMethod_Study_IDINTEGERYesThis is the unique identifier from the 'Method_Study' table that represents one instance of a Method being employed in one Study. This can be used to cross-reference one record in the 'Method_Study' table.
FKPlatform_Study_IDINTEGERYesThis is the unique identifier from the 'Platform_Study' table that represents one instance of a Platform being employed in one Study. This can be used to cross-reference one record in the 'Platform_Study' table.
FKSample_Study_IDINTEGERYesThis is the unique identifer within the DGV of one instance of a Sample being examined in one Study.
FKDataset_IDINTEGERYesThis is the unique identifier from the 'Dataset' table that describes to which Dataset a given Sample, analyzed by a given Method-Platform, belongs.
Study_IDINTEGERYesThis is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated.
Source_SystemCHARACTER VARYINGYes01.21.10 This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given 'Method_Platform_Sample', etc.
Relationships:
Relationship nameTypeParentChildCardinality
Dataset_Method_Platform_SampleNon IdentifyingDatasetMethod_Platform_SampleZero Or More
Method_Study_Method_Platform_SampleNon IdentifyingMethod_StudyMethod_Platform_SampleZero Or More
Platform_Study_Method_Platform_SampleNon IdentifyingPlatform_StudyMethod_Platform_SampleZero Or More
Sample_Study_Method_Platform_SampleNon IdentifyingSample_StudyMethod_Platform_SampleZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Method_Platform_Sample_1Platform_Study_ID
IDX_Method_Platform_Sample_2Dataset_ID
IDX_Method_Platform_Sample_3Sample_Study_ID
IDX_Method_Platform_Sample_4Method_Study_ID
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullMethod_Platform_Sample_ID
Column ConstraintNot NullMethod_Study_ID
Column ConstraintNot NullPlatform_Study_ID
Column ConstraintNot NullSample_Study_ID
Column ConstraintNot NullDataset_ID
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
Dataset_Method_Platform_SampleTable ConstraintForeign KeyFOREIGN KEY (Dataset_ID) REFERENCES Dataset(Dataset_ID)
Method_Study_Method_Platform_SampleTable ConstraintForeign KeyFOREIGN KEY (Method_Study_ID) REFERENCES Method_Study(Method_Study_ID)
PK_Method_Platform_SampleTable ConstraintPrimary KeyPRIMARY KEY (Method_Platform_Sample_ID)
Platform_Study_Method_Platform_SampleTable ConstraintForeign KeyFOREIGN KEY (Platform_Study_ID) REFERENCES Platform_Study(Platform_Study_ID)
Sample_Study_Method_Platform_SampleTable ConstraintForeign KeyFOREIGN KEY (Sample_Study_ID) REFERENCES Sample_Study(Sample_Study_ID)
Entity: Method_Study
Entity details:
DescriptionThis table captures each instance of a Method employed within one Study. One Study may employ one or more Methods, and/or one or more instances of the same Method.
Primary key constraint namePK_Method_Study
Attributes:
KeyAttribute nameData typeNot nullDescription
PKMethod_Study_IDSERIALYesThis uniquely identifies within the DGV one instance of a Method employed in one Study. For example, BAC_aCGH may be employed twice in one Study; each instance of this Method being employed in that Study would get its own unique 'Method_Study_ID'
FKStudy_IDINTEGERYesThis is the unique identifier from the 'Study' table and represents the Study in which an instance of a Method was employed.
Method_NameCHARACTER VARYINGNoThis is the name of a given Method employed in a Study; e.g. BAC_aCGH; MCD_analysis.
Method_NumberINTEGERNoThis is a sequential number assigned to each Method empoyed in a Study. This value is not unique in this table or across the database. For example, if five Methods were generated in a Study, each Method (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
Method_TypeCHARACTER VARYINGNoThis field indicates whether a Method employed in a given Study was either a 'Discovery Method' or a 'Validation Method'. The values in this field are either "Discovery" or "Validation".
Method_DescriptionCHARACTER VARYINGNoThis field captures a brief description of the Method employed in a Study. This information is not always provided for a given Study.
Primary_Data_FlagCHARACTER VARYINGNoThis field will capture whether the data being analyzed in a given Study is Primary or a Re-Analysis of Primary data that was generated in a different Study. This information is not always provided in a Study (and currently may not be captured by DGVa or dbVar). The values in this field are, "Y", if the data is Primary data generated by the Study in question, or "N" if the data was generated in a different Study. If the 'Primary_Data_Flag' is "N", to find the Method employed in the original Study that generated the data, query the 'Original_Method' field in this table.
Original_MethodCHARACTER VARYINGNoIf data that was generated in a different Study is being re-analyzed/meta-analyzed in a new Study, the 'Primary_Data_Flag' is "N", indicating the data was originally generated in a previous Study. This field captures the Method used in the original Study to first generate the Dataset being analyzed in this Study. This information is not always provided in a Study (and currently may not be captured by DGVa or dbVar).
Originators_of_DataCHARACTER VARYINGNoThe field should capture the original source of the data i.e. the group who ran the samples and generated the dataset. This field could capture, e.g. the PMID, the primary author of a Study, the Site at which/organization by whom the original Study was performed. This information is not always provided in a Study (and currently may not be captured by DGVa or dbVar).
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Method, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Method_Study_Method_Platform_SampleNon IdentifyingMethod_StudyMethod_Platform_SampleZero Or More
Study_Method_StudyNon IdentifyingStudyMethod_StudyOne Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Method_Study_1Study_ID
IDX_Method_Study_2Method_Study_IDUNIQUE
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullMethod_Study_ID
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
PK_Method_StudyTable ConstraintPrimary KeyPRIMARY KEY (Method_Study_ID)
Study_Method_StudyTable ConstraintForeign KeyFOREIGN KEY (Study_ID) REFERENCES Study(Study_ID)
TCC_Method_Study_1Table ConstraintCheckMethod_Type = 'Validation' OR Method_Type = 'Discovery'
Entity: Platform_Study
Entity details:
DescriptionThis table captures each instance of a Platform being employed in a Study. One Study may employ one or more Platforms.
Primary key constraint namePK_Platform_Study
Attributes:
KeyAttribute nameData typeNot nullDescription
PKPlatform_Study_IDSERIALYesThis is the database unique identifier that represents one instance of a Platform being employed in one Study.
FKStudy_IDINTEGERYesThis is the unique identifier from the 'Study' table and represents the Study in which an instance of a Platform was employed.
Platform_NumberINTEGERNoThis is a sequential number assigned to each Platform employed in a Study. This value is not unique in this table or across the database. For example, if five Platforms were employed in a Study, each Platform (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
Platform_NameCHARACTER VARYINGNoThis is the name used to refer to a Platform. For example, "Affymetrix GeneChip Human Genome U95Av2 [HG_U95Av2]"
Platform_DescriptionCHARACTER VARYINGNoThis is a short description of a given Platform. Not all Platforms will necessarily have a description for them but if one is available, it will be captured here.
Platform_TypeCHARACTER VARYINGNoThis field captures the Type of a given Platform. This information may not necessarily be provided in a Study.
Platform_VersionCHARACTER VARYINGNoThis describes the version of a given Platform. This information may not necessarily be provided in a Study.
GEO_Accession_NumCHARACTER VARYINGNoThis field will capture the identifier/accession number that represents a given Platform within the GEO data repository. This information is not always provided in a Study. e.g. GEO accession number, GPL1352
ArrayExpress_Accession_NumCHARACTER VARYINGNoThis field will capture the identifier/accession number that represents a given Platform within the ArrayExpress data repository. This information is not always provided in a Study. e.g. ArrayExpress accession number, A-AFFY-65
Source_SystemCHARACTER VARYINGYesThis field contains an identifier of the data repository in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given 'Platform_Study', etc.
Relationships:
Relationship nameTypeParentChildCardinality
Platform_Study_Method_Platform_SampleNon IdentifyingPlatform_StudyMethod_Platform_SampleZero Or More
Study_Platform_StudyNon IdentifyingStudyPlatform_StudyZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Platform_Study_1Study_ID
IDX_Platform_Study_2Platform_Study_IDUNIQUE
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullPlatform_Study_ID
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
PK_Platform_StudyTable ConstraintPrimary KeyPRIMARY KEY (Platform_Study_ID)
Study_Platform_StudyTable ConstraintForeign KeyFOREIGN KEY (Study_ID) REFERENCES Study(Study_ID)
Entity: Reference
Entity details:
DescriptionThis table describes the Reference applied to one Dataset in one Study. One or more Datasets are compared against one or more References. Each instance of this is captured in this table. A Reference is that to which the Dataset set in a given Study is compared by Analysis.
Primary key constraint namePK_Reference
Attributes:
KeyAttribute nameData typeNot nullDescription
PKReference_IDSERIALYesThis uniquely identifies a given Reference against which a Dataset is compared within one Study.
FKDataset_IDINTEGERYesThis field uniquely represents one Dataset that was generated in one Study.
Reference_NumberINTEGERYesThis is a sequential number assigned to each Reference employed in a Study. This value is not unique in this table or across the database. For example, if five References were employed in a Study, each Reference (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
Reference_TypeCHARACTER VARYINGNoThis is the type of Reference that could be employed in a Study. The Reference type could be: "Ref_sequence", "Sampleset", "Sample", "Control_tissue", "Merged", or "Other".
Reference_Sample_IDCHARACTER VARYINGNoIf the Reference Type = "Sample", this field will capture the Sample identifier associated to that Sample, if one is provided. For example, if one HapMap Sample is used as a Reference, the HapMap Sample ID (e.g. NA07019) could be provided in this field.
Reference_SequenceCHARACTER VARYINGNoThis field captures the specific Sequence name if the Reference Type = "Reference Sequence". Examples of Sequence names are "HuRef" and "Hg18/Build36".
Reference_DescriptionCHARACTER VARYINGNoThis is a brief description of the Reference, if one is available.
FKStudy_IDINTEGERYesThis is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Reference_Dataset, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Dataset_ReferenceNon IdentifyingDatasetReferenceZero Or More
Reference_Dataset_AnalysisNon IdentifyingReferenceDataset_AnalysisZero Or More
Study_ReferenceNon IdentifyingStudyReferenceZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Reference_1Reference_IDUNIQUE
IDX_Reference_2Dataset_ID
IDX_Reference_3Study_ID
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullReference_ID
Column ConstraintNot NullDataset_ID
Column ConstraintNot NullReference_Number
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
Dataset_ReferenceTable ConstraintForeign KeyFOREIGN KEY (Dataset_ID) REFERENCES Dataset(Dataset_ID)
PK_ReferenceTable ConstraintPrimary KeyPRIMARY KEY (Reference_ID)
Study_ReferenceTable ConstraintForeign KeyFOREIGN KEY (Study_ID) REFERENCES Study(Study_ID)
TUC_Reference_1Table ConstraintUniqueUNIQUE (Dataset_ID, Reference_Number)
Entity: Reference_Assembly
Entity details:
DescriptionThis table is used as a back-end to serve a drop down list on the new DGV website for a user to select a given Reference Assembly in which to view the data.
Primary key constraint namePK_Reference_Assembly
Attributes:
KeyAttribute nameData typeNot nullDescription
PKReference_Assembly_IDSERIALYesThis is the identifier that uniquely represents a given Reference Assembly in this table.
Reference_Assembly_NameCHARACTER VARYINGNoThis is the common name used for a given Reference Assembly; e.g. NCBI36/hg18
Load_DateTIMESTAMPNoThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Study, etc.
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefault'NOW()'
Column ConstraintNot NullReference_Assembly_ID
PK_Reference_AssemblyTable ConstraintPrimary KeyPRIMARY KEY (Reference_Assembly_ID)
Entity: Sample
Entity details:
DescriptionThis table contains all of the Samples that have been examined in any Study that is part of the DGV. Some Studies do not report Sample IDs but instead only a Variant ID. In these cases, it will not be possible to associate a given Variant to a Sample. This would typically be due to the need to protect the privacy of the individual who provided the Sample e.g. in cases where there is non-consent to share Sample information
Primary key constraint namePK_Sample
Attributes:
KeyAttribute nameData typeNot nullDescription
PKSample_IDSERIALYesThis is the unique identifier of a given Sample in the DGV. This identifier represents a Sample within the DGV and does not represent the unique identifier of a Sample in other databases. It is important to note that this identifier is to uniquely identify one Sample in the DGV. Many Studies may examine the same Sample, e.g. a HapMap Sample; this Sample should exist only once in this table but may be cross-referenced several times in the 'Sample_Study' and 'Sample_Xref' tables. we should only load that Sample once to the database
Biological_SourceCHARACTER VARYINGNoThis field captures a brief description of the Biological Source of a Sample. For example, the 'Biological_Source' could be "cell line", "blood", "tissue" etc. Please note how this field relates to the 'Source_Type' field.
Source_TypeCHARACTER VARYINGNoThis field describes the type of the Sample. For example, a Sample could be of the Type "Genomic DNA", "RNA", "Clone library", etc. Please note how this field relates to the 'Biological_Source' field.
Sample_DescriptionCHARACTER VARYINGNoThis field will capture a Description of the Sample when necessary/if available. For example, a Sample could be "a pool of 10 anonymous male donors". In these cases, a Description of the Sample is important, esp. with regards to representation of Sample/Subject numbers in the DGV (i.e. a pool of 10 anonymous donors translates from one Sample to 10 Subjects).
Subject_IDCHARACTER VARYINGNoIf a Subject identifier of some kind is provided in a Study - as associated to a given Sample, that Subject identifier will be captured here. Note: one Subject can provide one or more Samples, each of which is captured in the Sample table. Also, in many Studies, the HapMap Sample ID (e.g. "NA19215") is also included as the Subject ID.
KaryotypeCHARACTER VARYINGNoThis field will capture the Karyotype of a Sample. One person can provide one or more Samples and each Sample may have a different Karyotype. For example, an individual with cancerous cells may have one Karyotype from a normal tissue Sample and another Karyotype from a cancer-tissue Sample. Examples of Karyotypes are: 46, XY; 46, XX, 5p-; 46, XX, del(5)(p15.2) (the first two being normal karyotypes) It should be noted that DGV currently only presents data for healthy controls so karyotype information will likely not be included for a given Sample.
EthnicityCHARACTER VARYINGNoThis field captures the "Ethnicity" of a Sample. "Ethnicity" is sometimes also referred to as "Population" and refers to the geographic origins of an individual. The use of the term 'Ethnicity' can vary between Studies. Examples of 'Ethnicity' are: "Asian", "Yoruban", "European", "African"
GenderCHARACTER VARYINGNoThis field captures the gender of the Subject who provided the Sample. The value in this field can be "Male", "Female", "Unknown", or NULL.
Family_IDCHARACTER VARYINGNoThis ID will represent a given Family whose data are stored in the DGV. The 'Family_ID' is submitted with a Study and not generated by the database. Thus it is possible that two different Families are assigned the same 'Family_ID', but not likely. For example, where HapMap Samples are all from the same Family, the same 'Family_ID' will be assigned to each Family member's respective Sample.
Family_TypeCHARACTER VARYINGNoThis field will describe the type of Family represented by Samples in this table. This field could describe whether a family is a trio, for example. This field could also capture whether a Family is 'Simplex' or 'Multiplex'.
Family_MemberCHARACTER VARYINGNoThis field will capture which member of a Family is represented by a given Sample. For example, if a Sample is from the mother in a Family, the value in this field will be "mother", from the father, this value will be "father", etc..
Mother_IDCHARACTER VARYINGNoIf a Sample is part of a Family and Family information is provided, this field would contain the ID of the Mother of the individual who provided this Sample.This information can be useful when looking at pedigree.
Father_IDCHARACTER VARYINGNoIf a Sample is part of a Family and Family information is provided, this field would contain the ID of the Father of the individual who provided this Sample . This information can be useful when looking at pedigree.
FKCohort_IDINTEGERNoThis is the Cohort to which a Sample belongs. This field may not be populated. A Sample could belong to a commonly referenced Cohort such as HapMap or the Human Genome Diversity Panel. A Sample could also belong to a Cohort within a specific Study.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally archived. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Sample, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Cohort_SampleNon IdentifyingCohortSampleZero Or More
Sample_Sample_StudyNon IdentifyingSampleSample_StudyOne Or More
Sample_Sample_XrefNon IdentifyingSampleSample_XrefZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Sample_1Cohort_ID
IDX_Sample_2Sample_IDUNIQUE
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullSample_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
Cohort_SampleTable ConstraintForeign KeyFOREIGN KEY (Cohort_ID) REFERENCES Cohort(Cohort_ID)
PK_SampleTable ConstraintPrimary KeyPRIMARY KEY (Sample_ID)
Entity: Sample_Pooled
Entity details:
DescriptionThis table captures the mapping between a Pooled Sample and the individual Samples that comprise the Pooled Sample - when this information is provided for a given a Study.
Primary key constraint namePK_Sample_Pooled
Attributes:
KeyAttribute nameData typeNot nullDescription
PKSample_Pooled_IDSERIALYesThis field uniquely identifies one instance of a mapping between an individual Sample and a Pooled Sample. For each individual Sample that comprises a Pooled Sample in the database, a record will be stored in this table. For example, if a Pooled Sample is comprised of ten individual Samples, there will be ten records stored in this table, each with the same 'Pooled_Sample_ID' (which can be cross-referenced with a 'Sample_Study_ID' in the 'Sample_Study' table) and ten different 'Sample_Study_ID's (each of which can be cross-referenced with a 'Sample_Study_ID' in the 'Sample_Study' table), each record will have a unique 'Sample_Pooled_ID'.
FKIndividual_Sample_IDINTEGERYesThis is the 'Sample_Study_ID' of one of the individual Samples that comprise a Pooled Sample. There will be two or more 'Individual_Sample_IDs' for every 'Pooled_Sample_ID' in this table.
FKPooled_Sample_IDINTEGERYesThis is the unique identifer of one Pooled Sample within the database. For each 'Pooled_Sample_ID' there will be two or more 'Individual_Sample_IDs'.
Study_IDINTEGERYesThis is the unique database identifier that represents one Study in the DGV, that Study in which the Pooled Samples were analyzed.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally archived. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Sample/Pooled Sample, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Sample_Individual_to_PooledNon IdentifyingSample_StudySample_PooledZero Or More
Sample_Pooled_to_PooledNon IdentifyingSample_StudySample_PooledZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Sample_Pooled_1Individual_Sample_ID
IDX_Sample_Pooled_2Pooled_Sample_ID
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullSample_Pooled_ID
Column ConstraintNot NullIndividual_Sample_ID
Column ConstraintNot NullPooled_Sample_ID
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
PK_Sample_PooledTable ConstraintPrimary KeyPRIMARY KEY (Sample_Pooled_ID)
Sample_Individual_to_PooledTable ConstraintForeign KeyFOREIGN KEY (Individual_Sample_ID) REFERENCES Sample_Study(Sample_Study_ID)
Sample_Pooled_to_PooledTable ConstraintForeign KeyFOREIGN KEY (Pooled_Sample_ID) REFERENCES Sample_Study(Sample_Study_ID)
Entity: Sample_Study
Entity details:
DescriptionThis table resolves the many-to-many relationship between the Study and Sample tables. One Study may examine one or more Samples. One Sample may be examined in one or more Studies.
Primary key constraint namePK_Sample_Study
Attributes:
KeyAttribute nameData typeNot nullDescription
PKSample_Study_IDSERIALYesThis is the unique identifer within the DGV of one instance of a Sample being examined in one Study. One Study may examine one or more Samples. One or more Samples may be examined by one or many Studies. This field is the unique identifer that resolves the many-to-many relationship between these entities (Sample and Study).
FKSample_IDINTEGERYesThis value uniquely identifies a Sample in the database and will cross-reference one record in the 'Sample' table.
FKStudy_IDINTEGERYesThis value uniquely identifies a Study in the database and will cross-reference one record in the 'Study' table.
Case_or_ControlCHARACTER VARYINGNoThis field will capture whether a Sample is a Case or a Control in a Study. This field will not always be populated. Note that for the current DGV only Controls, or healthy individuals, are part of the database.
Sample_NumberINTEGERNoThis is the number assigned to a Sample within a Study. E.g. if 100 Samples were examined in a Study, those Samples would be assigned a number from 1 to 100. The 'Sample_Number' is meant to label each Sample differently WITHIN a Study and is not a unique identifier across Studies. The 'Sample_ID' and 'Sample_Study_ID' uniquely identify a Sample within the database.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given 'Sample_Study', etc.
Relationships:
Relationship nameTypeParentChildCardinality
Sample_Individual_to_PooledNon IdentifyingSample_StudySample_PooledZero Or More
Sample_Pooled_to_PooledNon IdentifyingSample_StudySample_PooledZero Or More
Sample_Sample_StudyNon IdentifyingSampleSample_StudyOne Or More
Sample_Study_Method_Platform_SampleNon IdentifyingSample_StudyMethod_Platform_SampleZero Or More
Study_Sample_StudyNon IdentifyingStudySample_StudyOne Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Sample_Study_1Sample_ID
IDX_Sample_Study_2Study_ID
IDX_Sample_Study_3Sample_Study_IDUNIQUE
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullSample_Study_ID
Column ConstraintNot NullSample_ID
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
PK_Sample_StudyTable ConstraintPrimary KeyPRIMARY KEY (Sample_Study_ID)
Sample_Sample_StudyTable ConstraintForeign KeyFOREIGN KEY (Sample_ID) REFERENCES Sample(Sample_ID)
Study_Sample_StudyTable ConstraintForeign KeyFOREIGN KEY (Study_ID) REFERENCES Study(Study_ID)
TUC_Sample_Study_1Table ConstraintUniqueUNIQUE (Study_ID, Sample_Number)
Entity: Sample_Xref
Entity details:
DescriptionThis table captures external identifiers used to refer to Samples stored in the DGV. For example, a Sample may be assigned an (external) identifier when that Sample is analyzed in a Study. When that Sample is loaded to the DGV, it will receive a DGV database identifier. This table will capture the cross-reference between the external sample identifiers/accession numbers and the DGV identifiers. There could be external identifiers from other systems or projects, such as the HapMap project, the Coriell repository, or the Human Genome Diversity Panel. This table resolves the cross-referencing of these identifiers/accession numbers and the DGV Sample identifiers.
Primary key constraint namePK_Sample_Xref
Attributes:
KeyAttribute nameData typeNot nullDescription
PKSample_Xref_IDSERIALYesThis is the unique identifer for one mapping of a DGV Sample identifier to an external identifier. For example, if a Sample has been assigned two external identifiers, there will be two records in this table for that Sample, each having one of the two external identifiers.
FKSample_IDINTEGERYesThis is the unique identifier that represents one Sample in the DGV. A Sample should be loaded only once to the database and assigned only one Sample ID. This means that there could be one or more records in this table that have the same Sample ID (one for each external identifier assigned to that Sample).
External_Sample_IDCHARACTER VARYINGYesThis field captures an exernal identifier that is used to describe a Sample that exists in the DGV. The external identifier could be an accession number, e.g. DGVa Sample accession number, or a Common Cohort Sample ID, e.g. the HamMap Sample ID "NA11994".
External_ID_SourceCHARACTER VARYINGYesThis is the original source system from which the 'External_Sample_ID' comes. For example, this could be "DGVa" or "dbVar" or "HapMap", or the PubMed ID from a given Study, etc.
External_ID_TypeCHARACTER VARYINGNoThis is a further qualifier for an External Sample Identifier, if any has been assigned to an external identifier. For example, this could say "accession number" or the like.
Cohort_NameCHARACTER VARYINGNoThis is the Cohort to which a Sample belongs; this information may or may not be provided in a Study. Examples of Cohorts are "HapMap" or "Human Genome Diversity Panel/Project".
Study_IDINTEGERNoThis is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated.
Source_SystemCHARACTER VARYINGYesThis field contains an identifier of the data repository in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Sample_Xref, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Sample_Sample_XrefNon IdentifyingSampleSample_XrefZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Sample_Xref_1Sample_ID
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullSample_Xref_ID
Column ConstraintNot NullSample_ID
Column ConstraintNot NullExternal_Sample_ID
Column ConstraintNot NullExternal_ID_Source
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
PK_Sample_XrefTable ConstraintPrimary KeyPRIMARY KEY (Sample_Xref_ID)
Sample_Sample_XrefTable ConstraintForeign KeyFOREIGN KEY (Sample_ID) REFERENCES Sample(Sample_ID)
TUC_Sample_Xref_1Table ConstraintUniqueUNIQUE (External_Sample_ID, External_ID_Source)
Entity: Study
Entity details:
DescriptionThis table captures information about the Study in which structural variants were discovered and validated (not all discovered variants are validated in a Study). The 'universal identifier' of the study, usually the 'PubMed ID' (PMID) is captured in this table; the intention of this ID is to uniquely identify the Study across other databases. The 1000 Genomes Project Studies generate multiple input files per Study, or a "Study group". As each of those input files are loaded to the DGV, each will be assigned its own Study_ID. In these cases, all files from a Study group will have the same 'Study_Accession' value (e.g. "estd59" represents many input files from the 1000 Genomes project; one Study Accession number per multiple files). Sometimes an external database will be referenced in a record in the database. Below is a translation of the abbreviations one might find in a record in this database, and the full names of those abbreviations/external databases: AE = EBI Array Express CLONE = NCBI CLONE dbSNP = NCBI dbSNP dbGaP = NCBI dbGAP GENBANK = NCBI GENBANK GENE = NCBI GENE GEO = NCBI GEO HPO = Human Phenotype Ontology MeSH = Medical Subject Headings OMIM = Online Mendelian Inheritance in Man PROBE = NCBI PROBE SRA = NCBI Sequence Read Archive (to be discontinued in 2011-2012) ERA = EBI SRA TRACE = NCBI TRACE Archive (to be discontinued in 2011-2012)
Primary key constraint namePK_Study
Attributes:
KeyAttribute nameData typeNot nullDescription
PKStudy_IDSERIALYesThis is the unique identifier for a given Study within the DGV. This only uniquely identifies a Study in the DGV and not across other (external) databases.
PubMed_IDINTEGERNoThis is also known as 'PMID' and is the unique indentifer for a citation within PubMed. This identifier can be used to uniquely identify a publication, or Study, across many databases and is seen as a "universal" identifer for a given Study.
Display_NameCHARACTER VARYINGYesThis is the common name used to refer to a Study. These names are also used by DGVa and dbVar to refer to a Study. Examples of common names are: "Cooper et al 2008" or "Kidd et al 2008".
Study_AccessionCHARACTER VARYINGNoThe value in this field will be the accession number assigned to a Study by either NCBI-dbVar or EBI-DGVa. This accession number can then be used to cross-reference the Study in either dbVar or DGVa. If the Study was accessioned in dbVar, this value will be prefixed with "nstd"; e.g. "nstd35" If the Study was accessioned in DGVa, this value willl be prefixed with "estd"; e.g. "estd1"
OneK_Genomes_ProjectCHARACTER VARYINGNoThis is a flag to indicate whether a Study is part of the 1000 Genomes Project. Multiple "Studies" in the database may actually represent one Study from the 1000 Genomes Project. These Studies generate multiple input files per Study, thus generating multiple Study records in the database. These Studies will have a 1000 Genomes Project flag = "Y". The 'Study_Accession' value can be used to group records in this table (and subsequently on joins to other tables) to see which database Studies are related to one another, thereby representing one 1000 Genomes Project Study.
Primary_Author_NameCHARACTER VARYINGNoThis is the name of the Author whose name appears first on a publication for a given Study. It is noted that there could be multiple authors on a given Study but these are not captured in the DGV. For the full list of authors of a Study, one should refer to the original publication.
Primary_Author_IDCHARACTER VARYINGNoThis is a unique identifier that an Author would use to refer to himself/herself across multiple databases and within Studies. Examples of an 'Author_ID' are "ResearcherID", "OpenID", "Scopus ID"
Project_IDCHARACTER VARYINGNoThis field may capture a Project Identifier if one has been assigned to a Study, or in the case that the Study is not in PubMed (note this is only in rare cases and these Studies will likely at least have a 'Study_Accession' from either dbVar or DGVa).
Principal_InvestigatorCHARACTER VARYINGNoThis field will capture the Principal Investigator of a project if this information is provided for a given Study.
NCBI_Tax_IDINTEGERNoThis field captures the NCBI Taxonomy ID for the organism being Studied. For humans, the Taxonomy ID is "9606" so users should expect to always see this value in this field for the DGV. NB A very few Studies loaded to the DGV archival database (DGVS) may also contain data for non-humans; these data are filtered out during the automated DGV-curation pipeline and not presented to users.
Sample_SizeINTEGERNoThis is the number of Samples examined in one Study and will be an integer greater than zero.
Variant_CountINTEGERNoThis is the reported number of Variants that were discovered in a given Study. This value is assigned by either DGVa or dbVar. It is possible that the reported Variant_Count (stored in this table) is different than the actual Variant Count. When performing computations using record counts, it is best to count the actual number of reported Variants as opposed to using this value.
SiteCHARACTER VARYINGNoThis refers to the primary or main Site or Organization at which a Study is performed. This information is not always provided in a Study. For example, if a 1000 Genomes Study was conducted at multiple sites, the value in this field would be "1000 Genomes: multiple sites".
DescriptionCHARACTER VARYINGNoThis field will capture any other descriptors of a Study that need to be stored in the DGV. For example, if there were special circumnstances under which a Study was performed, or if a special Pre-Publication Agreement exists, this information would be captured here.
Curation_CommentsCHARACTER VARYINGNoThis field will capture a summary of DGV curation applied to a given Study. This information will be a summary of the DGV Filtering and Merging process. This information will be made available on the Downloads page of the DGV website under the links to the original data ('Comments' field of the Studies' listing).
Hold_Until_Publication_FlagCHARACTER VARYINGNo*** this field + its timestamp are only used in the DGVS (staging database for archiving, not for user-access) and not the public version, DGVC - and will infrequently, if at all, be used*** This field will capture whether a Study is to be held until publication. If a Variant is to be held until publication, this value will be "Y"; if it is not to be held until publication, this value will be either "N" or "NULL". If the value in this field is "Y", there will also be a timestamp indicating the date/time when the Variant can be published; this will be captured in the 'Publish_Date' field.
Publish_DateTIMESTAMPNoThe value in this field will be the date the Study's Author has given us to publish the Study. This field will infrequently, if at all, be used. If the Author does not provide a specific date, this value will be set to a date far into the future which will be reset when the Author has given the go-ahead to publish the Variant.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Study, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Study_AnalysisNon IdentifyingStudyAnalysisZero Or More
Study_DatasetNon IdentifyingStudyDatasetZero Or More
Study_Dataset_AnalysisNon IdentifyingStudyDataset_AnalysisZero Or More
Study_Method_StudyNon IdentifyingStudyMethod_StudyOne Or More
Study_Platform_StudyNon IdentifyingStudyPlatform_StudyZero Or More
Study_ReferenceNon IdentifyingStudyReferenceZero Or More
Study_Sample_StudyNon IdentifyingStudySample_StudyOne Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Study_1Study_IDUNIQUE
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefault'NOW()'
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullDisplay_Name
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
PK_StudyTable ConstraintPrimary KeyPRIMARY KEY (Study_ID)
Entity: Study_File_Prefix
Entity details:
Description
Primary key constraint namePK_Study_File_Prefix
Attributes:
KeyAttribute nameData typeNot nullDescription
PKStudy_File_Prefix_IDSERIALYes
Study_IDINTEGERYes
Study_File_PrefixCHARACTER VARYINGYes
Source_SystemCHARACTER VARYINGYes
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYes
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefault'now()'
Column ConstraintNot NullStudy_File_Prefix_ID
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullStudy_File_Prefix
Column ConstraintNot NullSource_System
Column ConstraintDefault'Y'
Column ConstraintNot NullLoad_Date
PK_Study_File_PrefixTable ConstraintPrimary KeyPRIMARY KEY (Study_File_Prefix_ID)
Entity: Supporting_Merged_Variant
Entity details:
DescriptionThis table describes which Supporting Variants comprise a given Merged Variant. Each record in this table will include the Merged Variant ID and one of its corresponding Supporting Variant IDs; there will be 'n' number of records for every 'n' number of Supporting Variants that comprise a Merged Variant. Note: The Variant table stores both Supporting and Merged Variant records. One Merged Variant is comprised of one or more Supporting Variants; Supporting Variants are not always reported in a Study (i.e. sometimes only the Merged, or "asserted" Variants are reported). This, the 'Sample_Merged_Variant' table resolves the one-to-many relationship of Supporting to Merged Variants that exist in the 'Variant' table. Both dbVar and DGVa describe what DGV calls a "Merged Variant" as an "asserted Variant" (SV; e.g. esv12345) and all three systems call the Variants the comprise a Merged Variant, "Supporting Variants" (SSV; e.g. nssv12324).
Primary key constraint namePK_Supporting_Merged_Variant
Attributes:
KeyAttribute nameData typeNot nullDescription
PKSupporting_Merged_Variant_IDSERIALYes9.10.9 This field represents one instance of a mapping between a Sample Level and a Merged Variant. For each Merged Variant, there will be as many records in this table as there are Sample Level Variants that comprise the Merged Variant (provided these data are available from a published dataset).
FKMerged_IDINTEGERYesThis field captures the unique identifer of a given Merged Variant in the DGV. This field cross-references one record in the Variant table. The Variant table stores both Supporting and Merged Variants. In the case of DGVa and dbVar, this is the "asserted variant", which is comprised of "supporting variants" (those are referenced in the 'Supporting_Variant_ID' field).
FKSupporting_Variant_IDINTEGERYesThis field captures the unique identifer of a given Supporting Variant in the DGV. This field cross-references one record in the Variant table. The Variant table stores both Supporting and Merged Variants. In the case of DGVa and dbVar, this is the "supporting variant", which comprises the "asserted variant" (those are referenced in the 'Merged_ID' field).
Study_IDINTEGERYesThis is the unique database identifier that represents one Study in the DGV.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Supporting_Merged_Variant, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Sample_LevelNon IdentifyingVariantSupporting_Merged_VariantZero Or More
Variant_Supporting_Merged_VariantNon IdentifyingVariantSupporting_Merged_VariantZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Supporting_Merged_Variant_1Merged_ID
IDX_Supporting_Merged_Variant_2Supporting_Variant_ID
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullSupporting_Merged_Variant_ID
Column ConstraintNot NullMerged_ID
Column ConstraintNot NullSupporting_Variant_ID
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
PK_Supporting_Merged_VariantTable ConstraintPrimary KeyPRIMARY KEY (Supporting_Merged_Variant_ID)
Sample_LevelTable ConstraintForeign KeyFOREIGN KEY (Supporting_Variant_ID) REFERENCES Variant(Variant_ID)
Variant_Supporting_Merged_VariantTable ConstraintForeign KeyFOREIGN KEY (Merged_ID) REFERENCES Variant(Variant_ID)
Entity: Translocation_Mapping
Entity details:
DescriptionThis table captures the mapping coordinates of a Translocation Variant or any other Variant that may have mappings across two Chromosomes. e.g. translocation, transposon A translocation can span more than one Chromosome, creating a one-to-many relationship between a Variant and its respective mapping records. Note: For other Variant Types, their respective mappings are stored in the 'Variant_Mapping' table. Translocation Mapping information is not likely to be provided in a Study until technology is refined enough to detect such events.
Primary key constraint namePK_Translocation_Mapping
Attributes:
KeyAttribute nameData typeNot nullDescription
PKTranslocation_Mapping_IDSERIALYesThis uniquely represents one mapping for one Translocation Variant stored in the DGV. A Translocation Variant may have multiple mappings reported for each part of the Variant. Each part will be represented by this unique identifer. All of the parts comprising one Translocation Variant can be associated in this table by the 'Variant_ID'.
FKVariant_IDINTEGERYes9.14.9 This is the unique identifer for a Variant in the DGV. The 'Variant_ID' can appear more than once in this table if that Variant has been mapped to more than one reference assembly of the human genome. This can be used to cross-reference one record in the 'Variant' table.
Reference_AssemblyCHARACTER VARYINGYesThis is the Reference Assembly that was used to map a variant's coordinates. Examples of Reference Assemblies are: GRCh37 (build 37), hg18/Build 36, and hg17/Build 35.
Study_MappingCHARACTER VARYINGYesThis field indicates whether this was the original Variant Mapping that was published as part of a Study's dataset or whether the mapping was a result of mitrating the Study-published mapping to another Reference Assembly (e.g. using the LiftOver tool to migrate from Hg17 to Hg18). If this is the original Variant Mapping as published within a Study, the value in this field is "Y"; if it is not the original Variant Mapping, the value in this field will be "N" or blank.
Placement_MethodCHARACTER VARYINGNoThis is a short term used (by DGVa and dbVar) to describe how a Variant was mapped to the genome. The default value is "Genomic" which describes the mapping provided by the Authors of a Study (i.e. the Submitter of the Study's data to dbVar/DGVa). Other terms include: Artifact, DGV, Clone Registry, Clone ad-hoc, Cytogenetic, Flanking sequences, Remapped auto, Remapped ad-hoc.
Remap_StatusCHARACTER VARYINGNoThis field captures a short descriptor of the status of a given Variant Mapping after it has been forward mapped to a newer Reference Assembly. This is a value used by dbVar and DGVa and may not always be provided in a Study. Values that could be in this field are: Perfect (perfectly mapped), Deletion (a deletion occured in the remapping), Insertion (an insertin occured in the remapping), No map (the placement did not map).
RecipientCHARACTER VARYINGNoThis field is specific to DGVa and dbVar and is captured here to provide full context of a Variant Mapping. For Variants that span across more than one locus (usually across Chromosomes), the value in this field describes whether that second location is either "known" or "ambiguous".
Unmapped_FlagCHARACTER VARYINGNo A Variant might not map to a given Reference Assembly when that Variant is re-mapped, or migrated, to a newer version of a Reference Assembly (e.g. using the UCSC LiftOver tool to re-map/migrate from Hg17 to Hg18). For Variants that do not map to a given Reference Assembly, the value in this field will be "Y". If a Variant was successfully mapped to a Reference Assembly, this fileld will be NULL. The current DGV pipeline may not use this field.
SpanINTEGERNoThis field captures the Span of a Variant, if available. The Span is be the distance between the mapped start and mapped end of the variant on the reference assembly. For example, in a PEM study, the clone end anchors will delineate the start-end of variant, in aCGH, the first/last probe with signal intensity difference will delineate the variant span.
SizeINTEGERNoThis field captures the Size of the Variant, if available. The Size is be the number of basepairs contained within the variant. In many cases the size will be equal to the span. For PEM and insertions into the reference, these values may be different, for aCGH etc, they will be the same.
Estimated_SizeINTEGERNoThis field captures the Estimated Size of a Variant, if available. The Estimated Size will provide an approximate size, depending on the methods and resolution of the various approaches within a study. For example, in the case where an insertion is called using a PEM strategy, the ends map to the reference at a distance less than expected by the known fragment insert size (in some studies less than 3 SD from the mean mapping size). Based on the size of the insert library (can't capture insertion larger than clone) and the SD (ability to resolve a small insertion from the inherent variability of insert sizes within that library), a max-min size can be estimated without sequencing the actual insert.
DGV_Calculated_SizeINTEGERNoIf no Size information was reported with a Variant, the DGV-pipeline may calculate the Size of the Variant using its Start and End coordinates. This calculation is only done if determined appropriate for a given Study. The Size is calculated as such: 'End' - 'Start' + 1
Chromosome_NumberCHARACTER VARYINGNoThis is the primary Chromosome on which a Translocation Variant has been discovered.
Start_BINTEGERNoThis is the Average Start Boundary of a Variant. Alternatively, if only one Start Coordinate has been provided for a Variant, it will be stored here.
End_BINTEGERNoThis is the Average End Boundary of a Variant. Alternatively, if only one End Coordinate has been provided for a Variant, it will be stored here.
Inner_Start_BINTEGERNoThis is the Minimum Start Boundary of a Variant.
Inner_End_BINTEGERNoThis is the Minimum End Boundary of a Variant.
Outer_Start_BINTEGERNoThis is the Maximum Start Boundary of a Variant.
Outer_End_BINTEGERNoThis is the Maximum End Boundary of a Variant.
Chromosome_Random_NumCHARACTER VARYINGNoThis field is to capture coordinates for variants that cannot be mapped on a chromosome but for which we know to what chromosome the variant belongs.
Start_RandomCHARACTER VARYINGNoThis is the Start coordinate that is randomly assigned to a variant that cannot be mapped to a chromosome but for which we know what chromosome the variant belongs.
End_RandomCHARACTER VARYINGNoThis is the End coordinate that is randomly assigned to a variant that cannot be mapped to a chromosome but for which we know what chromosome the variant belongs.
DGV_StartINTEGERNoThis field is for internal use only. This and the 'DGV_End' field will store start and end coordinates used to determine DGV_Calculated_Size, which will be used for Filter/Merge pipeline.
DGV_EndINTEGERNoThis field is for internal use only. This and the 'DGV_Start' field will store start and end coordinates used to determine DGV_Calculated_Size, which will be used for Filter/Merge pipeline.
Variant_NumberINTEGERYesThis is a sequential number assigned to a Variant within a reported Study and is not the database identifier for that Variant (which is the 'Variant.Variant_ID'). For a Translocation Variant, the mapping for each part of the Translocation will be assigned the same Variant Number in the input file. In conjunction with the 'Variant_ID', a user will be able to determine which records/mappings group together to represent one Translocation Variant.
Study_IDINTEGERYesThis is the unique database identifier that represents one Study in the DGV.
Source_SystemCHARACTER VARYINGNoThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPNoThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Mapping, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Variant_Translocation_MappingNon IdentifyingVariantTranslocation_MappingZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Translocation_Mapping_1Variant_ID
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullTranslocation_Mapping_ID
Column ConstraintNot NullVariant_ID
Column ConstraintNot NullReference_Assembly
Column ConstraintNot NullStudy_Mapping
Column ConstraintDefaultY
Column ConstraintNot NullVariant_Number
Column ConstraintNot NullStudy_ID
Column ConstraintDefaultY
PK_Translocation_MappingTable ConstraintPrimary KeyPRIMARY KEY (Translocation_Mapping_ID)
TCC_OuterStartA_Translocation_Mapping_2Table ConstraintCheckOuter_Start_B <= Outer_End_B
TCC_StartA_Translocation_Mapping_1Table ConstraintCheckStart_B <= End_B
Variant_Translocation_MappingTable ConstraintForeign KeyFOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID)
Entity: Variant
Entity details:
DescriptionThis table captures all Variants reported in Studies that have been loaded to the DGV. This table stores both Supporting and Merged Variants. For Studies that were accessioned by DGVa, the Variants are assigned a DGVa accession number that starts with "e"; for Studies that were accessioned by dbVar, the Variants are assigned a dbVar accession number that starts with "n". For example, a Supporting Variant, in DGVa will be assigned an accession number like "essv12345"; merged Variants, or what DGVa and dbVar call "asserted" Variants, would be assigned an accession numer like "esv45678". For dbVar, these examples, instead, would be, respectively, "nssv12345" and "nsv45678".
Primary key constraint namePK_Variant
Attributes:
KeyAttribute nameData typeNot nullDescription
PKVariant_IDSERIALYesThis is the unique identifier for a Variant in the DGV.
Variant_NumberINTEGERYesThis is a sequential number assigned to each Variant discovered or validated in a Study. This value is not unique in this table or across the database. For example, if five Variants were discovered in a Study, each Variant (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
Sample_Merged_FlagCHARACTER VARYINGYesThis field distinguishes whether a Variant is a Supporting Variant or a Merged Variant. If the Variant is Ssupporting, the value in this field will be "S". If the Variant is Merged, the value in this field will be "M".
Study_IDINTEGERNoThis is the unique database identifier that represents one Study in the DGV.
Sample_Study_IDINTEGERNoThis is the unique identifier in the DGV for the Sample in which the Variant was found in a given Study. It is the unique identifier from the 'Sample_Study' table. This can be used to cross-reference one record in the 'Sample_Study' table. NB this value represents an instance of a Sample as it is analyzed in one Study. The same Sample could be analyzed in different Studies, each instance having a different 'Sample_Study_ID', but the same 'Sample_ID'.
Validation_FlagCHARACTER VARYINGNoThis field captures whether a Variant has been Validated or not, and the results of the Validation. A Variant needs to be 'Discovered' in a Sample before it can be 'Validated'. A Variant is Validated by performing further assays. -- Flag Values are as follows: "Y" means that a Validation Method was applied and the Variant was validated "A" means that a Validation Method was attempted but the Variant was not validated "N" means that no Validation Method was applied or no Validation information was provided for a given Variant NB: Multiple Validation Methods can be applied to one Variant; if any of those methods yeilds a positive validation result, the Validation_Flag="Y"
Validation_MethodCHARACTER VARYINGNoIf a Discovered Variant is also Validated, and information about Validation is provided for that Variant, the Validation Method (if available) is captured here. The list of Validation Methods can be found in the 'Exchange.XSD' file as the controlled list "MethodTypeCV"
Validation_Platform_NameCHARACTER VARYINGNoIf a Discovered Variant is also Validated, and information about Validation is provided for that Variant, the Validation Platform (if available) is captured here.
LandmarkCHARACTER VARYINGNoThis field will capture any 'Landmark' that was used to discover a Variant. Typically the Landmark is a Clone employed to find the location of a Variant on the genome. Thus the most common value in this field will be the Clone Name. E.g. "RP11-420B6"; "AL031643".
InheritanceCHARACTER VARYINGNoThis is to indicate whether a Variant is inherited or otherwise. The value in this field could be for example: "Maternal", "Paternal", "Germline", "De novo", "Somatic", "Not tested", "Tested - inconclusive", "Biparental", or "Uniparental". This informaiton may not always be provided in a published dataset.
SequenceCHARACTER VARYINGNoThe sequence of the variant, if available, is captured in this field. Alternatively, this field may not contain the actual sequence but rather may be a pointer to another table or file where the actual sequence is stored.
Probe_CountINTEGERNoThis field refers to the number of probes on the array, within the region called as a Variant.
Merging_CriteriaTEXTNoIf a Variant is a Merged Variant, a brief description of the criteria used to merge Supporting Variants to the Merged Variants will be provided here.
EthnicityCHARACTER VARYINGNoIn the case where a Variant is reported with Ethnicity information but with no corresponding Sample information, Ethnicity will be captured here.
GenderCHARACTER VARYINGNoIn the case where a Variant is reported with Gender information but with no corresponding Sample information, Gender will be captured here. The value in this field can be "Male", "Female", or "Unknown", or NULL.
Cohort_NameCHARACTER VARYINGNoThis is the Cohort to which a Variant belongs; this information may or may not be provided in a Study. This information is more commonly found in the Sample related tables ('Sample', 'Sample_Xref') but could also be captured here in cases when a Variant is reported with no associated Sample. Examples of Cohorts are "HapMap" or "Human Genome Diversity Project".
Platform_NameCHARACTER VARYINGNoIf the Platform is known for which a given Variant is associated, this information may be captured here. The value in this field is the name of the Platform used in analyzing a given Sample to discover or validate the Structural Variants described in this table.
DGV_Merged_FlagCHARACTER VARYINGNoA value of "Y" in this field indicates that this is a Variant that was merged using the DGV-merging criteria (currently 70% reciprocal overlap). If the value in this field is blank or "N", this Variant is not a DGV-merged variant. If the 'Supporting_Merged_Flag' = "S" and 'DGV_Merged_Flag = "N", this is a Supporting Variant that was reported in a Study. There should be no records in this table with a 'Supporting_Merged_Flag' = "S" and 'DGV_Merged_Flag = "Y".
DGVA_Inferred_FlagCHARACTER VARYINGNoSome Studies do not submit sample-level calls (SSVs) to DGVa or dbVar, but rather only merged calls, or "asserted Variants" (SVs). Thus during accessioning of these Studies, DGVa or dbVar will create what DGVa calls "curated" SSVs to support the submitted SVs; DGV calls these "inferred Variants". A typical scenario in which dbVar or DGVa would created these Inferred Variants is to assign Sample IDs to a Merged Variant (one Inferred Variant per Sample ID). In order to distinguish between SSVs that are submitted as part of a Study from SSVs that are created by DGVa/dbVar during the accessioning process, DGV has established this flag (DGVA_Inferred_Flag). If an SSV is created by DGVa/dbVar during the accessioning process, this flag will be set to "Y" (yes); if an SSV was submitted as part of a Study's data submission, this flag will be set to "N" (no). These Variants are filtered out in the DGV pipeline.
CommentsCHARACTER VARYINGNoIf there is any supplemental information about a Variant that cannot be populated into any of the other fields in the 'Variant' table, that information will go here. e.g. "does not validate experimentally" may need to be appended to/annotated for a Variant
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Sample_LevelNon IdentifyingVariantSupporting_Merged_VariantZero Or More
Variant_Supporting_Merged_VariantNon IdentifyingVariantSupporting_Merged_VariantZero Or More
Variant_Translocation_MappingNon IdentifyingVariantTranslocation_MappingZero Or More
Variant_Variant_AnalysisNon IdentifyingVariantVariant_AnalysisZero Or More
Variant_Variant_MappingNon IdentifyingVariantVariant_MappingZero Or More
Variant_Variant_TypeNon IdentifyingVariantVariant_TypeZero Or More
Variant_Variant_XrefNon IdentifyingVariantVariant_XrefZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Variant_1Variant_IDUNIQUE
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullVariant_ID
Column ConstraintNot NullVariant_Number
Column ConstraintNot NullSample_Merged_Flag
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
PK_VariantTable ConstraintPrimary KeyPRIMARY KEY (Variant_ID)
Entity: Variant_Analysis
Entity details:
DescriptionOne or more Variants can be related to one or more 'Dataset_Analysis' records. This table resolves the M:M relationship between the 'Dataset_Analysis' and 'Variant' tables. This is in order to support the relationships defined in the dbVar/DGVa XSD. For each 'Dataset_Analysis' record to which a Variant is associated, this table will store a record for each instance of this. For example, Variant_1 is associated to Dataset_Analysis 1, 2, and 3. So in the 'Variant_Analysis' table there would be three records, each containing the same 'Variant_ID' and each with a different 'Dataset_Analysis_ID', i.e. 1, 2, or 3.
Primary key constraint namePK_Variant_Analysis
Attributes:
KeyAttribute nameData typeNot nullDescription
PKVariant_Analysis_IDSERIALYesThis is the unique identifier that represents one record in this table; i.e. one instance of a Dataset-Analysis and associated Variant.
FKDataset_Analysis_IDINTEGERYesThis uniquely identifies one Dataset that is compared to one Reference within a Study by one Analysis.
FKVariant_IDINTEGERYesThis is the unique identifier for a Variant in the DGV.
Study_IDINTEGERYesThis is the unique database identifier that represents one Study in the DGV.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Analysis, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Dataset_Analysis_Variant_AnalysisNon IdentifyingDataset_AnalysisVariant_AnalysisZero Or More
Variant_Variant_AnalysisNon IdentifyingVariantVariant_AnalysisZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Variant_Analysis_1Dataset_Analysis_ID
IDX_Variant_Analysis_2Variant_ID
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullVariant_Analysis_ID
Column ConstraintNot NullDataset_Analysis_ID
Column ConstraintNot NullVariant_ID
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
Dataset_Analysis_Variant_AnalysisTable ConstraintForeign KeyFOREIGN KEY (Dataset_Analysis_ID) REFERENCES Dataset_Analysis(Dataset_Analysis_ID)
PK_Variant_AnalysisTable ConstraintPrimary KeyPRIMARY KEY (Variant_Analysis_ID)
Variant_Variant_AnalysisTable ConstraintForeign KeyFOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID)
Entity: Variant_Mapping
Entity details:
DescriptionThis table captures the mapping coordinates of a Variant. The purpose of capturing these data in a separate table is to afford mappings to any of the human genome reference assemblies (e.g. hg17, hg18) for a given Variant
Primary key constraint namePK_Variant_Mapping
Attributes:
KeyAttribute nameData typeNot nullDescription
PKVariant_Mapping_IDSERIALYesThis uniquely represents one mapping for one Variant stored in the DGV. This table allows for multiple mappings of one Variant; e.g. to two iterations of the human genome reference assembly
FKVariant_IDINTEGERYesThis is the unique identifer for a Variant in the DGV. The 'Variant_ID' can appear more than once in this table if that Variant has been mapped to more than one reference assembly of the human genome. This can be used to cross-reference one record in the 'Variant' table.
Reference_AssemblyCHARACTER VARYINGNoThis is the Reference Assembly that was used to map a variant's coordinates. Examples of Reference Assemblies are: GRCh37 (build 37), hg18/Build 36, and hg17/Build 35.
Study_MappingCHARACTER VARYINGYesThis field indicates whether this was the original Variant Mapping that was published as part of a Study's dataset or whether the mapping was a result of migrating the Study-published mapping to another Reference Assembly (e.g. using the LiftOver tool to migrate from Hg17 to Hg18). If this is the original Variant Mapping as published within a Study, the value in this field is "Y"; if it is not the original Variant Mapping, the value in this field will be "N" or blank.
Placement_MethodCHARACTER VARYINGNoThis is a short term used (by DGVa and dbVar) to describe how a Variant was mapped to the genome. The default value is "Genomic" which describes the mapping provided by the Authors of a Study (i.e. the Submitter of the Study's data to dbVar/DGVa). Other terms include: Artifact, DGV, Clone Registry, Clone ad-hoc, Cytogenetic, Flanking sequences, Remapped auto, Remapped ad-hoc.
Remap_StatusCHARACTER VARYINGNoThis field captures a short descriptor of the status of a given Variant Mapping after it has been forward mapped to a newer Reference Assembly. This is a value used by dbVar and DGVa. Values that could be in this field are: Perfect (perfectly mapped), Deletion (a deletion occured in the remapping), Insertion (an insertin occured in the remapping), No map (the placement did not map).
RecipientCHARACTER VARYINGNoThis field is specific to DGVa and dbVar and is captured here to provide full context of a Variant Mapping. For Variants that span across more than one locus (usually across Chromosomes), the value in this field describes whether that second location is either "known" or "ambiguous".
Unmapped_FlagCHARACTER VARYINGNoA Variant might not map to a given Reference Assembly when that Variant is re-mapped, or migrated, to a newer version of a Reference Assembly (e.g. using the UCSC LiftOver tool to re-map/migrate from Hg17 to Hg18). For Variants that do not map to a given Reference Assembly, the value in this field will be "Y". If a Variant was successfully mapped to a Reference Assembly, this fileld will be NULL. The current DGV pipeline may not use this field.
SpanINTEGERNoThis field captures the Span of a Variant, if available. The Span is be the distance between the mapped start and mapped end of the variant on the reference assembly. For example, in a PEM study, the clone end anchors will delineate the start-end of variant, in aCGH, the first/last probe with signal intensity difference will delineate the variant span.
SizeINTEGERNoThis field captures the Size of the Variant, if available. The Size is be the number of basepairs contained within the variant. In many cases the size will be equal to the span. For PEM and insertions into the reference, these values may be different, for aCGH etc, they will be the same.
Estimated_SizeINTEGERNoThis field captures the Estimated Size of a Variant, if available. The Estimated Size will provide an approximate size, depending on the methods and resolution of the various approaches within a study. For example, in the case where an insertion is called using a PEM strategy, the ends map to the reference at a distance less than expected by the known fragment insert size (in some studies less than 3 SD from the mean mapping size). Based on the size of the insert library (can't capture insertion larger than clone) and the SD (ability to resolve a small insertion from the inherent variability of insert sizes within that library), a max-min size can be estimated without sequencing the actual insert.
DGV_Calculated_SizeINTEGERNoIf no Size information was reported with a Variant, the DGV-Curation process may calculate the Size of the Variant using its Start and End coordinates. This calculation is only done if determined appropriate for a given Study. The Size is calculated as such: 'End' - 'Start' + 1
Chromosome_NumberCHARACTER VARYINGNoThis is the primary Chromosome on which a Variant has been mapped. The 'Chromosome_B_Number' captures the secondary Chromosome on which a Variant has been discoverd, for example, in the case of a translocation.
Start_AINTEGERNoThis is the Average Start Boundary of a Variant. Alternatively, if only one Start Coordinate has been provided for a Variant, it will be stored here.
End_AINTEGERNoThis is the Average End Boundary of a Variant. Alternatively, if only one End Coordinate has been provided for a Variant, it will be stored here.
Inner_Start_AINTEGERNoThis is the Minimum Start Boundary of a Variant.
Inner_End_AINTEGERNoThis is the Minimum End Boundary of a Variant.
Outer_Start_AINTEGERNoThis is the Maximum Start Boundary of a Variant.
Outer_End_AINTEGERNoThis is the Maximum End Boundary of a Variant.
Chromosome_Random_NumCHARACTER VARYINGNoThis field is to capture coordinates for variants that cannot be mapped on a chromosome but for which we know to what chromosome the variant belongs.
Start_RandomCHARACTER VARYINGNo This is the Start coordinate that is randomly assigned to a variant that cannot be mapped to a chromosome but for which we know what chromosome the variant belongs.
End_RandomCHARACTER VARYINGNoThis is the End coordinate that is randomly assigned to a variant that cannot be mapped to a chromosome but for which we know what chromosome the variant belongs.
DGV_StartINTEGERNoThis field is for internal use only. This and the 'DGV_End' field will store start and end coordinates used to determine DGV_Calculated_Size, which will be used for Filter/Merge pipeline.
DGV_EndINTEGERNoThis field is for internal use only. This and the 'DGV_Start' field will store start and end coordinates used to determine DGV_Calculated_Size, which will be used for Filter/Merge pipeline.
Study_IDINTEGERNoThis is the unique database identifier that represents one Study in the DGV.
Variant_NumberINTEGERNoThis is a sequential number assigned to a Variant within a reported Study and is not the database identifier for that Variant (which is the 'Variant.Variant_ID'). For example, if five Variants were discovered in a Study, each Variant (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
Source_SystemCHARACTER VARYINGNoThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPNoThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Mapping, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Variant_Variant_MappingNon IdentifyingVariantVariant_MappingZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Variant_Mapping_1Variant_ID
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullVariant_Mapping_ID
Column ConstraintNot NullVariant_ID
Column ConstraintNot NullStudy_Mapping
Column ConstraintDefaultY
Column ConstraintDefaultY
PK_Variant_MappingTable ConstraintPrimary KeyPRIMARY KEY (Variant_Mapping_ID)
TCC_OuterStartA_Variant_Mapping_2Table ConstraintCheckOuter_Start_A <= Outer_End_A
TCC_StartA_Variant_Mapping_1Table ConstraintCheckStart_A <= End_A
Variant_Variant_MappingTable ConstraintForeign KeyFOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID)
Entity: Variant_Type
Entity details:
DescriptionThis table contains a list of all possible Variant Types that can be attributed to a given Variant. A Variant could have more than one type, e.g. Inversion + Deletion. As technologies develop further, more refined Variant Type descriptions could arise. This table will allow for multiple Variant Types to be ascribed to one Variant. NB The ICGC nomenclature already describes Variant Types with sub-categories. This is an example of how a Variant could be described with multiple Variant Types.
Primary key constraint namePK_Variant_Type
Attributes:
KeyAttribute nameData typeNot nullDescription
PKVariant_Type_IDSERIALYesThis is the unique identifier to represent one instance of a Variant Type in the DGV.
FKVariant_IDINTEGERYesThis is the unique identifer that represents one Variant in the DGV. It is the Variant that is described by the 'Variant_Type'. This can be used to cross-reference one record in the 'Variant' table.
Variant_TypeCHARACTER VARYINGYesThis field will capture the Variant Type ascribed to one Varian. Typically a Variant will only be assigned one Variant Type; i.e. "CNV" or "Other"; a Variant can also be assigned a 'Variant_Sub_Type'; e.g. "Gain" or "Complex". A Variant could be of more than one Type, e.g. CNV + Other. If more than one Variant Type is ascribed to a Variant, that Variant will have as many records in this table as are Types assigned to that Variant, each record having the same 'Variant_ID'.
Variant_Sub_TypeCHARACTER VARYINGNoThis field will capture the Variant Sub-Type ascribed to one Variant. Typically a Variant will be assigned only one 'Variant_Sub_Type'. A Variant could have more than one Type+Sub-Type, e.g. CNV+Duplication and Other+Inversion. If more one Variant Type+Sub-Type is ascribed to a Variant there will be as many records in this table as there are for that number of combinations of Type+Sub-Type assigned to that Variant, each record having the same 'Variant_ID'.
Copy_Number_CountINTEGERNoFor Variants that are of type "CNV", this field indicates the number of copies of a Variant sequence. If information about each copy of a CNV is available, that information will be captured in the 'Copy_Number' table (1:M relationship with the 'Variant_Type' table). This information may not be provided in a published dataset. If this information is available, the value in this field willl typically be an absolute copy number state.
Variant_Type_DescriptionCHARACTER VARYINGNoThis field provides further information about a Variant Type by way of a short description of the Variant Type. An example of a Variant Type description for an Insertion is: "Insertion of DNA sequence of known position relative to a reference sequence, detected by sequence/paired-end-mapping technology".
GenotypeCHARACTER VARYINGNoThis is the genotype of a particular Variant as described for a given individual. This information may not be provided in a published dataset.
ZygosityCHARACTER VARYINGNoThis value is related to the Variant Sub-Type. The allowable values in this field are "Heterozygous", "Homozygous", or "Hemizygous". For example, if a Variant Type = "CNV" and Sub-Type = "Deletion", the 'Zygosity' could thus be included as either "Heterozygous" or "Homozygous".
Number_of_GainsINTEGERNoThis value applies only to Merged Variant calls, aka Regions or Events. This refers to the number of individuals that have a copy number gain within a merged CNV event or region.
Number_of_LossesINTEGERNoThis value applies only to Merged Variant calls, aka Regions or Events. This refers to the number of individuals that have a copy number loss within a merged CNV event or region.
CountINTEGERNoThis is a value related to the Frequency of a Variant. This value is captured during DGVa/dbVar archiving and is described by DGVa/dbVar as: "Number of samples that reported the structural variant".
Total_ReportingINTEGERNoThis is a value related to the Frequency of a Variant. This value is captured during DGVa/dbVar archiving and is described by DGVa/dbVar as: "Number of samples assessed for structural variant".
Reported_FrequencyNUMERICNoThis is a value related to the Frequency of a Variant. This value is captured during DGVa/dbVar archiving and is described by DGVa/dbVar as: "Frequency of variant" This information may not be provided in a published dataset.
Calculated_FrequencyINTEGERNoThis field will serve as a placeholder in the case that DGV develops a frequency calculation step in the automated DGV-curation pipeline. This is a frequency that would be calculated during the DGV data curation process. In cases where no 'Reported_Frequency' is available, a Calculated Frequency may be derived from other information that was reported in a Study. The 'Calculated_Frequency' has an additional descriptor ('Calc_Frequency_Qualifier') to distinguish between the frequency as being "type" (number of Gains or Losses within a Study or Cohort) or "allele" (frequency of the allele in a Variant set).
Calc_Frequency_QualifierCHARACTER VARYINGNoThis field will serve as a placeholder in the case that DGV develops a frequency calculation step in the automated DGV-curation pipeline. The 'Calc_Frequency_Qualifier' is an additional descriptor of 'Calculated_Frequency' to distinguish between the frequency as being either: "type" = number of Gains or Losses within a Study or Cohort OR "allele" = frequency of the allele in a Variant set This is a value that is assigned during the DGV data curation process.
Number_of_Gains_and_LossesINTEGERNoThis value applies only to Merged Variant calls, aka Regions or Events. This refers to the total number of individuals that have a copy number change within a merged CNV event or region.
CommentsCHARACTER VARYINGNoIf further details about the Variant Type are provided, those details could be captured in this field.
Study_IDINTEGERYesThis is the unique database identifier that represents one Study in the DGV.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Type, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Variant_Type_Copy_NumberNon IdentifyingVariant_TypeCopy_NumberZero Or More
Variant_Variant_TypeNon IdentifyingVariantVariant_TypeZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Variant_Type_1Variant_ID
IDX_Variant_Type_2Variant_Type_IDUNIQUE
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullVariant_Type_ID
Column ConstraintNot NullVariant_ID
Column ConstraintNot NullVariant_Type
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
PK_Variant_TypeTable ConstraintPrimary KeyPRIMARY KEY (Variant_Type_ID)
Variant_Variant_TypeTable ConstraintForeign KeyFOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID)
Entity: Variant_Xref
Entity details:
DescriptionThis table captures any external identifers used to reference a Variant that exists in the DGV. For example, DGVa 'esv' (asserted variant) and 'essv' (supporting variant) accession numbers would be recorded here. Those identifiers would cross-referecnce Variants in the DGVa. This table can also capture IDs from external databases for records that relate to a given Variant in this table. For example, if a Variant's sequence has been deposited to NCBI's Trace Archive (formally discontinued in February, 2011), that sequence will receive one or more TEMPLATE_IDs, e.g. TEMPLATE_ID = G248P82926C7. In this example, the 'External_ID_Source' = "Trace Archive" and the 'External_Variant_ID' = "G248P82926C7", with the corresponding 'Variant_ID' for that Variant from the 'Variant' table.
Primary key constraint namePK_Variant_Xref
Attributes:
KeyAttribute nameData typeNot nullDescription
PKVariant_Xref_IDSERIALYesThis is the unique identifer for one mapping of a Variant in the DGV to an external identifier.
FKVariant_IDINTEGERYesThis is the unique identifer from the 'Variant' table that represents one Variant in the DGV. This can be used to cross-reference one record in the 'Variant' table.
External_Variant_IDCHARACTER VARYINGYesThis field captures an exernal identifier that is used to describe a Variant that exists in the DGV. The external identifier could be an accession number, e.g. DGVa 'esv12345', an ID, or a Variant name, or the Variant ID assigned to a Variant within a Study (i.e. the Study's Variant ID).
External_ID_SourceCHARACTER VARYINGYesThis is the original source system from which the 'External_Variant_ID' comes. For example, this could be "DGVa" or "dbVar" or a PubMed ID.
External_ID_TypeCHARACTER VARYINGNoThis field will capture additional information about the external identifier, if it is provided in a Study. For example, DGVa may have a 'Variant Name'/'Submitter ID' and a 'Variant Accession Number'; this field will capture which label has been assigned to the identifier.
Study_IDINTEGERYesThis is the unique database identifier that represents one Study in the DGV.
Source_SystemCHARACTER VARYINGYesThis field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
Current_Record_FlagCHARACTER VARYINGNoThis value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
Load_DateTIMESTAMPYesThis field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Xref, etc.
Relationships:
Relationship nameTypeParentChildCardinality
Variant_Variant_XrefNon IdentifyingVariantVariant_XrefZero Or More
Indexes:
Index nameIndex columnsTypeSort order
IDX_Variant_Xref_1Variant_ID
Constraints:
Constraint nameTypeLevelConstraint
Column ConstraintDefaultnow()
Column ConstraintNot NullVariant_Xref_ID
Column ConstraintNot NullVariant_ID
Column ConstraintNot NullExternal_Variant_ID
Column ConstraintNot NullExternal_ID_Source
Column ConstraintNot NullStudy_ID
Column ConstraintNot NullSource_System
Column ConstraintDefaultY
Column ConstraintNot NullLoad_Date
PK_Variant_XrefTable ConstraintPrimary KeyPRIMARY KEY (Variant_Xref_ID)
TUC_Variant_Xref_1Table ConstraintUniqueUNIQUE (External_Variant_ID, External_ID_Source)
Variant_Variant_XrefTable ConstraintForeign KeyFOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID)