|
Entity: Analysis
Entity details:
Description | This table captures information about each Analysis that is performed in one Study.
A Study may perform one or more Analyses within that Study.
An Analysis is the comparison of the Study's Dataset to a Reference. |
Primary key constraint name | PK_Analysis |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Analysis_ID | SERIAL | Yes | This is the unique database identifer for one Analysis performed within a Study. |
FK | Study_ID | INTEGER | Yes | This is the unique identifier from the 'Study' table and represents the Study in which an instance of an Analysis was employed. |
| Analysis_Description | CHARACTER VARYING | No | This field captures a brief description of the Analysis employed in a Study. This information is not always provided in a Study.
Note: specific criteria used during the Analysis should be captured in the 'Analysis_Criteria' field and a general description captured in this field. |
| Analysis_Type | CHARACTER VARYING | No | If this information is provided, the Analysis Type will be captured in this field; this information is not always provided in a Study.
An example of an Analysis Type is "split-read mapping". |
| Analysis_Tool | CHARACTER VARYING | No | This field captures the name of the Analysis Tool employed in a given Analysis.
Examples of Analysis Tools are "Birdsuite", "CNAG", "Genemapper". |
| Analysis_Tool_Version | CHARACTER VARYING | No | This is a version number for the Analysis Tool that was used in a given Study. This information is not always provided for an Analysis Tool. |
| Analysis_Tool_Description | CHARACTER VARYING | No | This is a description of the Analysis Tool that was used in a given Study. This information is not always provided for an Analysis Tool. |
| Analysis_Tool_Type | CHARACTER VARYING | No | This field captures whether an Analysis Tool is an Algorithm or a Software Suite. This information is not always provided for a given Analysis Tool in a Study; DGVa and dbVar currently do not capture this information. |
| Analysis_Number | INTEGER | No | This is a sequential number assigned to each Analysis employed in a Study. This value is not unique in this table or across the database.
For example, if five Analyses were employed in a Study, each Analysis (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks. |
| Analysis_Criteria | CHARACTER VARYING | No | If the criteria to perform an Analysis are described in a Study, they are captured here. E.g. parameters set in an Analysis Tool. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given 'Analysis_Study', etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Analysis_Dataset_Analysis | Non Identifying | Analysis | Dataset_Analysis | Zero Or More |
Analysis_Sub_Analysis | Non Identifying | Analysis | Merged_Analysis | Zero Or More |
Analysis_Super_Analysis | Non Identifying | Analysis | Merged_Analysis | Zero Or More |
Study_Analysis | Non Identifying | Study | Analysis | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Analysis_1 | Analysis_ID | UNIQUE | |
IDX_Analysis_2 | Study_ID | | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Analysis_ID |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
PK_Analysis | Table Constraint | Primary Key | PRIMARY KEY (Analysis_ID) |
Study_Analysis | Table Constraint | Foreign Key | FOREIGN KEY (Study_ID) REFERENCES Study(Study_ID) |
Entity: Cohort
Entity details:
Description | This table contains information about a given Cohort .
A Cohort could be part of one Study or a Cohort could be examined in more than one Study. For example, Samples from the HapMap Project or the Human Genome Diversity Panel are Cohorts that are examined in several Studies. |
Primary key constraint name | PK_Cohort |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Cohort_ID | SERIAL | Yes | This is the unique identifier for one Cohort that is in the DGV. |
| Cohort_Name | CHARACTER VARYING | No | This is the name given to a Cohort within a given Study.
Examples of Cohort Names are: "Human Genome Diversity Panel" or "HapMap". |
| Cohort_Description | CHARACTER VARYING | No | This is a short description of the Cohort (within a Study).
For example, the description can include details about what is common among the individuals within the Cohort, such as details about gender, age, etc. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally archived. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Cohort, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Cohort_Sample | Non Identifying | Cohort | Sample | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Cohort_1 | Cohort_ID | UNIQUE | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Cohort_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
PK_Cohort | Table Constraint | Primary Key | PRIMARY KEY (Cohort_ID) |
TUC_Cohort_1 | Table Constraint | Unique | UNIQUE (Cohort_Name) |
Entity: Copy_Number
Entity details:
Description | This table captures descriptions of each Copy of a Copy Number Variant (CNV). At present, this information is not provided in a Study (primarily due to existing technological constraints). When these data become available, they can be captured in this table. |
Primary key constraint name | PK_Copy_Number |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Copy_Number_ID | SERIAL | Yes | This uniquely identifies one Copy of a Sample Level Copy Number Variant (CNV) in the DGV. |
FK | Variant_Type_ID | INTEGER | Yes | This value represents one record from the 'Variant_Type' table and can be used to cross-reference a record in this table to its Variant (via 'Variant_ID') and associated Variant Type. |
| Copy_Number | CHARACTER VARYING | Yes | This is the sequence number of the Copy in a given CNV.
For example, if a CNV has 5 copies, there will be five records in this table, each representing one of the Copies of that CNV. For the five records in this example , 'Copy_Number' will have the values "1", "2", "3", "4", and "5".
To find the total number of copies in one CNV, perform a query on the 'Variant' table (can use the 'Variant_ID' from this table) and get the 'Copy_Number_Count'. |
| Copy_Description | CHARACTER VARYING | No | This is a desription of the structure, or composition of a given Copy of a CNV.
An example of a description is "inverted copy" or "copy with SNP". |
| Copy_Chromosome | CHARACTER VARYING | No | This is the Chromosome on which the Copy of the CNV is found. |
| Copy_Start | CHARACTER VARYING | No | This is the start coordinate for the Copy. |
| Copy_End | CHARACTER VARYING | No | This is the end coordinate for the Copy. |
| Study_ID | INTEGER | Yes | This is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Copy_Number, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Variant_Type_Copy_Number | Non Identifying | Variant_Type | Copy_Number | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Copy_Number_1 | Variant_Type_ID | | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Copy_Number_ID |
| Column Constraint | Not Null | Variant_Type_ID |
| Column Constraint | Not Null | Copy_Number |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
PK_Copy_Number | Table Constraint | Primary Key | PRIMARY KEY (Copy_Number_ID) |
TCC_Copy_Number_1 | Table Constraint | Check | Copy_Start <= Copy_End |
Variant_Type_Copy_Number | Table Constraint | Foreign Key | FOREIGN KEY (Variant_Type_ID) REFERENCES Variant_Type(Variant_Type_ID) |
Entity: Coverage
Entity details:
Description | This stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently. |
Primary key constraint name | PK_Coverage |
Attributes:
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | 'NOW()' |
| Column Constraint | Not Null | ID |
PK_Coverage | Table Constraint | Primary Key | PRIMARY KEY (ID) |
Entity: Cytoband_Overlap
Entity details:
Description | This stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently. |
Primary key constraint name | PK_Cytoband_Overlap |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | ID | SERIAL | Yes | This is the primary key (PK) for this table; each record in this table will be assigned a unique PK. |
| Variant_ID | INTEGER | No | |
| Cytoband | CHARACTER VARYING | No | |
| Reference_Assembly | CHARACTER VARYING | No | |
| Load_Date | TIMESTAMP | No | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | 'NOW()' |
| Column Constraint | Not Null | ID |
PK_Cytoband_Overlap | Table Constraint | Primary Key | PRIMARY KEY (ID) |
TUC_Cytoband_Overlap_1 | Table Constraint | Unique | UNIQUE (Variant_ID, Cytoband, Reference_Assembly) |
Entity: Dataset
Entity details:
Description | This table represents an instance of output, or Dataset, generated by processing one or more Samples on a given Platform (employing a given Method) within a Study.
One Study can have many Datasets. |
Primary key constraint name | PK_Dataset |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Dataset_ID | SERIAL | Yes | This field uniquely represents one Dataset that was generated in one Study.
A Dataset is the output of examining one or more Samples on a Platform. |
| Dataset_Name | CHARACTER VARYING | No | If a Name has been assigned to a Dataset within a Study, that Name would be captured here. Currently DGVa and dbVar do not capture this information. |
| Dataset_Number | INTEGER | No | This is a sequential number assigned to each Dataset generated in a Study. This value is not unique in this table or across the database.
For example, if five Datasets were generated in a Study, each Dataset (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks. |
| Dataset_Description | CHARACTER VARYING | No | This field captures a brief description of the Dataset analyzed in a Study. This information is currently not captured by DGVa or dbVar.
For example, if a Study is performing a re-analysis of a Dataset generated in a previous Strudy, such details would be captured in this Description field. |
FK | Study_ID | INTEGER | No | This is the unique identifier for a Study from the 'Study' table. This refers to the Study in which the Dataset was generated. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Dataset, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Dataset_Dataset_Analysis | Non Identifying | Dataset | Dataset_Analysis | Zero Or More |
Dataset_Dataset_Xref | Non Identifying | Dataset | Dataset_Xref | Zero Or More |
Dataset_Method_Platform_Sample | Non Identifying | Dataset | Method_Platform_Sample | Zero Or More |
Dataset_Reference | Non Identifying | Dataset | Reference | Zero Or More |
Study_Dataset | Non Identifying | Study | Dataset | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Dataset_1 | Study_ID | | |
IDX_Dataset_2 | Dataset_ID | UNIQUE | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Dataset_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
PK_Dataset | Table Constraint | Primary Key | PRIMARY KEY (Dataset_ID) |
Study_Dataset | Table Constraint | Foreign Key | FOREIGN KEY (Study_ID) REFERENCES Study(Study_ID) |
TUC_Dataset_1 | Table Constraint | Unique | UNIQUE (Dataset_Number, Study_ID) |
Entity: Dataset_Analysis
Entity details:
Description | This table describes each Dataset that is compared to one Reference during one Analysis within one Study.
One or more Analyses may be performed on the same Dataset in a Study. One or more Datasets may be Analyzed in one Study.
An Analysis is the comparison of the Study's Dataset to a Reference using an Analysis Tool. The output of the Analysis is the Variant set (described primarily in the 'Variant' table and secondarily in the 'Variant_Type', 'Copy_Number', and 'Variant_Mapping' tables).
For example, if one Analyses examined three Datasets, there would be three records in this table, each having the same 'Analysis_ID' but different 'Dataset_ID'. |
Primary key constraint name | PK_Dataset_Analysis |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Dataset_Analysis_ID | SERIAL | Yes | This uniquely identifies one Dataset that is compared to one Reference within a Study by one Analysis. |
| Dataset_Analysis_Number | INTEGER | Yes | This is a sequential number assigned to each Dataset-Analysis employed in a Study. This value is not unique in this table or across the database.
For example, if five Dataset-Analysis were employed in a Study, each Dataset-Analysis (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks. |
FK | Analysis_ID | INTEGER | Yes | This is the unique identifer for one Analysis performed within a Study.
This can be used to cross-reference one record in the 'Analysis' table. |
FK | Reference_ID | INTEGER | Yes | This uniquely identifies a given Reference against which a Dataset is compared during an Analysis within one Study.
This can be used to cross-reference one record in the 'Reference' table. |
FK | Dataset_ID | INTEGER | Yes | This field uniquely represents one Dataset that was generated in one Study.
This can be used to cross-reference one record in the 'Dataset' table. |
FK | Study_ID | INTEGER | Yes | This is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Dataset_Analysis_Reference, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Analysis_Dataset_Analysis | Non Identifying | Analysis | Dataset_Analysis | Zero Or More |
Dataset_Analysis_Variant_Analysis | Non Identifying | Dataset_Analysis | Variant_Analysis | Zero Or More |
Dataset_Dataset_Analysis | Non Identifying | Dataset | Dataset_Analysis | Zero Or More |
Reference_Dataset_Analysis | Non Identifying | Reference | Dataset_Analysis | Zero Or More |
Study_Dataset_Analysis | Non Identifying | Study | Dataset_Analysis | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Dataset_Analysis_1 | Analysis_ID | | |
IDX_Dataset_Analysis_2 | Reference_ID | | |
IDX_Dataset_Analysis_3 | Dataset_ID | | |
IDX_Dataset_Analysis_4 | Study_ID | | |
IDX_Dataset_Analysis_5 | Dataset_Analysis_ID | UNIQUE | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Dataset_Analysis_ID |
| Column Constraint | Not Null | Dataset_Analysis_Number |
| Column Constraint | Not Null | Analysis_ID |
| Column Constraint | Not Null | Reference_ID |
| Column Constraint | Not Null | Dataset_ID |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
Analysis_Dataset_Analysis | Table Constraint | Foreign Key | FOREIGN KEY (Analysis_ID) REFERENCES Analysis(Analysis_ID) |
Dataset_Dataset_Analysis | Table Constraint | Foreign Key | FOREIGN KEY (Dataset_ID) REFERENCES Dataset(Dataset_ID) |
PK_Dataset_Analysis | Table Constraint | Primary Key | PRIMARY KEY (Dataset_Analysis_ID) |
Reference_Dataset_Analysis | Table Constraint | Foreign Key | FOREIGN KEY (Reference_ID) REFERENCES Reference(Reference_ID) |
Study_Dataset_Analysis | Table Constraint | Foreign Key | FOREIGN KEY (Study_ID) REFERENCES Study(Study_ID) |
Entity: Dataset_Xref
Entity details:
Description | This table captures any external identifers used to reference a Dataset that exists in the DGV. This information is currently not being captured by DGVa or dbVar.
For example, DGVa Dataset accession numbers would be recorded here. Those identifiers would cross-referecnce Datasets in the DGVa. |
Primary key constraint name | PK_Dataset_Xref |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Dataset_Xref_ID | SERIAL | Yes | This is the unique identifer for one mapping of a Dataset in the DGV to an external identifier. |
FK | Dataset_ID | INTEGER | Yes | This is the unique identifer from the 'Dataset' table that represents one Dataset in the DGV.
This can be used to cross-reference one record in the 'Dataset' table. |
| External_Dataset_ID | CHARACTER VARYING | Yes | A Study may give a Dataset a Name or Identifier-code; this could also be the identifier assigned by an external informaiton system (e.g. DGVa). This field will capture either one of these that has been assigned to a given Dataset. This information is currently not being captured by DGVa or dbVar.
For example, this could be an accession number, e.g. DGVa Dataset accession number, or the Dataset ID assigned in a Study, e.g. "A01". |
| External_ID_Source | CHARACTER VARYING | Yes | This is the original source system from which the 'External_Dataset_ID' comes. This information is currently not being captured by DGVa or dbVar.
For example, this could be "DGVa" or "dbVar", "Smith et al 2009" or a PubMed ID. |
| Study_ID | INTEGER | Yes | This is the unique database identifier that represents one Study in the DGV. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Dataset_Xref, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Dataset_Dataset_Xref | Non Identifying | Dataset | Dataset_Xref | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Dataset_Xref_1 | Dataset_ID | | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Dataset_Xref_ID |
| Column Constraint | Not Null | Dataset_ID |
| Column Constraint | Not Null | External_Dataset_ID |
| Column Constraint | Not Null | External_ID_Source |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
Dataset_Dataset_Xref | Table Constraint | Foreign Key | FOREIGN KEY (Dataset_ID) REFERENCES Dataset(Dataset_ID) |
PK_Dataset_Xref | Table Constraint | Primary Key | PRIMARY KEY (Dataset_Xref_ID) |
TUC_Dataset_Xref_1 | Table Constraint | Unique | UNIQUE (External_Dataset_ID, External_ID_Source) |
Entity: Feature_Overlap
Entity details:
Description | This stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently. |
Primary key constraint name | PK_Feature_Overlap |
Attributes:
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | 'NOW()' |
| Column Constraint | Not Null | ID |
PK_Feature_Overlap | Table Constraint | Primary Key | PRIMARY KEY (ID) |
Entity: Filter
Entity details:
Description | This table stores all records that are filtered out during the DGV Filtering steps, which is part of the automated DGV-curation pipeline. Each record that is filtered out is recorded with its Primary Key (unique table ID), table name, reason for being filtered out, and at what step the record was filtered out. Timestamps on each record help keep track of when, specifically, a record was filtered out. |
Primary key constraint name | PK_Filter |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Filter_ID | SERIAL | Yes | This is the primary key of the Filter table; it uniquely represents a record in this table. |
| Run_ID | INTEGER | Yes | This identifier uniquely represents one run of the Filtering steps during automated DGV data curation. |
| Study_ID | INTEGER | Yes | This value represents the Study to which a given record in this table is associated.
For example, if "Variant 123" from "Study 1" was Filtered out because it did not meet the DGV minimum size threshold, the value in this field would be "1". |
| Table_Name | CHARACTER VARYING | Yes | This is the table from which the Filtered record originated.
For example, if a record from the 'Variant' table was filtered out, the value in this field would be "Variant". |
| Table_ID | INTEGER | Yes | This is the primary key for the record as it is represented in its original table.
For example, if "Variant 123" was filtered out, the value in this field would be "123". This value in conjunction with the 'Table_Name' uniquely represents a record in the database. (I.e. 'table_name.primary_key') |
| Filter_Step | INTEGER | Yes | The value in this field describes the step during the automated DGV curation process on which the record was filtered out.
For example, if the "Variant 123" record was filtered out during step 2 of the curation process, the value in this field would be "2". |
| Filter_Reason | CHARACTER VARYING | Yes | The value in this field represents the reason for which a record was filtered out during the automated DGV curation process.
For example, if "Variant 123" was filtered out becuase it did not meet the DGV minimum size threshold, the value in this field would be "too small". |
| Load_Date | TIMESTAMP | Yes | This is the timestamp that describes when a given record was loaded to this table. This is used for auditing and maintenance purposes. |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Filter_ID |
| Column Constraint | Not Null | Run_ID |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Table_Name |
| Column Constraint | Not Null | Table_ID |
| Column Constraint | Not Null | Filter_Step |
| Column Constraint | Not Null | Filter_Reason |
| Column Constraint | Not Null | Load_Date |
| Column Constraint | Default | 'now()' |
PK_Filter | Table Constraint | Primary Key | PRIMARY KEY (Filter_ID) |
Entity: Gene_Overlap
Entity details:
Description | This stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently. |
Primary key constraint name | PK_Gene_Overlap |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | ID | SERIAL | Yes | This is the primary key (PK) for this table; each record in this table will be assigned a unique PK. |
| Variant_ID | INTEGER | No | |
| Gene | CHARACTER VARYING | No | |
| Reference_Assembly | CHARACTER VARYING | No | |
| Load_Date | TIMESTAMP | No | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | 'NOW()' |
| Column Constraint | Not Null | ID |
PK_Gene_Overlap | Table Constraint | Primary Key | PRIMARY KEY (ID) |
TUC_Gene_Overlap_1 | Table Constraint | Unique | UNIQUE (Variant_ID, Gene, Reference_Assembly) |
Entity: Merged_Analysis
Entity details:
Description | Data from multiple Analyses can be merged and further Analyzed within one Study. This table captures the relationshjps between these multiple Analyses.
For example, Analysis 1, Analysis 2, and Analysis 3 could then be merged into Analysis 4. This table captures the relationship between Analyses 1, 2, and 3 and Analysis 4.
Analyses 1, 2, 3 are called "Sub Analyses" and Analysis 4 is called a "Super Analysis". |
Primary key constraint name | PK_Merged_Analysis |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Merged_Analysis_ID | SERIAL | Yes | The value in this field uniquely identifies one record in this table. This value represents one relationship between a "Super Analysis" and one of the "Sub Analyses" that comprise the "Super Analysis". |
FK | Sub_Analysis_ID | INTEGER | Yes | This represents the lower-level Analysis that was merged with another Analysis to create a higher-level Analysis, called the "Super Analysis". |
FK | Super_Analysis_ID | INTEGER | Yes | This represents the higher-level Analysis that is comprised of two or more lower-level Analyses, called "Sub Analyses". |
| Study_ID | INTEGER | Yes | This is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated (or was generated in the case of a join table composed of only database identifiers). |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Sample_Merged_Variant, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Analysis_Sub_Analysis | Non Identifying | Analysis | Merged_Analysis | Zero Or More |
Analysis_Super_Analysis | Non Identifying | Analysis | Merged_Analysis | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Merged_Analysis_1 | Sub_Analysis_ID | | |
IDX_Merged_Analysis_2 | Super_Analysis_ID | | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Merged_Analysis_ID |
| Column Constraint | Not Null | Sub_Analysis_ID |
| Column Constraint | Not Null | Super_Analysis_ID |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
Analysis_Sub_Analysis | Table Constraint | Foreign Key | FOREIGN KEY (Sub_Analysis_ID) REFERENCES Analysis(Analysis_ID) |
Analysis_Super_Analysis | Table Constraint | Foreign Key | FOREIGN KEY (Super_Analysis_ID) REFERENCES Analysis(Analysis_ID) |
PK_Merged_Analysis | Table Constraint | Primary Key | PRIMARY KEY (Merged_Analysis_ID) |
Entity: Method_Platform_Sample
Entity details:
Description | This table captures information about an instance of a Method and of a Platform applied to a given Sample within a Study. Also captured is the Dataset to which the Sample-Method-Platform combination belongs.
A Study can employ one or more Methods and one or more Platforms, in examining one or more Samples. Within one Study, the same Sample or Samples can be examined by one or more Methods or Platforms. This table captures the complex relationship between Method, Platform, and Sample. |
Primary key constraint name | PK_Method_Platform_Sample |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Method_Platform_Sample_ID | SERIAL | Yes | This is the unique identifer to repsesent one instance of a Sample that is examined using one Method and one Platform in a given Study. |
FK | Method_Study_ID | INTEGER | Yes | This is the unique identifier from the 'Method_Study' table that represents one instance of a Method being employed in one Study.
This can be used to cross-reference one record in the 'Method_Study' table. |
FK | Platform_Study_ID | INTEGER | Yes | This is the unique identifier from the 'Platform_Study' table that represents one instance of a Platform being employed in one Study.
This can be used to cross-reference one record in the 'Platform_Study' table. |
FK | Sample_Study_ID | INTEGER | Yes | This is the unique identifer within the DGV of one instance of a Sample being examined in one Study. |
FK | Dataset_ID | INTEGER | Yes | This is the unique identifier from the 'Dataset' table that describes to which Dataset a given Sample, analyzed by a given Method-Platform, belongs. |
| Study_ID | INTEGER | Yes | This is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated. |
| Source_System | CHARACTER VARYING | Yes | 01.21.10 This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given 'Method_Platform_Sample', etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Dataset_Method_Platform_Sample | Non Identifying | Dataset | Method_Platform_Sample | Zero Or More |
Method_Study_Method_Platform_Sample | Non Identifying | Method_Study | Method_Platform_Sample | Zero Or More |
Platform_Study_Method_Platform_Sample | Non Identifying | Platform_Study | Method_Platform_Sample | Zero Or More |
Sample_Study_Method_Platform_Sample | Non Identifying | Sample_Study | Method_Platform_Sample | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Method_Platform_Sample_1 | Platform_Study_ID | | |
IDX_Method_Platform_Sample_2 | Dataset_ID | | |
IDX_Method_Platform_Sample_3 | Sample_Study_ID | | |
IDX_Method_Platform_Sample_4 | Method_Study_ID | | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Method_Platform_Sample_ID |
| Column Constraint | Not Null | Method_Study_ID |
| Column Constraint | Not Null | Platform_Study_ID |
| Column Constraint | Not Null | Sample_Study_ID |
| Column Constraint | Not Null | Dataset_ID |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
Dataset_Method_Platform_Sample | Table Constraint | Foreign Key | FOREIGN KEY (Dataset_ID) REFERENCES Dataset(Dataset_ID) |
Method_Study_Method_Platform_Sample | Table Constraint | Foreign Key | FOREIGN KEY (Method_Study_ID) REFERENCES Method_Study(Method_Study_ID) |
PK_Method_Platform_Sample | Table Constraint | Primary Key | PRIMARY KEY (Method_Platform_Sample_ID) |
Platform_Study_Method_Platform_Sample | Table Constraint | Foreign Key | FOREIGN KEY (Platform_Study_ID) REFERENCES Platform_Study(Platform_Study_ID) |
Sample_Study_Method_Platform_Sample | Table Constraint | Foreign Key | FOREIGN KEY (Sample_Study_ID) REFERENCES Sample_Study(Sample_Study_ID) |
Entity: Method_Study
Entity details:
Description | This table captures each instance of a Method employed within one Study. One Study may employ one or more Methods, and/or one or more instances of the same Method. |
Primary key constraint name | PK_Method_Study |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Method_Study_ID | SERIAL | Yes | This uniquely identifies within the DGV one instance of a Method employed in one Study.
For example, BAC_aCGH may be employed twice in one Study; each instance of this Method being employed in that Study would get its own unique 'Method_Study_ID' |
FK | Study_ID | INTEGER | Yes | This is the unique identifier from the 'Study' table and represents the Study in which an instance of a Method was employed. |
| Method_Name | CHARACTER VARYING | No | This is the name of a given Method employed in a Study; e.g. BAC_aCGH; MCD_analysis. |
| Method_Number | INTEGER | No | This is a sequential number assigned to each Method empoyed in a Study. This value is not unique in this table or across the database.
For example, if five Methods were generated in a Study, each Method (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks. |
| Method_Type | CHARACTER VARYING | No | This field indicates whether a Method employed in a given Study was either a 'Discovery Method' or a 'Validation Method'.
The values in this field are either "Discovery" or "Validation". |
| Method_Description | CHARACTER VARYING | No | This field captures a brief description of the Method employed in a Study. This information is not always provided for a given Study. |
| Primary_Data_Flag | CHARACTER VARYING | No | This field will capture whether the data being analyzed in a given Study is Primary or a Re-Analysis of Primary data that was generated in a different Study. This information is not always provided in a Study (and currently may not be captured by DGVa or dbVar).
The values in this field are, "Y", if the data is Primary data generated by the Study in question, or "N" if the data was generated in a different Study.
If the 'Primary_Data_Flag' is "N", to find the Method employed in the original Study that generated the data, query the 'Original_Method' field in this table. |
| Original_Method | CHARACTER VARYING | No | If data that was generated in a different Study is being re-analyzed/meta-analyzed in a new Study, the 'Primary_Data_Flag' is "N", indicating the data was originally generated in a previous Study. This field captures the Method used in the original Study to first generate the Dataset being analyzed in this Study.
This information is not always provided in a Study (and currently may not be captured by DGVa or dbVar). |
| Originators_of_Data | CHARACTER VARYING | No | The field should capture the original source of the data i.e. the group who ran the samples and generated the dataset.
This field could capture, e.g. the PMID, the primary author of a Study, the Site at which/organization by whom the original Study was performed.
This information is not always provided in a Study (and currently may not be captured by DGVa or dbVar). |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Method, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Method_Study_Method_Platform_Sample | Non Identifying | Method_Study | Method_Platform_Sample | Zero Or More |
Study_Method_Study | Non Identifying | Study | Method_Study | One Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Method_Study_1 | Study_ID | | |
IDX_Method_Study_2 | Method_Study_ID | UNIQUE | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Method_Study_ID |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
PK_Method_Study | Table Constraint | Primary Key | PRIMARY KEY (Method_Study_ID) |
Study_Method_Study | Table Constraint | Foreign Key | FOREIGN KEY (Study_ID) REFERENCES Study(Study_ID) |
TCC_Method_Study_1 | Table Constraint | Check | Method_Type = 'Validation' OR Method_Type = 'Discovery' |
Entity: Platform_Study
Entity details:
Description | This table captures each instance of a Platform being employed in a Study. One Study may employ one or more Platforms. |
Primary key constraint name | PK_Platform_Study |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Platform_Study_ID | SERIAL | Yes | This is the database unique identifier that represents one instance of a Platform being employed in one Study. |
FK | Study_ID | INTEGER | Yes | This is the unique identifier from the 'Study' table and represents the Study in which an instance of a Platform was employed. |
| Platform_Number | INTEGER | No | This is a sequential number assigned to each Platform employed in a Study. This value is not unique in this table or across the database.
For example, if five Platforms were employed in a Study, each Platform (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks. |
| Platform_Name | CHARACTER VARYING | No | This is the name used to refer to a Platform.
For example, "Affymetrix GeneChip Human Genome U95Av2 [HG_U95Av2]" |
| Platform_Description | CHARACTER VARYING | No | This is a short description of a given Platform.
Not all Platforms will necessarily have a description for them but if one is available, it will be captured here. |
| Platform_Type | CHARACTER VARYING | No | This field captures the Type of a given Platform. This information may not necessarily be provided in a Study.
|
| Platform_Version | CHARACTER VARYING | No | This describes the version of a given Platform. This information may not necessarily be provided in a Study. |
| GEO_Accession_Num | CHARACTER VARYING | No | This field will capture the identifier/accession number that represents a given Platform within the GEO data repository. This information is not always provided in a Study.
e.g. GEO accession number, GPL1352 |
| ArrayExpress_Accession_Num | CHARACTER VARYING | No | This field will capture the identifier/accession number that represents a given Platform within the ArrayExpress data repository. This information is not always provided in a Study.
e.g. ArrayExpress accession number, A-AFFY-65 |
| Source_System | CHARACTER VARYING | Yes | This field contains an identifier of the data repository in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given 'Platform_Study', etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Platform_Study_Method_Platform_Sample | Non Identifying | Platform_Study | Method_Platform_Sample | Zero Or More |
Study_Platform_Study | Non Identifying | Study | Platform_Study | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Platform_Study_1 | Study_ID | | |
IDX_Platform_Study_2 | Platform_Study_ID | UNIQUE | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Platform_Study_ID |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
PK_Platform_Study | Table Constraint | Primary Key | PRIMARY KEY (Platform_Study_ID) |
Study_Platform_Study | Table Constraint | Foreign Key | FOREIGN KEY (Study_ID) REFERENCES Study(Study_ID) |
Entity: Reference
Entity details:
Description | This table describes the Reference applied to one Dataset in one Study.
One or more Datasets are compared against one or more References. Each instance of this is captured in this table.
A Reference is that to which the Dataset set in a given Study is compared by Analysis. |
Primary key constraint name | PK_Reference |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Reference_ID | SERIAL | Yes | This uniquely identifies a given Reference against which a Dataset is compared within one Study. |
FK | Dataset_ID | INTEGER | Yes | This field uniquely represents one Dataset that was generated in one Study. |
| Reference_Number | INTEGER | Yes | This is a sequential number assigned to each Reference employed in a Study. This value is not unique in this table or across the database.
For example, if five References were employed in a Study, each Reference (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks. |
| Reference_Type | CHARACTER VARYING | No | This is the type of Reference that could be employed in a Study.
The Reference type could be: "Ref_sequence", "Sampleset", "Sample", "Control_tissue", "Merged", or "Other". |
| Reference_Sample_ID | CHARACTER VARYING | No | If the Reference Type = "Sample", this field will capture the Sample identifier associated to that Sample, if one is provided.
For example, if one HapMap Sample is used as a Reference, the HapMap Sample ID (e.g. NA07019) could be provided in this field. |
| Reference_Sequence | CHARACTER VARYING | No | This field captures the specific Sequence name if the Reference Type = "Reference Sequence".
Examples of Sequence names are "HuRef" and "Hg18/Build36". |
| Reference_Description | CHARACTER VARYING | No | This is a brief description of the Reference, if one is available. |
FK | Study_ID | INTEGER | Yes | This is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Reference_Dataset, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Dataset_Reference | Non Identifying | Dataset | Reference | Zero Or More |
Reference_Dataset_Analysis | Non Identifying | Reference | Dataset_Analysis | Zero Or More |
Study_Reference | Non Identifying | Study | Reference | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Reference_1 | Reference_ID | UNIQUE | |
IDX_Reference_2 | Dataset_ID | | |
IDX_Reference_3 | Study_ID | | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Reference_ID |
| Column Constraint | Not Null | Dataset_ID |
| Column Constraint | Not Null | Reference_Number |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
Dataset_Reference | Table Constraint | Foreign Key | FOREIGN KEY (Dataset_ID) REFERENCES Dataset(Dataset_ID) |
PK_Reference | Table Constraint | Primary Key | PRIMARY KEY (Reference_ID) |
Study_Reference | Table Constraint | Foreign Key | FOREIGN KEY (Study_ID) REFERENCES Study(Study_ID) |
TUC_Reference_1 | Table Constraint | Unique | UNIQUE (Dataset_ID, Reference_Number) |
Entity: Reference_Assembly
Entity details:
Description | This table is used as a back-end to serve a drop down list on the new DGV website for a user to select a given Reference Assembly in which to view the data. |
Primary key constraint name | PK_Reference_Assembly |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Reference_Assembly_ID | SERIAL | Yes | This is the identifier that uniquely represents a given Reference Assembly in this table. |
| Reference_Assembly_Name | CHARACTER VARYING | No | This is the common name used for a given Reference Assembly; e.g. NCBI36/hg18 |
| Load_Date | TIMESTAMP | No | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Study, etc. |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | 'NOW()' |
| Column Constraint | Not Null | Reference_Assembly_ID |
PK_Reference_Assembly | Table Constraint | Primary Key | PRIMARY KEY (Reference_Assembly_ID) |
Entity: Sample
Entity details:
Description | This table contains all of the Samples that have been examined in any Study that is part of the DGV.
Some Studies do not report Sample IDs but instead only a Variant ID. In these cases, it will not be possible to associate a given Variant to a Sample. This would typically be due to the need to protect the privacy of the individual who provided the Sample e.g. in cases where there is non-consent to share Sample information |
Primary key constraint name | PK_Sample |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Sample_ID | SERIAL | Yes | This is the unique identifier of a given Sample in the DGV. This identifier represents a Sample within the DGV and does not represent the unique identifier of a Sample in other databases.
It is important to note that this identifier is to uniquely identify one Sample in the DGV. Many Studies may examine the same Sample, e.g. a HapMap Sample; this Sample should exist only once in this table but may be cross-referenced several times in the 'Sample_Study' and 'Sample_Xref' tables. we should only load that Sample once to the database |
| Biological_Source | CHARACTER VARYING | No | This field captures a brief description of the Biological Source of a Sample.
For example, the 'Biological_Source' could be "cell line", "blood", "tissue" etc.
Please note how this field relates to the 'Source_Type' field. |
| Source_Type | CHARACTER VARYING | No | This field describes the type of the Sample.
For example, a Sample could be of the Type "Genomic DNA", "RNA", "Clone library", etc.
Please note how this field relates to the 'Biological_Source' field. |
| Sample_Description | CHARACTER VARYING | No | This field will capture a Description of the Sample when necessary/if available.
For example, a Sample could be "a pool of 10 anonymous male donors". In these cases, a Description of the Sample is important, esp. with regards to representation of Sample/Subject numbers in the DGV (i.e. a pool of 10 anonymous donors translates from one Sample to 10 Subjects). |
| Subject_ID | CHARACTER VARYING | No | If a Subject identifier of some kind is provided in a Study - as associated to a given Sample, that Subject identifier will be captured here.
Note: one Subject can provide one or more Samples, each of which is captured in the Sample table. Also, in many Studies, the HapMap Sample ID (e.g. "NA19215") is also included as the Subject ID. |
| Karyotype | CHARACTER VARYING | No | This field will capture the Karyotype of a Sample.
One person can provide one or more Samples and each Sample may have a different Karyotype. For example, an individual with cancerous cells may have one Karyotype from a normal tissue Sample and another Karyotype from a cancer-tissue Sample.
Examples of Karyotypes are: 46, XY; 46, XX, 5p-; 46, XX, del(5)(p15.2) (the first two being normal karyotypes)
It should be noted that DGV currently only presents data for healthy controls so karyotype information will likely not be included for a given Sample. |
| Ethnicity | CHARACTER VARYING | No | This field captures the "Ethnicity" of a Sample.
"Ethnicity" is sometimes also referred to as "Population" and refers to the geographic origins of an individual. The use of the term 'Ethnicity' can vary between Studies.
Examples of 'Ethnicity' are: "Asian", "Yoruban", "European", "African" |
| Gender | CHARACTER VARYING | No | This field captures the gender of the Subject who provided the Sample.
The value in this field can be "Male", "Female", "Unknown", or NULL. |
| Family_ID | CHARACTER VARYING | No | This ID will represent a given Family whose data are stored in the DGV. The 'Family_ID' is submitted with a Study and not generated by the database. Thus it is possible that two different Families are assigned the same 'Family_ID', but not likely.
For example, where HapMap Samples are all from the same Family, the same 'Family_ID' will be assigned to each Family member's respective Sample. |
| Family_Type | CHARACTER VARYING | No | This field will describe the type of Family represented by Samples in this table. This field could describe whether a family is a trio, for example.
This field could also capture whether a Family is 'Simplex' or 'Multiplex'. |
| Family_Member | CHARACTER VARYING | No | This field will capture which member of a Family is represented by a given Sample.
For example, if a Sample is from the mother in a Family, the value in this field will be "mother", from the father, this value will be "father", etc.. |
| Mother_ID | CHARACTER VARYING | No | If a Sample is part of a Family and Family information is provided, this field would contain the ID of the Mother of the individual who provided this Sample.This information can be useful when looking at pedigree. |
| Father_ID | CHARACTER VARYING | No | If a Sample is part of a Family and Family information is provided, this field would contain the ID of the Father of the individual who provided this Sample
.
This information can be useful when looking at pedigree. |
FK | Cohort_ID | INTEGER | No | This is the Cohort to which a Sample belongs. This field may not be populated.
A Sample could belong to a commonly referenced Cohort such as HapMap or the Human Genome Diversity Panel. A Sample could also belong to a Cohort within a specific Study. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally archived. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Sample, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Cohort_Sample | Non Identifying | Cohort | Sample | Zero Or More |
Sample_Sample_Study | Non Identifying | Sample | Sample_Study | One Or More |
Sample_Sample_Xref | Non Identifying | Sample | Sample_Xref | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Sample_1 | Cohort_ID | | |
IDX_Sample_2 | Sample_ID | UNIQUE | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Sample_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
Cohort_Sample | Table Constraint | Foreign Key | FOREIGN KEY (Cohort_ID) REFERENCES Cohort(Cohort_ID) |
PK_Sample | Table Constraint | Primary Key | PRIMARY KEY (Sample_ID) |
Entity: Sample_Pooled
Entity details:
Description | This table captures the mapping between a Pooled Sample and the individual Samples that comprise the Pooled Sample - when this information is provided for a given a Study. |
Primary key constraint name | PK_Sample_Pooled |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Sample_Pooled_ID | SERIAL | Yes | This field uniquely identifies one instance of a mapping between an individual Sample and a Pooled Sample. For each individual Sample that comprises a Pooled Sample in the database, a record will be stored in this table.
For example, if a Pooled Sample is comprised of ten individual Samples, there will be ten records stored in this table, each with the same 'Pooled_Sample_ID' (which can be cross-referenced with a 'Sample_Study_ID' in the 'Sample_Study' table) and ten different 'Sample_Study_ID's (each of which can be cross-referenced with a 'Sample_Study_ID' in the 'Sample_Study' table), each record will have a unique 'Sample_Pooled_ID'. |
FK | Individual_Sample_ID | INTEGER | Yes | This is the 'Sample_Study_ID' of one of the individual Samples that comprise a Pooled Sample. There will be two or more 'Individual_Sample_IDs' for every 'Pooled_Sample_ID' in this table. |
FK | Pooled_Sample_ID | INTEGER | Yes | This is the unique identifer of one Pooled Sample within the database. For each 'Pooled_Sample_ID' there will be two or more 'Individual_Sample_IDs'. |
| Study_ID | INTEGER | Yes | This is the unique database identifier that represents one Study in the DGV, that Study in which the Pooled Samples were analyzed. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally archived. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Sample/Pooled Sample, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Sample_Individual_to_Pooled | Non Identifying | Sample_Study | Sample_Pooled | Zero Or More |
Sample_Pooled_to_Pooled | Non Identifying | Sample_Study | Sample_Pooled | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Sample_Pooled_1 | Individual_Sample_ID | | |
IDX_Sample_Pooled_2 | Pooled_Sample_ID | | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Sample_Pooled_ID |
| Column Constraint | Not Null | Individual_Sample_ID |
| Column Constraint | Not Null | Pooled_Sample_ID |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
PK_Sample_Pooled | Table Constraint | Primary Key | PRIMARY KEY (Sample_Pooled_ID) |
Sample_Individual_to_Pooled | Table Constraint | Foreign Key | FOREIGN KEY (Individual_Sample_ID) REFERENCES Sample_Study(Sample_Study_ID) |
Sample_Pooled_to_Pooled | Table Constraint | Foreign Key | FOREIGN KEY (Pooled_Sample_ID) REFERENCES Sample_Study(Sample_Study_ID) |
Entity: Sample_Study
Entity details:
Description | This table resolves the many-to-many relationship between the Study and Sample tables. One Study may examine one or more Samples. One Sample may be examined in one or more Studies. |
Primary key constraint name | PK_Sample_Study |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Sample_Study_ID | SERIAL | Yes | This is the unique identifer within the DGV of one instance of a Sample being examined in one Study.
One Study may examine one or more Samples. One or more Samples may be examined by one or many Studies. This field is the unique identifer that resolves the many-to-many relationship between these entities (Sample and Study). |
FK | Sample_ID | INTEGER | Yes | This value uniquely identifies a Sample in the database and will cross-reference one record in the 'Sample' table. |
FK | Study_ID | INTEGER | Yes | This value uniquely identifies a Study in the database and will cross-reference one record in the 'Study' table. |
| Case_or_Control | CHARACTER VARYING | No | This field will capture whether a Sample is a Case or a Control in a Study. This field will not always be populated.
Note that for the current DGV only Controls, or healthy individuals, are part of the database. |
| Sample_Number | INTEGER | No | This is the number assigned to a Sample within a Study. E.g. if 100 Samples were examined in a Study, those Samples would be assigned a number from 1 to 100. The 'Sample_Number' is meant to label each Sample differently WITHIN a Study and is not a unique identifier across Studies. The 'Sample_ID' and 'Sample_Study_ID' uniquely identify a Sample within the database. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given 'Sample_Study', etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Sample_Individual_to_Pooled | Non Identifying | Sample_Study | Sample_Pooled | Zero Or More |
Sample_Pooled_to_Pooled | Non Identifying | Sample_Study | Sample_Pooled | Zero Or More |
Sample_Sample_Study | Non Identifying | Sample | Sample_Study | One Or More |
Sample_Study_Method_Platform_Sample | Non Identifying | Sample_Study | Method_Platform_Sample | Zero Or More |
Study_Sample_Study | Non Identifying | Study | Sample_Study | One Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Sample_Study_1 | Sample_ID | | |
IDX_Sample_Study_2 | Study_ID | | |
IDX_Sample_Study_3 | Sample_Study_ID | UNIQUE | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Sample_Study_ID |
| Column Constraint | Not Null | Sample_ID |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
PK_Sample_Study | Table Constraint | Primary Key | PRIMARY KEY (Sample_Study_ID) |
Sample_Sample_Study | Table Constraint | Foreign Key | FOREIGN KEY (Sample_ID) REFERENCES Sample(Sample_ID) |
Study_Sample_Study | Table Constraint | Foreign Key | FOREIGN KEY (Study_ID) REFERENCES Study(Study_ID) |
TUC_Sample_Study_1 | Table Constraint | Unique | UNIQUE (Study_ID, Sample_Number) |
Entity: Sample_Xref
Entity details:
Description | This table captures external identifiers used to refer to Samples stored in the DGV.
For example, a Sample may be assigned an (external) identifier when that Sample is analyzed in a Study. When that Sample is loaded to the DGV, it will receive a DGV database identifier. This table will capture the cross-reference between the external sample identifiers/accession numbers and the DGV identifiers.
There could be external identifiers from other systems or projects, such as the HapMap project, the Coriell repository, or the Human Genome Diversity Panel. This table resolves the cross-referencing of these identifiers/accession numbers and the DGV Sample identifiers. |
Primary key constraint name | PK_Sample_Xref |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Sample_Xref_ID | SERIAL | Yes | This is the unique identifer for one mapping of a DGV Sample identifier to an external identifier.
For example, if a Sample has been assigned two external identifiers, there will be two records in this table for that Sample, each having one of the two external identifiers. |
FK | Sample_ID | INTEGER | Yes | This is the unique identifier that represents one Sample in the DGV. A Sample should be loaded only once to the database and assigned only one Sample ID. This means that there could be one or more records in this table that have the same Sample ID (one for each external identifier assigned to that Sample). |
| External_Sample_ID | CHARACTER VARYING | Yes | This field captures an exernal identifier that is used to describe a Sample that exists in the DGV.
The external identifier could be an accession number, e.g. DGVa Sample accession number, or a Common Cohort Sample ID, e.g. the HamMap Sample ID "NA11994". |
| External_ID_Source | CHARACTER VARYING | Yes | This is the original source system from which the 'External_Sample_ID' comes.
For example, this could be "DGVa" or "dbVar" or "HapMap", or the PubMed ID from a given Study, etc. |
| External_ID_Type | CHARACTER VARYING | No | This is a further qualifier for an External Sample Identifier, if any has been assigned to an external identifier.
For example, this could say "accession number" or the like. |
| Cohort_Name | CHARACTER VARYING | No | This is the Cohort to which a Sample belongs; this information may or may not be provided in a Study.
Examples of Cohorts are "HapMap" or "Human Genome Diversity Panel/Project". |
| Study_ID | INTEGER | No | This is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated. |
| Source_System | CHARACTER VARYING | Yes | This field contains an identifier of the data repository in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Sample_Xref, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Sample_Sample_Xref | Non Identifying | Sample | Sample_Xref | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Sample_Xref_1 | Sample_ID | | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Sample_Xref_ID |
| Column Constraint | Not Null | Sample_ID |
| Column Constraint | Not Null | External_Sample_ID |
| Column Constraint | Not Null | External_ID_Source |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
PK_Sample_Xref | Table Constraint | Primary Key | PRIMARY KEY (Sample_Xref_ID) |
Sample_Sample_Xref | Table Constraint | Foreign Key | FOREIGN KEY (Sample_ID) REFERENCES Sample(Sample_ID) |
TUC_Sample_Xref_1 | Table Constraint | Unique | UNIQUE (External_Sample_ID, External_ID_Source) |
Entity: Study
Entity details:
Description | This table captures information about the Study in which structural variants were discovered and validated (not all discovered variants are validated in a Study).
The 'universal identifier' of the study, usually the 'PubMed ID' (PMID) is captured in this table; the intention of this ID is to uniquely identify the Study across other databases.
The 1000 Genomes Project Studies generate multiple input files per Study, or a "Study group". As each of those input files are loaded to the DGV, each will be assigned its own Study_ID. In these cases, all files from a Study group will have the same 'Study_Accession' value (e.g. "estd59" represents many input files from the 1000 Genomes project; one Study Accession number per multiple files).
Sometimes an external database will be referenced in a record in the database. Below is a translation of the abbreviations one might find in a record in this database, and the full names of those abbreviations/external databases:
AE = EBI Array Express
CLONE = NCBI CLONE
dbSNP = NCBI dbSNP
dbGaP = NCBI dbGAP
GENBANK = NCBI GENBANK
GENE = NCBI GENE
GEO = NCBI GEO
HPO = Human Phenotype Ontology
MeSH = Medical Subject Headings
OMIM = Online Mendelian Inheritance in Man
PROBE = NCBI PROBE
SRA = NCBI Sequence Read Archive (to be discontinued in 2011-2012)
ERA = EBI SRA
TRACE = NCBI TRACE Archive (to be discontinued in 2011-2012) |
Primary key constraint name | PK_Study |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Study_ID | SERIAL | Yes | This is the unique identifier for a given Study within the DGV. This only uniquely identifies a Study in the DGV and not across other (external) databases. |
| PubMed_ID | INTEGER | No | This is also known as 'PMID' and is the unique indentifer for a citation within PubMed. This identifier can be used to uniquely identify a publication, or Study, across many databases and is seen as a "universal" identifer for a given Study. |
| Display_Name | CHARACTER VARYING | Yes | This is the common name used to refer to a Study. These names are also used by DGVa and dbVar to refer to a Study.
Examples of common names are: "Cooper et al 2008" or "Kidd et al 2008". |
| Study_Accession | CHARACTER VARYING | No | The value in this field will be the accession number assigned to a Study by either NCBI-dbVar or EBI-DGVa. This accession number can then be used to cross-reference the Study in either dbVar or DGVa.
If the Study was accessioned in dbVar, this value will be prefixed with "nstd"; e.g. "nstd35"
If the Study was accessioned in DGVa, this value willl be prefixed with "estd"; e.g. "estd1" |
| OneK_Genomes_Project | CHARACTER VARYING | No | This is a flag to indicate whether a Study is part of the 1000 Genomes Project.
Multiple "Studies" in the database may actually represent one Study from the 1000 Genomes Project. These Studies generate multiple input files per Study, thus generating multiple Study records in the database. These Studies will have a 1000 Genomes Project flag = "Y". The 'Study_Accession' value can be used to group records in this table (and subsequently on joins to other tables) to see which database Studies are related to one another, thereby representing one 1000 Genomes Project Study. |
| Primary_Author_Name | CHARACTER VARYING | No | This is the name of the Author whose name appears first on a publication for a given Study. It is noted that there could be multiple authors on a given Study but these are not captured in the DGV. For the full list of authors of a Study, one should refer to the original publication. |
| Primary_Author_ID | CHARACTER VARYING | No | This is a unique identifier that an Author would use to refer to himself/herself across multiple databases and within Studies. Examples of an 'Author_ID' are "ResearcherID", "OpenID", "Scopus ID" |
| Project_ID | CHARACTER VARYING | No | This field may capture a Project Identifier if one has been assigned to a Study, or in the case that the Study is not in PubMed (note this is only in rare cases and these Studies will likely at least have a 'Study_Accession' from either dbVar or DGVa). |
| Principal_Investigator | CHARACTER VARYING | No | This field will capture the Principal Investigator of a project if this information is provided for a given Study. |
| NCBI_Tax_ID | INTEGER | No | This field captures the NCBI Taxonomy ID for the organism being Studied.
For humans, the Taxonomy ID is "9606" so users should expect to always see this value in this field for the DGV. NB A very few Studies loaded to the DGV archival database (DGVS) may also contain data for non-humans; these data are filtered out during the automated DGV-curation pipeline and not presented to users. |
| Sample_Size | INTEGER | No | This is the number of Samples examined in one Study and will be an integer greater than zero. |
| Variant_Count | INTEGER | No | This is the reported number of Variants that were discovered in a given Study. This value is assigned by either DGVa or dbVar.
It is possible that the reported Variant_Count (stored in this table) is different than the actual Variant Count. When performing computations using record counts, it is best to count the actual number of reported Variants as opposed to using this value. |
| Site | CHARACTER VARYING | No | This refers to the primary or main Site or Organization at which a Study is performed. This information is not always provided in a Study.
For example, if a 1000 Genomes Study was conducted at multiple sites, the value in this field would be "1000 Genomes: multiple sites". |
| Description | CHARACTER VARYING | No | This field will capture any other descriptors of a Study that need to be stored in the DGV.
For example, if there were special circumnstances under which a Study was performed, or if a special Pre-Publication Agreement exists, this information would be captured here. |
| Curation_Comments | CHARACTER VARYING | No | This field will capture a summary of DGV curation applied to a given Study. This information will be a summary of the DGV Filtering and Merging process.
This information will be made available on the Downloads page of the DGV website under the links to the original data ('Comments' field of the Studies' listing). |
| Hold_Until_Publication_Flag | CHARACTER VARYING | No | *** this field + its timestamp are only used in the DGVS (staging database for archiving, not for user-access) and not the public version, DGVC - and will infrequently, if at all, be used***
This field will capture whether a Study is to be held until publication.
If a Variant is to be held until publication, this value will be "Y"; if it is not to be held until publication, this value will be either "N" or "NULL".
If the value in this field is "Y", there will also be a timestamp indicating the date/time when the Variant can be published; this will be captured in the 'Publish_Date' field. |
| Publish_Date | TIMESTAMP | No | The value in this field will be the date the Study's Author has given us to publish the Study. This field will infrequently, if at all, be used.
If the Author does not provide a specific date, this value will be set to a date far into the future which will be reset when the Author has given the go-ahead to publish the Variant. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Study, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Study_Analysis | Non Identifying | Study | Analysis | Zero Or More |
Study_Dataset | Non Identifying | Study | Dataset | Zero Or More |
Study_Dataset_Analysis | Non Identifying | Study | Dataset_Analysis | Zero Or More |
Study_Method_Study | Non Identifying | Study | Method_Study | One Or More |
Study_Platform_Study | Non Identifying | Study | Platform_Study | Zero Or More |
Study_Reference | Non Identifying | Study | Reference | Zero Or More |
Study_Sample_Study | Non Identifying | Study | Sample_Study | One Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Study_1 | Study_ID | UNIQUE | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | 'NOW()' |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Display_Name |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
PK_Study | Table Constraint | Primary Key | PRIMARY KEY (Study_ID) |
Entity: Study_File_Prefix
Entity details:
Description | |
Primary key constraint name | PK_Study_File_Prefix |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Study_File_Prefix_ID | SERIAL | Yes | |
| Study_ID | INTEGER | Yes | |
| Study_File_Prefix | CHARACTER VARYING | Yes | |
| Source_System | CHARACTER VARYING | Yes | |
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y". |
| Load_Date | TIMESTAMP | Yes | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | 'now()' |
| Column Constraint | Not Null | Study_File_Prefix_ID |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Study_File_Prefix |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | 'Y' |
| Column Constraint | Not Null | Load_Date |
PK_Study_File_Prefix | Table Constraint | Primary Key | PRIMARY KEY (Study_File_Prefix_ID) |
Entity: Supporting_Merged_Variant
Entity details:
Description | This table describes which Supporting Variants comprise a given Merged Variant. Each record in this table will include the Merged Variant ID and one of its corresponding Supporting Variant IDs; there will be 'n' number of records for every 'n' number of Supporting Variants that comprise a Merged Variant.
Note: The Variant table stores both Supporting and Merged Variant records. One Merged Variant is comprised of one or more Supporting Variants; Supporting Variants are not always reported in a Study (i.e. sometimes only the Merged, or "asserted" Variants are reported). This, the 'Sample_Merged_Variant' table resolves the one-to-many relationship of Supporting to Merged Variants that exist in the 'Variant' table.
Both dbVar and DGVa describe what DGV calls a "Merged Variant" as an "asserted Variant" (SV; e.g. esv12345) and all three systems call the Variants the comprise a Merged Variant, "Supporting Variants" (SSV; e.g. nssv12324). |
Primary key constraint name | PK_Supporting_Merged_Variant |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Supporting_Merged_Variant_ID | SERIAL | Yes | 9.10.9 This field represents one instance of a mapping between a Sample Level and a Merged Variant.
For each Merged Variant, there will be as many records in this table as there are Sample Level Variants that comprise the Merged Variant (provided these data are available from a published dataset). |
FK | Merged_ID | INTEGER | Yes | This field captures the unique identifer of a given Merged Variant in the DGV.
This field cross-references one record in the Variant table. The Variant table stores both Supporting and Merged Variants.
In the case of DGVa and dbVar, this is the "asserted variant", which is comprised of "supporting variants" (those are referenced in the 'Supporting_Variant_ID' field). |
FK | Supporting_Variant_ID | INTEGER | Yes | This field captures the unique identifer of a given Supporting Variant in the DGV.
This field cross-references one record in the Variant table. The Variant table stores both Supporting and Merged Variants.
In the case of DGVa and dbVar, this is the "supporting variant", which comprises the "asserted variant" (those are referenced in the 'Merged_ID' field). |
| Study_ID | INTEGER | Yes | This is the unique database identifier that represents one Study in the DGV. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI
|
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
|
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Supporting_Merged_Variant, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Sample_Level | Non Identifying | Variant | Supporting_Merged_Variant | Zero Or More |
Variant_Supporting_Merged_Variant | Non Identifying | Variant | Supporting_Merged_Variant | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Supporting_Merged_Variant_1 | Merged_ID | | |
IDX_Supporting_Merged_Variant_2 | Supporting_Variant_ID | | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Supporting_Merged_Variant_ID |
| Column Constraint | Not Null | Merged_ID |
| Column Constraint | Not Null | Supporting_Variant_ID |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
PK_Supporting_Merged_Variant | Table Constraint | Primary Key | PRIMARY KEY (Supporting_Merged_Variant_ID) |
Sample_Level | Table Constraint | Foreign Key | FOREIGN KEY (Supporting_Variant_ID) REFERENCES Variant(Variant_ID) |
Variant_Supporting_Merged_Variant | Table Constraint | Foreign Key | FOREIGN KEY (Merged_ID) REFERENCES Variant(Variant_ID) |
Entity: Translocation_Mapping
Entity details:
Description | This table captures the mapping coordinates of a Translocation Variant or any other Variant that may have mappings across two Chromosomes. e.g. translocation, transposon
A translocation can span more than one Chromosome, creating a one-to-many relationship between a Variant and its respective mapping records.
Note: For other Variant Types, their respective mappings are stored in the 'Variant_Mapping' table. Translocation Mapping information is not likely to be provided in a Study until technology is refined enough to detect such events. |
Primary key constraint name | PK_Translocation_Mapping |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Translocation_Mapping_ID | SERIAL | Yes | This uniquely represents one mapping for one Translocation Variant stored in the DGV.
A Translocation Variant may have multiple mappings reported for each part of the Variant. Each part will be represented by this unique identifer. All of the parts comprising one Translocation Variant can be associated in this table by the 'Variant_ID'. |
FK | Variant_ID | INTEGER | Yes | 9.14.9 This is the unique identifer for a Variant in the DGV.
The 'Variant_ID' can appear more than once in this table if that Variant has been mapped to more than one reference assembly of the human genome.
This can be used to cross-reference one record in the 'Variant' table. |
| Reference_Assembly | CHARACTER VARYING | Yes | This is the Reference Assembly that was used to map a variant's coordinates.
Examples of Reference Assemblies are: GRCh37 (build 37), hg18/Build 36, and hg17/Build 35. |
| Study_Mapping | CHARACTER VARYING | Yes | This field indicates whether this was the original Variant Mapping that was published as part of a Study's dataset or whether the mapping was a result of mitrating the Study-published mapping to another Reference Assembly (e.g. using the LiftOver tool to migrate from Hg17 to Hg18).
If this is the original Variant Mapping as published within a Study, the value in this field is "Y"; if it is not the original Variant Mapping, the value in this field will be "N" or blank. |
| Placement_Method | CHARACTER VARYING | No | This is a short term used (by DGVa and dbVar) to describe how a Variant was mapped to the genome. The default value is "Genomic" which describes the mapping provided by the Authors of a Study (i.e. the Submitter of the Study's data to dbVar/DGVa).
Other terms include: Artifact, DGV, Clone Registry, Clone ad-hoc, Cytogenetic, Flanking sequences, Remapped auto, Remapped ad-hoc. |
| Remap_Status | CHARACTER VARYING | No | This field captures a short descriptor of the status of a given Variant Mapping after it has been forward mapped to a newer Reference Assembly. This is a value used by dbVar and DGVa and may not always be provided in a Study.
Values that could be in this field are: Perfect (perfectly mapped), Deletion (a deletion occured in the remapping), Insertion (an insertin occured in the remapping), No map (the placement did not map). |
| Recipient | CHARACTER VARYING | No | This field is specific to DGVa and dbVar and is captured here to provide full context of a Variant Mapping. For Variants that span across more than one locus (usually across Chromosomes), the value in this field describes whether that second location is either "known" or "ambiguous". |
| Unmapped_Flag | CHARACTER VARYING | No | A Variant might not map to a given Reference Assembly when that Variant is re-mapped, or migrated, to a newer version of a Reference Assembly (e.g. using the UCSC LiftOver tool to re-map/migrate from Hg17 to Hg18).
For Variants that do not map to a given Reference Assembly, the value in this field will be "Y". If a Variant was successfully mapped to a Reference Assembly, this fileld will be NULL. The current DGV pipeline may not use this field. |
| Span | INTEGER | No | This field captures the Span of a Variant, if available. The Span is be the distance between the mapped start and mapped end of the variant on the reference assembly.
For example, in a PEM study, the clone end anchors will delineate the start-end of variant, in aCGH, the first/last probe with signal intensity difference will delineate the variant span. |
| Size | INTEGER | No | This field captures the Size of the Variant, if available. The Size is be the number of basepairs contained within the variant. In many cases the size will be equal to the span. For PEM and insertions into the reference, these values may be different, for aCGH etc, they will be the same. |
| Estimated_Size | INTEGER | No | This field captures the Estimated Size of a Variant, if available. The Estimated Size will provide an approximate size, depending on the methods and resolution of the various approaches within a study.
For example, in the case where an insertion is called using a PEM strategy, the ends map to the reference at a distance less than expected by the known fragment insert size (in some studies less than 3 SD from the mean mapping size). Based on the size of the insert library (can't capture insertion larger than clone) and the SD (ability to resolve a small insertion from the inherent variability of insert sizes within that library), a max-min size can be estimated without sequencing the actual insert. |
| DGV_Calculated_Size | INTEGER | No | If no Size information was reported with a Variant, the DGV-pipeline may calculate the Size of the Variant using its Start and End coordinates. This calculation is only done if determined appropriate for a given Study.
The Size is calculated as such: 'End' - 'Start' + 1 |
| Chromosome_Number | CHARACTER VARYING | No | This is the primary Chromosome on which a Translocation Variant has been discovered.
|
| Start_B | INTEGER | No | This is the Average Start Boundary of a Variant. Alternatively, if only one Start Coordinate has been provided for a Variant, it will be stored here. |
| End_B | INTEGER | No | This is the Average End Boundary of a Variant. Alternatively, if only one End Coordinate has been provided for a Variant, it will be stored here. |
| Inner_Start_B | INTEGER | No | This is the Minimum Start Boundary of a Variant. |
| Inner_End_B | INTEGER | No | This is the Minimum End Boundary of a Variant. |
| Outer_Start_B | INTEGER | No | This is the Maximum Start Boundary of a Variant. |
| Outer_End_B | INTEGER | No | This is the Maximum End Boundary of a Variant. |
| Chromosome_Random_Num | CHARACTER VARYING | No | This field is to capture coordinates for variants that cannot be mapped on a chromosome but for which we know to what chromosome the variant belongs. |
| Start_Random | CHARACTER VARYING | No | This is the Start coordinate that is randomly assigned to a variant that cannot be mapped to a chromosome but for which we know what chromosome the variant belongs. |
| End_Random | CHARACTER VARYING | No | This is the End coordinate that is randomly assigned to a variant that cannot be mapped to a chromosome but for which we know what chromosome the variant belongs. |
| DGV_Start | INTEGER | No | This field is for internal use only. This and the 'DGV_End' field will store start and end coordinates used to determine DGV_Calculated_Size, which will be used for Filter/Merge pipeline. |
| DGV_End | INTEGER | No | This field is for internal use only. This and the 'DGV_Start' field will store start and end coordinates used to determine DGV_Calculated_Size, which will be used for Filter/Merge pipeline. |
| Variant_Number | INTEGER | Yes | This is a sequential number assigned to a Variant within a reported Study and is not the database identifier for that Variant (which is the 'Variant.Variant_ID').
For a Translocation Variant, the mapping for each part of the Translocation will be assigned the same Variant Number in the input file. In conjunction with the 'Variant_ID', a user will be able to determine which records/mappings group together to represent one Translocation Variant. |
| Study_ID | INTEGER | Yes | This is the unique database identifier that represents one Study in the DGV. |
| Source_System | CHARACTER VARYING | No | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI
|
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
|
| Load_Date | TIMESTAMP | No | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Mapping, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Variant_Translocation_Mapping | Non Identifying | Variant | Translocation_Mapping | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Translocation_Mapping_1 | Variant_ID | | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Translocation_Mapping_ID |
| Column Constraint | Not Null | Variant_ID |
| Column Constraint | Not Null | Reference_Assembly |
| Column Constraint | Not Null | Study_Mapping |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Variant_Number |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Default | Y |
PK_Translocation_Mapping | Table Constraint | Primary Key | PRIMARY KEY (Translocation_Mapping_ID) |
TCC_OuterStartA_Translocation_Mapping_2 | Table Constraint | Check | Outer_Start_B <= Outer_End_B |
TCC_StartA_Translocation_Mapping_1 | Table Constraint | Check | Start_B <= End_B |
Variant_Translocation_Mapping | Table Constraint | Foreign Key | FOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID) |
Entity: Variant
Entity details:
Description | This table captures all Variants reported in Studies that have been loaded to the DGV. This table stores both Supporting and Merged Variants.
For Studies that were accessioned by DGVa, the Variants are assigned a DGVa accession number that starts with "e"; for Studies that were accessioned by dbVar, the Variants are assigned a dbVar accession number that starts with "n".
For example, a Supporting Variant, in DGVa will be assigned an accession number like "essv12345"; merged Variants, or what DGVa and dbVar call "asserted" Variants, would be assigned an accession numer like "esv45678". For dbVar, these examples, instead, would be, respectively, "nssv12345" and "nsv45678". |
Primary key constraint name | PK_Variant |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Variant_ID | SERIAL | Yes | This is the unique identifier for a Variant in the DGV. |
| Variant_Number | INTEGER | Yes | This is a sequential number assigned to each Variant discovered or validated in a Study. This value is not unique in this table or across the database.
For example, if five Variants were discovered in a Study, each Variant (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks. |
| Sample_Merged_Flag | CHARACTER VARYING | Yes | This field distinguishes whether a Variant is a Supporting Variant or a Merged Variant.
If the Variant is Ssupporting, the value in this field will be "S".
If the Variant is Merged, the value in this field will be "M". |
| Study_ID | INTEGER | No | This is the unique database identifier that represents one Study in the DGV. |
| Sample_Study_ID | INTEGER | No | This is the unique identifier in the DGV for the Sample in which the Variant was found in a given Study. It is the unique identifier from the 'Sample_Study' table.
This can be used to cross-reference one record in the 'Sample_Study' table. NB this value represents an instance of a Sample as it is analyzed in one Study. The same Sample could be analyzed in different Studies, each instance having a different 'Sample_Study_ID', but the same 'Sample_ID'. |
| Validation_Flag | CHARACTER VARYING | No | This field captures whether a Variant has been Validated or not, and the results of the Validation.
A Variant needs to be 'Discovered' in a Sample before it can be 'Validated'. A Variant is Validated by performing further assays.
-- Flag Values are as follows:
"Y" means that a Validation Method was applied and the Variant was validated
"A" means that a Validation Method was attempted but the Variant was not validated
"N" means that no Validation Method was applied or no Validation information was provided for a given Variant
NB: Multiple Validation Methods can be applied to one Variant; if any of those methods yeilds a positive validation result, the Validation_Flag="Y" |
| Validation_Method | CHARACTER VARYING | No | If a Discovered Variant is also Validated, and information about Validation is provided for that Variant, the Validation Method (if available) is captured here.
The list of Validation Methods can be found in the 'Exchange.XSD' file as the controlled list "MethodTypeCV" |
| Validation_Platform_Name | CHARACTER VARYING | No | If a Discovered Variant is also Validated, and information about Validation is provided for that Variant, the Validation Platform (if available) is captured here. |
| Landmark | CHARACTER VARYING | No | This field will capture any 'Landmark' that was used to discover a Variant.
Typically the Landmark is a Clone employed to find the location of a Variant on the genome. Thus the most common value in this field will be the Clone Name. E.g. "RP11-420B6"; "AL031643". |
| Inheritance | CHARACTER VARYING | No | This is to indicate whether a Variant is inherited or otherwise.
The value in this field could be for example: "Maternal", "Paternal", "Germline", "De novo", "Somatic", "Not tested", "Tested - inconclusive", "Biparental", or "Uniparental".
This informaiton may not always be provided in a published dataset. |
| Sequence | CHARACTER VARYING | No | The sequence of the variant, if available, is captured in this field. Alternatively, this field may not contain the actual sequence but rather may be a pointer to another table or file where the actual sequence is stored. |
| Probe_Count | INTEGER | No | This field refers to the number of probes on the array, within the region called as a Variant. |
| Merging_Criteria | TEXT | No | If a Variant is a Merged Variant, a brief description of the criteria used to merge Supporting Variants to the Merged Variants will be provided here. |
| Ethnicity | CHARACTER VARYING | No | In the case where a Variant is reported with Ethnicity information but with no corresponding Sample information, Ethnicity will be captured here. |
| Gender | CHARACTER VARYING | No | In the case where a Variant is reported with Gender information but with no corresponding Sample information, Gender will be captured here.
The value in this field can be "Male", "Female", or "Unknown", or NULL. |
| Cohort_Name | CHARACTER VARYING | No | This is the Cohort to which a Variant belongs; this information may or may not be provided in a Study. This information is more commonly found in the Sample related tables ('Sample', 'Sample_Xref') but could also be captured here in cases when a Variant is reported with no associated Sample.
Examples of Cohorts are "HapMap" or "Human Genome Diversity Project". |
| Platform_Name | CHARACTER VARYING | No | If the Platform is known for which a given Variant is associated, this information may be captured here. The value in this field is the name of the Platform used in analyzing a given Sample to discover or validate the Structural Variants described in this table. |
| DGV_Merged_Flag | CHARACTER VARYING | No | A value of "Y" in this field indicates that this is a Variant that was merged using the DGV-merging criteria (currently 70% reciprocal overlap). If the value in this field is blank or "N", this Variant is not a DGV-merged variant.
If the 'Supporting_Merged_Flag' = "S" and 'DGV_Merged_Flag = "N", this is a Supporting Variant that was reported in a Study. There should be no records in this table with a 'Supporting_Merged_Flag' = "S" and 'DGV_Merged_Flag = "Y". |
| DGVA_Inferred_Flag | CHARACTER VARYING | No | Some Studies do not submit sample-level calls (SSVs) to DGVa or dbVar, but rather only merged calls, or "asserted Variants" (SVs). Thus during accessioning of these Studies, DGVa or dbVar will create what DGVa calls "curated" SSVs to support the submitted SVs; DGV calls these "inferred Variants". A typical scenario in which dbVar or DGVa would created these Inferred Variants is to assign Sample IDs to a Merged Variant (one Inferred Variant per Sample ID).
In order to distinguish between SSVs that are submitted as part of a Study from SSVs that are created by DGVa/dbVar during the accessioning process, DGV has established this flag (DGVA_Inferred_Flag). If an SSV is created by DGVa/dbVar during the accessioning process, this flag will be set to "Y" (yes); if an SSV was submitted as part of a Study's data submission, this flag will be set to "N" (no). These Variants are filtered out in the DGV pipeline. |
| Comments | CHARACTER VARYING | No | If there is any supplemental information about a Variant that cannot be populated into any of the other fields in the 'Variant' table, that information will go here.
e.g. "does not validate experimentally" may need to be appended to/annotated for a Variant |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI
|
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
|
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Sample_Level | Non Identifying | Variant | Supporting_Merged_Variant | Zero Or More |
Variant_Supporting_Merged_Variant | Non Identifying | Variant | Supporting_Merged_Variant | Zero Or More |
Variant_Translocation_Mapping | Non Identifying | Variant | Translocation_Mapping | Zero Or More |
Variant_Variant_Analysis | Non Identifying | Variant | Variant_Analysis | Zero Or More |
Variant_Variant_Mapping | Non Identifying | Variant | Variant_Mapping | Zero Or More |
Variant_Variant_Type | Non Identifying | Variant | Variant_Type | Zero Or More |
Variant_Variant_Xref | Non Identifying | Variant | Variant_Xref | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Variant_1 | Variant_ID | UNIQUE | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Variant_ID |
| Column Constraint | Not Null | Variant_Number |
| Column Constraint | Not Null | Sample_Merged_Flag |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
PK_Variant | Table Constraint | Primary Key | PRIMARY KEY (Variant_ID) |
Entity: Variant_Analysis
Entity details:
Description | One or more Variants can be related to one or more 'Dataset_Analysis' records. This table resolves the M:M relationship between the 'Dataset_Analysis' and 'Variant' tables. This is in order to support the relationships defined in the dbVar/DGVa XSD.
For each 'Dataset_Analysis' record to which a Variant is associated, this table will store a record for each instance of this. For example, Variant_1 is associated to Dataset_Analysis 1, 2, and 3. So in the 'Variant_Analysis' table there would be three records, each containing the same 'Variant_ID' and each with a different 'Dataset_Analysis_ID', i.e. 1, 2, or 3. |
Primary key constraint name | PK_Variant_Analysis |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Variant_Analysis_ID | SERIAL | Yes | This is the unique identifier that represents one record in this table; i.e. one instance of a Dataset-Analysis and associated Variant. |
FK | Dataset_Analysis_ID | INTEGER | Yes | This uniquely identifies one Dataset that is compared to one Reference within a Study by one Analysis. |
FK | Variant_ID | INTEGER | Yes | This is the unique identifier for a Variant in the DGV. |
| Study_ID | INTEGER | Yes | This is the unique database identifier that represents one Study in the DGV. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI
|
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
|
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Analysis, etc.
|
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Dataset_Analysis_Variant_Analysis | Non Identifying | Dataset_Analysis | Variant_Analysis | Zero Or More |
Variant_Variant_Analysis | Non Identifying | Variant | Variant_Analysis | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Variant_Analysis_1 | Dataset_Analysis_ID | | |
IDX_Variant_Analysis_2 | Variant_ID | | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Variant_Analysis_ID |
| Column Constraint | Not Null | Dataset_Analysis_ID |
| Column Constraint | Not Null | Variant_ID |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
Dataset_Analysis_Variant_Analysis | Table Constraint | Foreign Key | FOREIGN KEY (Dataset_Analysis_ID) REFERENCES Dataset_Analysis(Dataset_Analysis_ID) |
PK_Variant_Analysis | Table Constraint | Primary Key | PRIMARY KEY (Variant_Analysis_ID) |
Variant_Variant_Analysis | Table Constraint | Foreign Key | FOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID) |
Entity: Variant_Mapping
Entity details:
Description | This table captures the mapping coordinates of a Variant.
The purpose of capturing these data in a separate table is to afford mappings to any of the human genome reference assemblies (e.g. hg17, hg18) for a given Variant |
Primary key constraint name | PK_Variant_Mapping |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Variant_Mapping_ID | SERIAL | Yes | This uniquely represents one mapping for one Variant stored in the DGV.
This table allows for multiple mappings of one Variant; e.g. to two iterations of the human genome reference assembly |
FK | Variant_ID | INTEGER | Yes | This is the unique identifer for a Variant in the DGV.
The 'Variant_ID' can appear more than once in this table if that Variant has been mapped to more than one reference assembly of the human genome.
This can be used to cross-reference one record in the 'Variant' table. |
| Reference_Assembly | CHARACTER VARYING | No | This is the Reference Assembly that was used to map a variant's coordinates.
Examples of Reference Assemblies are: GRCh37 (build 37), hg18/Build 36, and hg17/Build 35. |
| Study_Mapping | CHARACTER VARYING | Yes | This field indicates whether this was the original Variant Mapping that was published as part of a Study's dataset or whether the mapping was a result of migrating the Study-published mapping to another Reference Assembly (e.g. using the LiftOver tool to migrate from Hg17 to Hg18).
If this is the original Variant Mapping as published within a Study, the value in this field is "Y"; if it is not the original Variant Mapping, the value in this field will be "N" or blank. |
| Placement_Method | CHARACTER VARYING | No | This is a short term used (by DGVa and dbVar) to describe how a Variant was mapped to the genome. The default value is "Genomic" which describes the mapping provided by the Authors of a Study (i.e. the Submitter of the Study's data to dbVar/DGVa).
Other terms include: Artifact, DGV, Clone Registry, Clone ad-hoc, Cytogenetic, Flanking sequences, Remapped auto, Remapped ad-hoc. |
| Remap_Status | CHARACTER VARYING | No | This field captures a short descriptor of the status of a given Variant Mapping after it has been forward mapped to a newer Reference Assembly. This is a value used by dbVar and DGVa.
Values that could be in this field are: Perfect (perfectly mapped), Deletion (a deletion occured in the remapping), Insertion (an insertin occured in the remapping), No map (the placement did not map). |
| Recipient | CHARACTER VARYING | No | This field is specific to DGVa and dbVar and is captured here to provide full context of a Variant Mapping. For Variants that span across more than one locus (usually across Chromosomes), the value in this field describes whether that second location is either "known" or "ambiguous". |
| Unmapped_Flag | CHARACTER VARYING | No | A Variant might not map to a given Reference Assembly when that Variant is re-mapped, or migrated, to a newer version of a Reference Assembly (e.g. using the UCSC LiftOver tool to re-map/migrate from Hg17 to Hg18).
For Variants that do not map to a given Reference Assembly, the value in this field will be "Y". If a Variant was successfully mapped to a Reference Assembly, this fileld will be NULL. The current DGV pipeline may not use this field. |
| Span | INTEGER | No | This field captures the Span of a Variant, if available. The Span is be the distance between the mapped start and mapped end of the variant on the reference assembly.
For example, in a PEM study, the clone end anchors will delineate the start-end of variant, in aCGH, the first/last probe with signal intensity difference will delineate the variant span. |
| Size | INTEGER | No | This field captures the Size of the Variant, if available. The Size is be the number of basepairs contained within the variant. In many cases the size will be equal to the span. For PEM and insertions into the reference, these values may be different, for aCGH etc, they will be the same. |
| Estimated_Size | INTEGER | No | This field captures the Estimated Size of a Variant, if available. The Estimated Size will provide an approximate size, depending on the methods and resolution of the various approaches within a study.
For example, in the case where an insertion is called using a PEM strategy, the ends map to the reference at a distance less than expected by the known fragment insert size (in some studies less than 3 SD from the mean mapping size). Based on the size of the insert library (can't capture insertion larger than clone) and the SD (ability to resolve a small insertion from the inherent variability of insert sizes within that library), a max-min size can be estimated without sequencing the actual insert. |
| DGV_Calculated_Size | INTEGER | No | If no Size information was reported with a Variant, the DGV-Curation process may calculate the Size of the Variant using its Start and End coordinates. This calculation is only done if determined appropriate for a given Study.
The Size is calculated as such: 'End' - 'Start' + 1 |
| Chromosome_Number | CHARACTER VARYING | No | This is the primary Chromosome on which a Variant has been mapped.
The 'Chromosome_B_Number' captures the secondary Chromosome on which a Variant has been discoverd, for example, in the case of a translocation. |
| Start_A | INTEGER | No | This is the Average Start Boundary of a Variant. Alternatively, if only one Start Coordinate has been provided for a Variant, it will be stored here. |
| End_A | INTEGER | No | This is the Average End Boundary of a Variant. Alternatively, if only one End Coordinate has been provided for a Variant, it will be stored here. |
| Inner_Start_A | INTEGER | No | This is the Minimum Start Boundary of a Variant. |
| Inner_End_A | INTEGER | No | This is the Minimum End Boundary of a Variant. |
| Outer_Start_A | INTEGER | No | This is the Maximum Start Boundary of a Variant. |
| Outer_End_A | INTEGER | No | This is the Maximum End Boundary of a Variant. |
| Chromosome_Random_Num | CHARACTER VARYING | No | This field is to capture coordinates for variants that cannot be mapped on a chromosome but for which we know to what chromosome the variant belongs. |
| Start_Random | CHARACTER VARYING | No | This is the Start coordinate that is randomly assigned to a variant that cannot be mapped to a chromosome but for which we know what chromosome the variant belongs. |
| End_Random | CHARACTER VARYING | No | This is the End coordinate that is randomly assigned to a variant that cannot be mapped to a chromosome but for which we know what chromosome the variant belongs. |
| DGV_Start | INTEGER | No | This field is for internal use only. This and the 'DGV_End' field will store start and end coordinates used to determine DGV_Calculated_Size, which will be used for Filter/Merge pipeline. |
| DGV_End | INTEGER | No | This field is for internal use only. This and the 'DGV_Start' field will store start and end coordinates used to determine DGV_Calculated_Size, which will be used for Filter/Merge pipeline. |
| Study_ID | INTEGER | No | This is the unique database identifier that represents one Study in the DGV. |
| Variant_Number | INTEGER | No | This is a sequential number assigned to a Variant within a reported Study and is not the database identifier for that Variant (which is the 'Variant.Variant_ID').
For example, if five Variants were discovered in a Study, each Variant (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
|
| Source_System | CHARACTER VARYING | No | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI
|
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
|
| Load_Date | TIMESTAMP | No | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Mapping, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Variant_Variant_Mapping | Non Identifying | Variant | Variant_Mapping | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Variant_Mapping_1 | Variant_ID | | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Variant_Mapping_ID |
| Column Constraint | Not Null | Variant_ID |
| Column Constraint | Not Null | Study_Mapping |
| Column Constraint | Default | Y |
| Column Constraint | Default | Y |
PK_Variant_Mapping | Table Constraint | Primary Key | PRIMARY KEY (Variant_Mapping_ID) |
TCC_OuterStartA_Variant_Mapping_2 | Table Constraint | Check | Outer_Start_A <= Outer_End_A |
TCC_StartA_Variant_Mapping_1 | Table Constraint | Check | Start_A <= End_A |
Variant_Variant_Mapping | Table Constraint | Foreign Key | FOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID) |
Entity: Variant_Type
Entity details:
Description | This table contains a list of all possible Variant Types that can be attributed to a given Variant.
A Variant could have more than one type, e.g. Inversion + Deletion. As technologies develop further, more refined Variant Type descriptions could arise. This table will allow for multiple Variant Types to be ascribed to one Variant.
NB The ICGC nomenclature already describes Variant Types with sub-categories. This is an example of how a Variant could be described with multiple Variant Types. |
Primary key constraint name | PK_Variant_Type |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Variant_Type_ID | SERIAL | Yes | This is the unique identifier to represent one instance of a Variant Type in the DGV. |
FK | Variant_ID | INTEGER | Yes | This is the unique identifer that represents one Variant in the DGV. It is the Variant that is described by the 'Variant_Type'.
This can be used to cross-reference one record in the 'Variant' table. |
| Variant_Type | CHARACTER VARYING | Yes | This field will capture the Variant Type ascribed to one Varian. Typically a Variant will only be assigned one Variant Type; i.e. "CNV" or "Other"; a Variant can also be assigned a 'Variant_Sub_Type'; e.g. "Gain" or "Complex".
A Variant could be of more than one Type, e.g. CNV + Other. If more than one Variant Type is ascribed to a Variant, that Variant will have as many records in this table as are Types assigned to that Variant, each record having the same 'Variant_ID'.
|
| Variant_Sub_Type | CHARACTER VARYING | No | This field will capture the Variant Sub-Type ascribed to one Variant. Typically a Variant will be assigned only one 'Variant_Sub_Type'.
A Variant could have more than one Type+Sub-Type, e.g. CNV+Duplication and Other+Inversion. If more one Variant Type+Sub-Type is ascribed to a Variant there will be as many records in this table as there are for that number of combinations of Type+Sub-Type assigned to that Variant, each record having the same 'Variant_ID'. |
| Copy_Number_Count | INTEGER | No | For Variants that are of type "CNV", this field indicates the number of copies of a Variant sequence. If information about each copy of a CNV is available, that information will be captured in the 'Copy_Number' table (1:M relationship with the 'Variant_Type' table).
This information may not be provided in a published dataset. If this information is available, the value in this field willl typically be an absolute copy number state. |
| Variant_Type_Description | CHARACTER VARYING | No | This field provides further information about a Variant Type by way of a short description of the Variant Type.
An example of a Variant Type description for an Insertion is: "Insertion of DNA sequence of known position relative to a reference sequence, detected by sequence/paired-end-mapping technology". |
| Genotype | CHARACTER VARYING | No | This is the genotype of a particular Variant as described for a given individual.
This information may not be provided in a published dataset. |
| Zygosity | CHARACTER VARYING | No | This value is related to the Variant Sub-Type. The allowable values in this field are "Heterozygous", "Homozygous", or "Hemizygous".
For example, if a Variant Type = "CNV" and Sub-Type = "Deletion", the 'Zygosity' could thus be included as either "Heterozygous" or "Homozygous". |
| Number_of_Gains | INTEGER | No | This value applies only to Merged Variant calls, aka Regions or Events. This refers to the number of individuals that have a copy number gain within a merged CNV event or region. |
| Number_of_Losses | INTEGER | No | This value applies only to Merged Variant calls, aka Regions or Events. This refers to the number of individuals that have a copy number loss within a merged CNV event or region. |
| Count | INTEGER | No | This is a value related to the Frequency of a Variant. This value is captured during DGVa/dbVar archiving and is described by DGVa/dbVar as: "Number of samples that reported the structural variant". |
| Total_Reporting | INTEGER | No | This is a value related to the Frequency of a Variant. This value is captured during DGVa/dbVar archiving and is described by DGVa/dbVar as: "Number of samples assessed for structural variant". |
| Reported_Frequency | NUMERIC | No | This is a value related to the Frequency of a Variant. This value is captured during DGVa/dbVar archiving and is described by DGVa/dbVar as: "Frequency of variant"
This information may not be provided in a published dataset. |
| Calculated_Frequency | INTEGER | No | This field will serve as a placeholder in the case that DGV develops a frequency calculation step in the automated DGV-curation pipeline.
This is a frequency that would be calculated during the DGV data curation process. In cases where no 'Reported_Frequency' is available, a Calculated Frequency may be derived from other information that was reported in a Study.
The 'Calculated_Frequency' has an additional descriptor ('Calc_Frequency_Qualifier') to distinguish between the frequency as being "type" (number of Gains or Losses within a Study or Cohort) or "allele" (frequency of the allele in a Variant set). |
| Calc_Frequency_Qualifier | CHARACTER VARYING | No | This field will serve as a placeholder in the case that DGV develops a frequency calculation step in the automated DGV-curation pipeline.
The 'Calc_Frequency_Qualifier' is an additional descriptor of 'Calculated_Frequency' to distinguish between the frequency as being either:
"type" = number of Gains or Losses within a Study or Cohort OR
"allele" = frequency of the allele in a Variant set
This is a value that is assigned during the DGV data curation process. |
| Number_of_Gains_and_Losses | INTEGER | No | This value applies only to Merged Variant calls, aka Regions or Events. This refers to the total number of individuals that have a copy number change within a merged CNV event or region. |
| Comments | CHARACTER VARYING | No | If further details about the Variant Type are provided, those details could be captured in this field. |
| Study_ID | INTEGER | Yes | This is the unique database identifier that represents one Study in the DGV. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI
|
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
|
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Type, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Variant_Type_Copy_Number | Non Identifying | Variant_Type | Copy_Number | Zero Or More |
Variant_Variant_Type | Non Identifying | Variant | Variant_Type | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Variant_Type_1 | Variant_ID | | |
IDX_Variant_Type_2 | Variant_Type_ID | UNIQUE | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Variant_Type_ID |
| Column Constraint | Not Null | Variant_ID |
| Column Constraint | Not Null | Variant_Type |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
PK_Variant_Type | Table Constraint | Primary Key | PRIMARY KEY (Variant_Type_ID) |
Variant_Variant_Type | Table Constraint | Foreign Key | FOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID) |
Entity: Variant_Xref
Entity details:
Description | This table captures any external identifers used to reference a Variant that exists in the DGV.
For example, DGVa 'esv' (asserted variant) and 'essv' (supporting variant) accession numbers would be recorded here. Those identifiers would cross-referecnce Variants in the DGVa.
This table can also capture IDs from external databases for records that relate to a given Variant in this table. For example, if a Variant's sequence has been deposited to NCBI's Trace Archive (formally discontinued in February, 2011), that sequence will receive one or more TEMPLATE_IDs, e.g. TEMPLATE_ID = G248P82926C7. In this example, the 'External_ID_Source' = "Trace Archive" and the 'External_Variant_ID' = "G248P82926C7", with the corresponding 'Variant_ID' for that Variant from the 'Variant' table. |
Primary key constraint name | PK_Variant_Xref |
Attributes:
Key | Attribute name | Data type | Not null | Description |
PK | Variant_Xref_ID | SERIAL | Yes | This is the unique identifer for one mapping of a Variant in the DGV to an external identifier. |
FK | Variant_ID | INTEGER | Yes | This is the unique identifer from the 'Variant' table that represents one Variant in the DGV.
This can be used to cross-reference one record in the 'Variant' table. |
| External_Variant_ID | CHARACTER VARYING | Yes | This field captures an exernal identifier that is used to describe a Variant that exists in the DGV.
The external identifier could be an accession number, e.g. DGVa 'esv12345', an ID, or a Variant name, or the Variant ID assigned to a Variant within a Study (i.e. the Study's Variant ID). |
| External_ID_Source | CHARACTER VARYING | Yes | This is the original source system from which the 'External_Variant_ID' comes.
For example, this could be "DGVa" or "dbVar" or a PubMed ID. |
| External_ID_Type | CHARACTER VARYING | No | This field will capture additional information about the external identifier, if it is provided in a Study.
For example, DGVa may have a 'Variant Name'/'Submitter ID' and a 'Variant Accession Number'; this field will capture which label has been assigned to the identifier. |
| Study_ID | INTEGER | Yes | This is the unique database identifier that represents one Study in the DGV. |
| Source_System | CHARACTER VARYING | Yes | This field contains an indicator of the data store in which these data were originally stored. For example:
1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI
2) "dbVar" for data originally archived in dbVar at NCBI
|
| Current_Record_Flag | CHARACTER VARYING | No | This value in this field indicates whether a given record in this table is Current or not.
If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline.
For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
|
| Load_Date | TIMESTAMP | Yes | This field captures the load date of a record in this table.
This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Xref, etc. |
Relationships:
Relationship name | Type | Parent | Child | Cardinality |
Variant_Variant_Xref | Non Identifying | Variant | Variant_Xref | Zero Or More |
Indexes:
Index name | Index columns | Type | Sort order |
IDX_Variant_Xref_1 | Variant_ID | | |
Constraints:
Constraint name | Type | Level | Constraint |
| Column Constraint | Default | now() |
| Column Constraint | Not Null | Variant_Xref_ID |
| Column Constraint | Not Null | Variant_ID |
| Column Constraint | Not Null | External_Variant_ID |
| Column Constraint | Not Null | External_ID_Source |
| Column Constraint | Not Null | Study_ID |
| Column Constraint | Not Null | Source_System |
| Column Constraint | Default | Y |
| Column Constraint | Not Null | Load_Date |
PK_Variant_Xref | Table Constraint | Primary Key | PRIMARY KEY (Variant_Xref_ID) |
TUC_Variant_Xref_1 | Table Constraint | Unique | UNIQUE (External_Variant_ID, External_ID_Source) |
Variant_Variant_Xref | Table Constraint | Foreign Key | FOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID) |
|