DeZign for Databases v5.2

Logical model

Entity details

Entity: Analysis

Entity details:

Description	This table captures information about each Analysis that is performed in one Study. A Study may perform one or more Analyses within that Study. An Analysis is the comparison of the Study's Dataset to a Reference.
Primary key constraint name	PK_Analysis

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Analysis_ID	SERIAL	Yes	This is the unique database identifer for one Analysis performed within a Study.
FK	Study_ID	INTEGER	Yes	This is the unique identifier from the 'Study' table and represents the Study in which an instance of an Analysis was employed.
	Analysis_Description	CHARACTER VARYING	No	This field captures a brief description of the Analysis employed in a Study. This information is not always provided in a Study. Note: specific criteria used during the Analysis should be captured in the 'Analysis_Criteria' field and a general description captured in this field.
	Analysis_Type	CHARACTER VARYING	No	If this information is provided, the Analysis Type will be captured in this field; this information is not always provided in a Study. An example of an Analysis Type is "split-read mapping".
	Analysis_Tool	CHARACTER VARYING	No	This field captures the name of the Analysis Tool employed in a given Analysis. Examples of Analysis Tools are "Birdsuite", "CNAG", "Genemapper".
	Analysis_Tool_Version	CHARACTER VARYING	No	This is a version number for the Analysis Tool that was used in a given Study. This information is not always provided for an Analysis Tool.
	Analysis_Tool_Description	CHARACTER VARYING	No	This is a description of the Analysis Tool that was used in a given Study. This information is not always provided for an Analysis Tool.
	Analysis_Tool_Type	CHARACTER VARYING	No	This field captures whether an Analysis Tool is an Algorithm or a Software Suite. This information is not always provided for a given Analysis Tool in a Study; DGVa and dbVar currently do not capture this information.
	Analysis_Number	INTEGER	No	This is a sequential number assigned to each Analysis employed in a Study. This value is not unique in this table or across the database. For example, if five Analyses were employed in a Study, each Analysis (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
	Analysis_Criteria	CHARACTER VARYING	No	If the criteria to perform an Analysis are described in a Study, they are captured here. E.g. parameters set in an Analysis Tool.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given 'Analysis_Study', etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Analysis_Dataset_Analysis	Non Identifying	Analysis	Dataset_Analysis	Zero Or More
Analysis_Sub_Analysis	Non Identifying	Analysis	Merged_Analysis	Zero Or More
Analysis_Super_Analysis	Non Identifying	Analysis	Merged_Analysis	Zero Or More
Study_Analysis	Non Identifying	Study	Analysis	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Analysis_1	Analysis_ID	UNIQUE
IDX_Analysis_2	Study_ID

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Analysis_ID
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
PK_Analysis	Table Constraint	Primary Key	PRIMARY KEY (Analysis_ID)
Study_Analysis	Table Constraint	Foreign Key	FOREIGN KEY (Study_ID) REFERENCES Study(Study_ID)

Entity: Cohort

Entity details:

Description	This table contains information about a given Cohort . A Cohort could be part of one Study or a Cohort could be examined in more than one Study. For example, Samples from the HapMap Project or the Human Genome Diversity Panel are Cohorts that are examined in several Studies.
Primary key constraint name	PK_Cohort

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Cohort_ID	SERIAL	Yes	This is the unique identifier for one Cohort that is in the DGV.
	Cohort_Name	CHARACTER VARYING	No	This is the name given to a Cohort within a given Study. Examples of Cohort Names are: "Human Genome Diversity Panel" or "HapMap".
	Cohort_Description	CHARACTER VARYING	No	This is a short description of the Cohort (within a Study). For example, the description can include details about what is common among the individuals within the Cohort, such as details about gender, age, etc.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally archived. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Cohort, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Cohort_Sample	Non Identifying	Cohort	Sample	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Cohort_1	Cohort_ID	UNIQUE

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Cohort_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
PK_Cohort	Table Constraint	Primary Key	PRIMARY KEY (Cohort_ID)
TUC_Cohort_1	Table Constraint	Unique	UNIQUE (Cohort_Name)

Entity: Copy_Number

Entity details:

Description	This table captures descriptions of each Copy of a Copy Number Variant (CNV). At present, this information is not provided in a Study (primarily due to existing technological constraints). When these data become available, they can be captured in this table.
Primary key constraint name	PK_Copy_Number

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Copy_Number_ID	SERIAL	Yes	This uniquely identifies one Copy of a Sample Level Copy Number Variant (CNV) in the DGV.
FK	Variant_Type_ID	INTEGER	Yes	This value represents one record from the 'Variant_Type' table and can be used to cross-reference a record in this table to its Variant (via 'Variant_ID') and associated Variant Type.
	Copy_Number	CHARACTER VARYING	Yes	This is the sequence number of the Copy in a given CNV. For example, if a CNV has 5 copies, there will be five records in this table, each representing one of the Copies of that CNV. For the five records in this example , 'Copy_Number' will have the values "1", "2", "3", "4", and "5". To find the total number of copies in one CNV, perform a query on the 'Variant' table (can use the 'Variant_ID' from this table) and get the 'Copy_Number_Count'.
	Copy_Description	CHARACTER VARYING	No	This is a desription of the structure, or composition of a given Copy of a CNV. An example of a description is "inverted copy" or "copy with SNP".
	Copy_Chromosome	CHARACTER VARYING	No	This is the Chromosome on which the Copy of the CNV is found.
	Copy_Start	CHARACTER VARYING	No	This is the start coordinate for the Copy.
	Copy_End	CHARACTER VARYING	No	This is the end coordinate for the Copy.
	Study_ID	INTEGER	Yes	This is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Copy_Number, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Variant_Type_Copy_Number	Non Identifying	Variant_Type	Copy_Number	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Copy_Number_1	Variant_Type_ID

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Copy_Number_ID
	Column Constraint	Not Null	Variant_Type_ID
	Column Constraint	Not Null	Copy_Number
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
PK_Copy_Number	Table Constraint	Primary Key	PRIMARY KEY (Copy_Number_ID)
TCC_Copy_Number_1	Table Constraint	Check	Copy_Start <= Copy_End
Variant_Type_Copy_Number	Table Constraint	Foreign Key	FOREIGN KEY (Variant_Type_ID) REFERENCES Variant_Type(Variant_Type_ID)

Entity: Coverage

Entity details:

Description	This stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently.
Primary key constraint name	PK_Coverage

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	ID	SERIAL	Yes	This is the primary key (PK) for this table; each record in this table will be assigned a unique PK.
	Chromosome	CHARACTER VARYING	No
	Nucleotides_Covered	BIGINT	No
	Chromosome_Length	BIGINT	No
	Percent_Coverage	DOUBLE PRECISION	No
	Reference_Assembly	CHARACTER VARYING	No
	Load_Date	TIMESTAMP	No

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	'NOW()'
	Column Constraint	Not Null	ID
PK_Coverage	Table Constraint	Primary Key	PRIMARY KEY (ID)

Entity: Cytoband_Overlap

Entity details:

Description	This stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently.
Primary key constraint name	PK_Cytoband_Overlap

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	ID	SERIAL	Yes	This is the primary key (PK) for this table; each record in this table will be assigned a unique PK.
	Variant_ID	INTEGER	No
	Cytoband	CHARACTER VARYING	No
	Reference_Assembly	CHARACTER VARYING	No
	Load_Date	TIMESTAMP	No

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	'NOW()'
	Column Constraint	Not Null	ID
PK_Cytoband_Overlap	Table Constraint	Primary Key	PRIMARY KEY (ID)
TUC_Cytoband_Overlap_1	Table Constraint	Unique	UNIQUE (Variant_ID, Cytoband, Reference_Assembly)

Entity: Dataset

Entity details:

Description	This table represents an instance of output, or Dataset, generated by processing one or more Samples on a given Platform (employing a given Method) within a Study. One Study can have many Datasets.
Primary key constraint name	PK_Dataset

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Dataset_ID	SERIAL	Yes	This field uniquely represents one Dataset that was generated in one Study. A Dataset is the output of examining one or more Samples on a Platform.
	Dataset_Name	CHARACTER VARYING	No	If a Name has been assigned to a Dataset within a Study, that Name would be captured here. Currently DGVa and dbVar do not capture this information.
	Dataset_Number	INTEGER	No	This is a sequential number assigned to each Dataset generated in a Study. This value is not unique in this table or across the database. For example, if five Datasets were generated in a Study, each Dataset (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
	Dataset_Description	CHARACTER VARYING	No	This field captures a brief description of the Dataset analyzed in a Study. This information is currently not captured by DGVa or dbVar. For example, if a Study is performing a re-analysis of a Dataset generated in a previous Strudy, such details would be captured in this Description field.
FK	Study_ID	INTEGER	No	This is the unique identifier for a Study from the 'Study' table. This refers to the Study in which the Dataset was generated.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Dataset, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Dataset_Dataset_Analysis	Non Identifying	Dataset	Dataset_Analysis	Zero Or More
Dataset_Dataset_Xref	Non Identifying	Dataset	Dataset_Xref	Zero Or More
Dataset_Method_Platform_Sample	Non Identifying	Dataset	Method_Platform_Sample	Zero Or More
Dataset_Reference	Non Identifying	Dataset	Reference	Zero Or More
Study_Dataset	Non Identifying	Study	Dataset	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Dataset_1	Study_ID
IDX_Dataset_2	Dataset_ID	UNIQUE

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Dataset_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
PK_Dataset	Table Constraint	Primary Key	PRIMARY KEY (Dataset_ID)
Study_Dataset	Table Constraint	Foreign Key	FOREIGN KEY (Study_ID) REFERENCES Study(Study_ID)
TUC_Dataset_1	Table Constraint	Unique	UNIQUE (Dataset_Number, Study_ID)

Entity: Dataset_Analysis

Entity details:

Description This table describes each Dataset that is compared to one Reference during one Analysis within one Study. One or more Analyses may be performed on the same Dataset in a Study. One or more Datasets may be Analyzed in one Study. An Analysis is the comparison of the Study's Dataset to a Reference using an Analysis Tool. The output of the Analysis is the Variant set (described primarily in the 'Variant' table and secondarily in the 'Variant_Type', 'Copy_Number', and 'Variant_Mapping' tables). For example, if one Analyses examined three Datasets, there would be three records in this table, each having the same 'Analysis_ID' but different 'Dataset_ID'.

Primary key constraint name PK_Dataset_Analysis

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Dataset_Analysis_ID	SERIAL	Yes	This uniquely identifies one Dataset that is compared to one Reference within a Study by one Analysis.
	Dataset_Analysis_Number	INTEGER	Yes	This is a sequential number assigned to each Dataset-Analysis employed in a Study. This value is not unique in this table or across the database. For example, if five Dataset-Analysis were employed in a Study, each Dataset-Analysis (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
FK	Analysis_ID	INTEGER	Yes	This is the unique identifer for one Analysis performed within a Study. This can be used to cross-reference one record in the 'Analysis' table.
FK	Reference_ID	INTEGER	Yes	This uniquely identifies a given Reference against which a Dataset is compared during an Analysis within one Study. This can be used to cross-reference one record in the 'Reference' table.
FK	Dataset_ID	INTEGER	Yes	This field uniquely represents one Dataset that was generated in one Study. This can be used to cross-reference one record in the 'Dataset' table.
FK	Study_ID	INTEGER	Yes	This is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Dataset_Analysis_Reference, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Analysis_Dataset_Analysis	Non Identifying	Analysis	Dataset_Analysis	Zero Or More
Dataset_Analysis_Variant_Analysis	Non Identifying	Dataset_Analysis	Variant_Analysis	Zero Or More
Dataset_Dataset_Analysis	Non Identifying	Dataset	Dataset_Analysis	Zero Or More
Reference_Dataset_Analysis	Non Identifying	Reference	Dataset_Analysis	Zero Or More
Study_Dataset_Analysis	Non Identifying	Study	Dataset_Analysis	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Dataset_Analysis_1	Analysis_ID
IDX_Dataset_Analysis_2	Reference_ID
IDX_Dataset_Analysis_3	Dataset_ID
IDX_Dataset_Analysis_4	Study_ID
IDX_Dataset_Analysis_5	Dataset_Analysis_ID	UNIQUE

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Dataset_Analysis_ID
	Column Constraint	Not Null	Dataset_Analysis_Number
	Column Constraint	Not Null	Analysis_ID
	Column Constraint	Not Null	Reference_ID
	Column Constraint	Not Null	Dataset_ID
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
Analysis_Dataset_Analysis	Table Constraint	Foreign Key	FOREIGN KEY (Analysis_ID) REFERENCES Analysis(Analysis_ID)
Dataset_Dataset_Analysis	Table Constraint	Foreign Key	FOREIGN KEY (Dataset_ID) REFERENCES Dataset(Dataset_ID)
PK_Dataset_Analysis	Table Constraint	Primary Key	PRIMARY KEY (Dataset_Analysis_ID)
Reference_Dataset_Analysis	Table Constraint	Foreign Key	FOREIGN KEY (Reference_ID) REFERENCES Reference(Reference_ID)
Study_Dataset_Analysis	Table Constraint	Foreign Key	FOREIGN KEY (Study_ID) REFERENCES Study(Study_ID)

Entity: Dataset_Xref

Entity details:

Description	This table captures any external identifers used to reference a Dataset that exists in the DGV. This information is currently not being captured by DGVa or dbVar. For example, DGVa Dataset accession numbers would be recorded here. Those identifiers would cross-referecnce Datasets in the DGVa.
Primary key constraint name	PK_Dataset_Xref

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Dataset_Xref_ID	SERIAL	Yes	This is the unique identifer for one mapping of a Dataset in the DGV to an external identifier.
FK	Dataset_ID	INTEGER	Yes	This is the unique identifer from the 'Dataset' table that represents one Dataset in the DGV. This can be used to cross-reference one record in the 'Dataset' table.
	External_Dataset_ID	CHARACTER VARYING	Yes	A Study may give a Dataset a Name or Identifier-code; this could also be the identifier assigned by an external informaiton system (e.g. DGVa). This field will capture either one of these that has been assigned to a given Dataset. This information is currently not being captured by DGVa or dbVar. For example, this could be an accession number, e.g. DGVa Dataset accession number, or the Dataset ID assigned in a Study, e.g. "A01".
	External_ID_Source	CHARACTER VARYING	Yes	This is the original source system from which the 'External_Dataset_ID' comes. This information is currently not being captured by DGVa or dbVar. For example, this could be "DGVa" or "dbVar", "Smith et al 2009" or a PubMed ID.
	Study_ID	INTEGER	Yes	This is the unique database identifier that represents one Study in the DGV.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Dataset_Xref, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Dataset_Dataset_Xref	Non Identifying	Dataset	Dataset_Xref	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Dataset_Xref_1	Dataset_ID

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Dataset_Xref_ID
	Column Constraint	Not Null	Dataset_ID
	Column Constraint	Not Null	External_Dataset_ID
	Column Constraint	Not Null	External_ID_Source
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
Dataset_Dataset_Xref	Table Constraint	Foreign Key	FOREIGN KEY (Dataset_ID) REFERENCES Dataset(Dataset_ID)
PK_Dataset_Xref	Table Constraint	Primary Key	PRIMARY KEY (Dataset_Xref_ID)
TUC_Dataset_Xref_1	Table Constraint	Unique	UNIQUE (External_Dataset_ID, External_ID_Source)

Entity: Feature_Overlap

Entity details:

Description	This stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently.
Primary key constraint name	PK_Feature_Overlap

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	ID	SERIAL	Yes	This is the primary key (PK) for this table; each record in this table will be assigned a unique PK.
	Feature	CHARACTER VARYING	No
	Number_of_Features	INTEGER	No
	Number_of_CNVs_Overlapped	INTEGER	No
	Percent_of_CNVs_Overlapped	DOUBLE PRECISION	No
	Number_of_Features_Overlapped	INTEGER	No
	Percent_of_Features_Overlapped	DOUBLE PRECISION	No
	Reference_Assembly	CHARACTER VARYING	No
	Load_Date	TIMESTAMP	No

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	'NOW()'
	Column Constraint	Not Null	ID
PK_Feature_Overlap	Table Constraint	Primary Key	PRIMARY KEY (ID)

Entity: Filter

Entity details:

Description This table stores all records that are filtered out during the DGV Filtering steps, which is part of the automated DGV-curation pipeline. Each record that is filtered out is recorded with its Primary Key (unique table ID), table name, reason for being filtered out, and at what step the record was filtered out. Timestamps on each record help keep track of when, specifically, a record was filtered out.

Primary key constraint name PK_Filter

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Filter_ID	SERIAL	Yes	This is the primary key of the Filter table; it uniquely represents a record in this table.
	Run_ID	INTEGER	Yes	This identifier uniquely represents one run of the Filtering steps during automated DGV data curation.
	Study_ID	INTEGER	Yes	This value represents the Study to which a given record in this table is associated. For example, if "Variant 123" from "Study 1" was Filtered out because it did not meet the DGV minimum size threshold, the value in this field would be "1".
	Table_Name	CHARACTER VARYING	Yes	This is the table from which the Filtered record originated. For example, if a record from the 'Variant' table was filtered out, the value in this field would be "Variant".
	Table_ID	INTEGER	Yes	This is the primary key for the record as it is represented in its original table. For example, if "Variant 123" was filtered out, the value in this field would be "123". This value in conjunction with the 'Table_Name' uniquely represents a record in the database. (I.e. 'table_name.primary_key')
	Filter_Step	INTEGER	Yes	The value in this field describes the step during the automated DGV curation process on which the record was filtered out. For example, if the "Variant 123" record was filtered out during step 2 of the curation process, the value in this field would be "2".
	Filter_Reason	CHARACTER VARYING	Yes	The value in this field represents the reason for which a record was filtered out during the automated DGV curation process. For example, if "Variant 123" was filtered out becuase it did not meet the DGV minimum size threshold, the value in this field would be "too small".
	Load_Date	TIMESTAMP	Yes	This is the timestamp that describes when a given record was loaded to this table. This is used for auditing and maintenance purposes.
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	Y
	Column Constraint	Not Null	Filter_ID
	Column Constraint	Not Null	Run_ID
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Table_Name
	Column Constraint	Not Null	Table_ID
	Column Constraint	Not Null	Filter_Step
	Column Constraint	Not Null	Filter_Reason
	Column Constraint	Not Null	Load_Date
	Column Constraint	Default	'now()'
PK_Filter	Table Constraint	Primary Key	PRIMARY KEY (Filter_ID)

Entity: Gene_Overlap

Entity details:

Description	This stand-alone table is used for generating the Statistics charts on the DGV website. If users had access to the database (e.g. super-user access to write queries), they would not typically query this table. It is not part of the DGV per se, but is part of the same schema so queries can be written against the DGV and this table more efficiently.
Primary key constraint name	PK_Gene_Overlap

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	ID	SERIAL	Yes	This is the primary key (PK) for this table; each record in this table will be assigned a unique PK.
	Variant_ID	INTEGER	No
	Gene	CHARACTER VARYING	No
	Reference_Assembly	CHARACTER VARYING	No
	Load_Date	TIMESTAMP	No

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	'NOW()'
	Column Constraint	Not Null	ID
PK_Gene_Overlap	Table Constraint	Primary Key	PRIMARY KEY (ID)
TUC_Gene_Overlap_1	Table Constraint	Unique	UNIQUE (Variant_ID, Gene, Reference_Assembly)

Entity: Merged_Analysis

Entity details:

Description Data from multiple Analyses can be merged and further Analyzed within one Study. This table captures the relationshjps between these multiple Analyses. For example, Analysis 1, Analysis 2, and Analysis 3 could then be merged into Analysis 4. This table captures the relationship between Analyses 1, 2, and 3 and Analysis 4. Analyses 1, 2, 3 are called "Sub Analyses" and Analysis 4 is called a "Super Analysis".

Primary key constraint name PK_Merged_Analysis

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Merged_Analysis_ID	SERIAL	Yes	The value in this field uniquely identifies one record in this table. This value represents one relationship between a "Super Analysis" and one of the "Sub Analyses" that comprise the "Super Analysis".
FK	Sub_Analysis_ID	INTEGER	Yes	This represents the lower-level Analysis that was merged with another Analysis to create a higher-level Analysis, called the "Super Analysis".
FK	Super_Analysis_ID	INTEGER	Yes	This represents the higher-level Analysis that is comprised of two or more lower-level Analyses, called "Sub Analyses".
	Study_ID	INTEGER	Yes	This is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated (or was generated in the case of a join table composed of only database identifiers).
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Sample_Merged_Variant, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Analysis_Sub_Analysis	Non Identifying	Analysis	Merged_Analysis	Zero Or More
Analysis_Super_Analysis	Non Identifying	Analysis	Merged_Analysis	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Merged_Analysis_1	Sub_Analysis_ID
IDX_Merged_Analysis_2	Super_Analysis_ID

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Merged_Analysis_ID
	Column Constraint	Not Null	Sub_Analysis_ID
	Column Constraint	Not Null	Super_Analysis_ID
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
Analysis_Sub_Analysis	Table Constraint	Foreign Key	FOREIGN KEY (Sub_Analysis_ID) REFERENCES Analysis(Analysis_ID)
Analysis_Super_Analysis	Table Constraint	Foreign Key	FOREIGN KEY (Super_Analysis_ID) REFERENCES Analysis(Analysis_ID)
PK_Merged_Analysis	Table Constraint	Primary Key	PRIMARY KEY (Merged_Analysis_ID)

Entity: Method_Platform_Sample

Entity details:

Description This table captures information about an instance of a Method and of a Platform applied to a given Sample within a Study. Also captured is the Dataset to which the Sample-Method-Platform combination belongs. A Study can employ one or more Methods and one or more Platforms, in examining one or more Samples. Within one Study, the same Sample or Samples can be examined by one or more Methods or Platforms. This table captures the complex relationship between Method, Platform, and Sample.

Primary key constraint name PK_Method_Platform_Sample

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Method_Platform_Sample_ID	SERIAL	Yes	This is the unique identifer to repsesent one instance of a Sample that is examined using one Method and one Platform in a given Study.
FK	Method_Study_ID	INTEGER	Yes	This is the unique identifier from the 'Method_Study' table that represents one instance of a Method being employed in one Study. This can be used to cross-reference one record in the 'Method_Study' table.
FK	Platform_Study_ID	INTEGER	Yes	This is the unique identifier from the 'Platform_Study' table that represents one instance of a Platform being employed in one Study. This can be used to cross-reference one record in the 'Platform_Study' table.
FK	Sample_Study_ID	INTEGER	Yes	This is the unique identifer within the DGV of one instance of a Sample being examined in one Study.
FK	Dataset_ID	INTEGER	Yes	This is the unique identifier from the 'Dataset' table that describes to which Dataset a given Sample, analyzed by a given Method-Platform, belongs.
	Study_ID	INTEGER	Yes	This is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated.
	Source_System	CHARACTER VARYING	Yes	01.21.10 This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given 'Method_Platform_Sample', etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Dataset_Method_Platform_Sample	Non Identifying	Dataset	Method_Platform_Sample	Zero Or More
Method_Study_Method_Platform_Sample	Non Identifying	Method_Study	Method_Platform_Sample	Zero Or More
Platform_Study_Method_Platform_Sample	Non Identifying	Platform_Study	Method_Platform_Sample	Zero Or More
Sample_Study_Method_Platform_Sample	Non Identifying	Sample_Study	Method_Platform_Sample	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Method_Platform_Sample_1	Platform_Study_ID
IDX_Method_Platform_Sample_2	Dataset_ID
IDX_Method_Platform_Sample_3	Sample_Study_ID
IDX_Method_Platform_Sample_4	Method_Study_ID

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Method_Platform_Sample_ID
	Column Constraint	Not Null	Method_Study_ID
	Column Constraint	Not Null	Platform_Study_ID
	Column Constraint	Not Null	Sample_Study_ID
	Column Constraint	Not Null	Dataset_ID
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
Dataset_Method_Platform_Sample	Table Constraint	Foreign Key	FOREIGN KEY (Dataset_ID) REFERENCES Dataset(Dataset_ID)
Method_Study_Method_Platform_Sample	Table Constraint	Foreign Key	FOREIGN KEY (Method_Study_ID) REFERENCES Method_Study(Method_Study_ID)
PK_Method_Platform_Sample	Table Constraint	Primary Key	PRIMARY KEY (Method_Platform_Sample_ID)
Platform_Study_Method_Platform_Sample	Table Constraint	Foreign Key	FOREIGN KEY (Platform_Study_ID) REFERENCES Platform_Study(Platform_Study_ID)
Sample_Study_Method_Platform_Sample	Table Constraint	Foreign Key	FOREIGN KEY (Sample_Study_ID) REFERENCES Sample_Study(Sample_Study_ID)

Entity: Method_Study

Entity details:

Description	This table captures each instance of a Method employed within one Study. One Study may employ one or more Methods, and/or one or more instances of the same Method.
Primary key constraint name	PK_Method_Study

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Method_Study_ID	SERIAL	Yes	This uniquely identifies within the DGV one instance of a Method employed in one Study. For example, BAC_aCGH may be employed twice in one Study; each instance of this Method being employed in that Study would get its own unique 'Method_Study_ID'
FK	Study_ID	INTEGER	Yes	This is the unique identifier from the 'Study' table and represents the Study in which an instance of a Method was employed.
	Method_Name	CHARACTER VARYING	No	This is the name of a given Method employed in a Study; e.g. BAC_aCGH; MCD_analysis.
	Method_Number	INTEGER	No	This is a sequential number assigned to each Method empoyed in a Study. This value is not unique in this table or across the database. For example, if five Methods were generated in a Study, each Method (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
	Method_Type	CHARACTER VARYING	No	This field indicates whether a Method employed in a given Study was either a 'Discovery Method' or a 'Validation Method'. The values in this field are either "Discovery" or "Validation".
	Method_Description	CHARACTER VARYING	No	This field captures a brief description of the Method employed in a Study. This information is not always provided for a given Study.
	Primary_Data_Flag	CHARACTER VARYING	No	This field will capture whether the data being analyzed in a given Study is Primary or a Re-Analysis of Primary data that was generated in a different Study. This information is not always provided in a Study (and currently may not be captured by DGVa or dbVar). The values in this field are, "Y", if the data is Primary data generated by the Study in question, or "N" if the data was generated in a different Study. If the 'Primary_Data_Flag' is "N", to find the Method employed in the original Study that generated the data, query the 'Original_Method' field in this table.
	Original_Method	CHARACTER VARYING	No	If data that was generated in a different Study is being re-analyzed/meta-analyzed in a new Study, the 'Primary_Data_Flag' is "N", indicating the data was originally generated in a previous Study. This field captures the Method used in the original Study to first generate the Dataset being analyzed in this Study. This information is not always provided in a Study (and currently may not be captured by DGVa or dbVar).
	Originators_of_Data	CHARACTER VARYING	No	The field should capture the original source of the data i.e. the group who ran the samples and generated the dataset. This field could capture, e.g. the PMID, the primary author of a Study, the Site at which/organization by whom the original Study was performed. This information is not always provided in a Study (and currently may not be captured by DGVa or dbVar).
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Method, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Method_Study_Method_Platform_Sample	Non Identifying	Method_Study	Method_Platform_Sample	Zero Or More
Study_Method_Study	Non Identifying	Study	Method_Study	One Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Method_Study_1	Study_ID
IDX_Method_Study_2	Method_Study_ID	UNIQUE

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Method_Study_ID
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
PK_Method_Study	Table Constraint	Primary Key	PRIMARY KEY (Method_Study_ID)
Study_Method_Study	Table Constraint	Foreign Key	FOREIGN KEY (Study_ID) REFERENCES Study(Study_ID)
TCC_Method_Study_1	Table Constraint	Check	Method_Type = 'Validation' OR Method_Type = 'Discovery'

Entity: Platform_Study

Entity details:

Description	This table captures each instance of a Platform being employed in a Study. One Study may employ one or more Platforms.
Primary key constraint name	PK_Platform_Study

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Platform_Study_ID	SERIAL	Yes	This is the database unique identifier that represents one instance of a Platform being employed in one Study.
FK	Study_ID	INTEGER	Yes	This is the unique identifier from the 'Study' table and represents the Study in which an instance of a Platform was employed.
	Platform_Number	INTEGER	No	This is a sequential number assigned to each Platform employed in a Study. This value is not unique in this table or across the database. For example, if five Platforms were employed in a Study, each Platform (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
	Platform_Name	CHARACTER VARYING	No	This is the name used to refer to a Platform. For example, "Affymetrix GeneChip Human Genome U95Av2 [HG_U95Av2]"
	Platform_Description	CHARACTER VARYING	No	This is a short description of a given Platform. Not all Platforms will necessarily have a description for them but if one is available, it will be captured here.
	Platform_Type	CHARACTER VARYING	No	This field captures the Type of a given Platform. This information may not necessarily be provided in a Study.
	Platform_Version	CHARACTER VARYING	No	This describes the version of a given Platform. This information may not necessarily be provided in a Study.
	GEO_Accession_Num	CHARACTER VARYING	No	This field will capture the identifier/accession number that represents a given Platform within the GEO data repository. This information is not always provided in a Study. e.g. GEO accession number, GPL1352
	ArrayExpress_Accession_Num	CHARACTER VARYING	No	This field will capture the identifier/accession number that represents a given Platform within the ArrayExpress data repository. This information is not always provided in a Study. e.g. ArrayExpress accession number, A-AFFY-65
	Source_System	CHARACTER VARYING	Yes	This field contains an identifier of the data repository in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given 'Platform_Study', etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Platform_Study_Method_Platform_Sample	Non Identifying	Platform_Study	Method_Platform_Sample	Zero Or More
Study_Platform_Study	Non Identifying	Study	Platform_Study	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Platform_Study_1	Study_ID
IDX_Platform_Study_2	Platform_Study_ID	UNIQUE

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Platform_Study_ID
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
PK_Platform_Study	Table Constraint	Primary Key	PRIMARY KEY (Platform_Study_ID)
Study_Platform_Study	Table Constraint	Foreign Key	FOREIGN KEY (Study_ID) REFERENCES Study(Study_ID)

Entity: Reference

Entity details:

Description	This table describes the Reference applied to one Dataset in one Study. One or more Datasets are compared against one or more References. Each instance of this is captured in this table. A Reference is that to which the Dataset set in a given Study is compared by Analysis.
Primary key constraint name	PK_Reference

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Reference_ID	SERIAL	Yes	This uniquely identifies a given Reference against which a Dataset is compared within one Study.
FK	Dataset_ID	INTEGER	Yes	This field uniquely represents one Dataset that was generated in one Study.
	Reference_Number	INTEGER	Yes	This is a sequential number assigned to each Reference employed in a Study. This value is not unique in this table or across the database. For example, if five References were employed in a Study, each Reference (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
	Reference_Type	CHARACTER VARYING	No	This is the type of Reference that could be employed in a Study. The Reference type could be: "Ref_sequence", "Sampleset", "Sample", "Control_tissue", "Merged", or "Other".
	Reference_Sample_ID	CHARACTER VARYING	No	If the Reference Type = "Sample", this field will capture the Sample identifier associated to that Sample, if one is provided. For example, if one HapMap Sample is used as a Reference, the HapMap Sample ID (e.g. NA07019) could be provided in this field.
	Reference_Sequence	CHARACTER VARYING	No	This field captures the specific Sequence name if the Reference Type = "Reference Sequence". Examples of Sequence names are "HuRef" and "Hg18/Build36".
	Reference_Description	CHARACTER VARYING	No	This is a brief description of the Reference, if one is available.
FK	Study_ID	INTEGER	Yes	This is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Reference_Dataset, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Dataset_Reference	Non Identifying	Dataset	Reference	Zero Or More
Reference_Dataset_Analysis	Non Identifying	Reference	Dataset_Analysis	Zero Or More
Study_Reference	Non Identifying	Study	Reference	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Reference_1	Reference_ID	UNIQUE
IDX_Reference_2	Dataset_ID
IDX_Reference_3	Study_ID

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Reference_ID
	Column Constraint	Not Null	Dataset_ID
	Column Constraint	Not Null	Reference_Number
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
Dataset_Reference	Table Constraint	Foreign Key	FOREIGN KEY (Dataset_ID) REFERENCES Dataset(Dataset_ID)
PK_Reference	Table Constraint	Primary Key	PRIMARY KEY (Reference_ID)
Study_Reference	Table Constraint	Foreign Key	FOREIGN KEY (Study_ID) REFERENCES Study(Study_ID)
TUC_Reference_1	Table Constraint	Unique	UNIQUE (Dataset_ID, Reference_Number)

Entity: Reference_Assembly

Entity details:

Description	This table is used as a back-end to serve a drop down list on the new DGV website for a user to select a given Reference Assembly in which to view the data.
Primary key constraint name	PK_Reference_Assembly

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Reference_Assembly_ID	SERIAL	Yes	This is the identifier that uniquely represents a given Reference Assembly in this table.
	Reference_Assembly_Name	CHARACTER VARYING	No	This is the common name used for a given Reference Assembly; e.g. NCBI36/hg18
	Load_Date	TIMESTAMP	No	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Study, etc.

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	'NOW()'
	Column Constraint	Not Null	Reference_Assembly_ID
PK_Reference_Assembly	Table Constraint	Primary Key	PRIMARY KEY (Reference_Assembly_ID)

Entity: Sample

Entity details:

Description This table contains all of the Samples that have been examined in any Study that is part of the DGV. Some Studies do not report Sample IDs but instead only a Variant ID. In these cases, it will not be possible to associate a given Variant to a Sample. This would typically be due to the need to protect the privacy of the individual who provided the Sample e.g. in cases where there is non-consent to share Sample information

Primary key constraint name PK_Sample

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Sample_ID	SERIAL	Yes	This is the unique identifier of a given Sample in the DGV. This identifier represents a Sample within the DGV and does not represent the unique identifier of a Sample in other databases. It is important to note that this identifier is to uniquely identify one Sample in the DGV. Many Studies may examine the same Sample, e.g. a HapMap Sample; this Sample should exist only once in this table but may be cross-referenced several times in the 'Sample_Study' and 'Sample_Xref' tables. we should only load that Sample once to the database
	Biological_Source	CHARACTER VARYING	No	This field captures a brief description of the Biological Source of a Sample. For example, the 'Biological_Source' could be "cell line", "blood", "tissue" etc. Please note how this field relates to the 'Source_Type' field.
	Source_Type	CHARACTER VARYING	No	This field describes the type of the Sample. For example, a Sample could be of the Type "Genomic DNA", "RNA", "Clone library", etc. Please note how this field relates to the 'Biological_Source' field.
	Sample_Description	CHARACTER VARYING	No	This field will capture a Description of the Sample when necessary/if available. For example, a Sample could be "a pool of 10 anonymous male donors". In these cases, a Description of the Sample is important, esp. with regards to representation of Sample/Subject numbers in the DGV (i.e. a pool of 10 anonymous donors translates from one Sample to 10 Subjects).
	Subject_ID	CHARACTER VARYING	No	If a Subject identifier of some kind is provided in a Study - as associated to a given Sample, that Subject identifier will be captured here. Note: one Subject can provide one or more Samples, each of which is captured in the Sample table. Also, in many Studies, the HapMap Sample ID (e.g. "NA19215") is also included as the Subject ID.
	Karyotype	CHARACTER VARYING	No	This field will capture the Karyotype of a Sample. One person can provide one or more Samples and each Sample may have a different Karyotype. For example, an individual with cancerous cells may have one Karyotype from a normal tissue Sample and another Karyotype from a cancer-tissue Sample. Examples of Karyotypes are: 46, XY; 46, XX, 5p-; 46, XX, del(5)(p15.2) (the first two being normal karyotypes) It should be noted that DGV currently only presents data for healthy controls so karyotype information will likely not be included for a given Sample.
	Ethnicity	CHARACTER VARYING	No	This field captures the "Ethnicity" of a Sample. "Ethnicity" is sometimes also referred to as "Population" and refers to the geographic origins of an individual. The use of the term 'Ethnicity' can vary between Studies. Examples of 'Ethnicity' are: "Asian", "Yoruban", "European", "African"
	Gender	CHARACTER VARYING	No	This field captures the gender of the Subject who provided the Sample. The value in this field can be "Male", "Female", "Unknown", or NULL.
	Family_ID	CHARACTER VARYING	No	This ID will represent a given Family whose data are stored in the DGV. The 'Family_ID' is submitted with a Study and not generated by the database. Thus it is possible that two different Families are assigned the same 'Family_ID', but not likely. For example, where HapMap Samples are all from the same Family, the same 'Family_ID' will be assigned to each Family member's respective Sample.
	Family_Type	CHARACTER VARYING	No	This field will describe the type of Family represented by Samples in this table. This field could describe whether a family is a trio, for example. This field could also capture whether a Family is 'Simplex' or 'Multiplex'.
	Family_Member	CHARACTER VARYING	No	This field will capture which member of a Family is represented by a given Sample. For example, if a Sample is from the mother in a Family, the value in this field will be "mother", from the father, this value will be "father", etc..
	Mother_ID	CHARACTER VARYING	No	If a Sample is part of a Family and Family information is provided, this field would contain the ID of the Mother of the individual who provided this Sample.This information can be useful when looking at pedigree.
	Father_ID	CHARACTER VARYING	No	If a Sample is part of a Family and Family information is provided, this field would contain the ID of the Father of the individual who provided this Sample . This information can be useful when looking at pedigree.
FK	Cohort_ID	INTEGER	No	This is the Cohort to which a Sample belongs. This field may not be populated. A Sample could belong to a commonly referenced Cohort such as HapMap or the Human Genome Diversity Panel. A Sample could also belong to a Cohort within a specific Study.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally archived. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Sample, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Cohort_Sample	Non Identifying	Cohort	Sample	Zero Or More
Sample_Sample_Study	Non Identifying	Sample	Sample_Study	One Or More
Sample_Sample_Xref	Non Identifying	Sample	Sample_Xref	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Sample_1	Cohort_ID
IDX_Sample_2	Sample_ID	UNIQUE

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Sample_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
Cohort_Sample	Table Constraint	Foreign Key	FOREIGN KEY (Cohort_ID) REFERENCES Cohort(Cohort_ID)
PK_Sample	Table Constraint	Primary Key	PRIMARY KEY (Sample_ID)

Entity: Sample_Pooled

Entity details:

Description	This table captures the mapping between a Pooled Sample and the individual Samples that comprise the Pooled Sample - when this information is provided for a given a Study.
Primary key constraint name	PK_Sample_Pooled

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Sample_Pooled_ID	SERIAL	Yes	This field uniquely identifies one instance of a mapping between an individual Sample and a Pooled Sample. For each individual Sample that comprises a Pooled Sample in the database, a record will be stored in this table. For example, if a Pooled Sample is comprised of ten individual Samples, there will be ten records stored in this table, each with the same 'Pooled_Sample_ID' (which can be cross-referenced with a 'Sample_Study_ID' in the 'Sample_Study' table) and ten different 'Sample_Study_ID's (each of which can be cross-referenced with a 'Sample_Study_ID' in the 'Sample_Study' table), each record will have a unique 'Sample_Pooled_ID'.
FK	Individual_Sample_ID	INTEGER	Yes	This is the 'Sample_Study_ID' of one of the individual Samples that comprise a Pooled Sample. There will be two or more 'Individual_Sample_IDs' for every 'Pooled_Sample_ID' in this table.
FK	Pooled_Sample_ID	INTEGER	Yes	This is the unique identifer of one Pooled Sample within the database. For each 'Pooled_Sample_ID' there will be two or more 'Individual_Sample_IDs'.
	Study_ID	INTEGER	Yes	This is the unique database identifier that represents one Study in the DGV, that Study in which the Pooled Samples were analyzed.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally archived. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Sample/Pooled Sample, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Sample_Individual_to_Pooled	Non Identifying	Sample_Study	Sample_Pooled	Zero Or More
Sample_Pooled_to_Pooled	Non Identifying	Sample_Study	Sample_Pooled	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Sample_Pooled_1	Individual_Sample_ID
IDX_Sample_Pooled_2	Pooled_Sample_ID

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Sample_Pooled_ID
	Column Constraint	Not Null	Individual_Sample_ID
	Column Constraint	Not Null	Pooled_Sample_ID
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
PK_Sample_Pooled	Table Constraint	Primary Key	PRIMARY KEY (Sample_Pooled_ID)
Sample_Individual_to_Pooled	Table Constraint	Foreign Key	FOREIGN KEY (Individual_Sample_ID) REFERENCES Sample_Study(Sample_Study_ID)
Sample_Pooled_to_Pooled	Table Constraint	Foreign Key	FOREIGN KEY (Pooled_Sample_ID) REFERENCES Sample_Study(Sample_Study_ID)

Entity: Sample_Study

Entity details:

Description	This table resolves the many-to-many relationship between the Study and Sample tables. One Study may examine one or more Samples. One Sample may be examined in one or more Studies.
Primary key constraint name	PK_Sample_Study

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Sample_Study_ID	SERIAL	Yes	This is the unique identifer within the DGV of one instance of a Sample being examined in one Study. One Study may examine one or more Samples. One or more Samples may be examined by one or many Studies. This field is the unique identifer that resolves the many-to-many relationship between these entities (Sample and Study).
FK	Sample_ID	INTEGER	Yes	This value uniquely identifies a Sample in the database and will cross-reference one record in the 'Sample' table.
FK	Study_ID	INTEGER	Yes	This value uniquely identifies a Study in the database and will cross-reference one record in the 'Study' table.
	Case_or_Control	CHARACTER VARYING	No	This field will capture whether a Sample is a Case or a Control in a Study. This field will not always be populated. Note that for the current DGV only Controls, or healthy individuals, are part of the database.
	Sample_Number	INTEGER	No	This is the number assigned to a Sample within a Study. E.g. if 100 Samples were examined in a Study, those Samples would be assigned a number from 1 to 100. The 'Sample_Number' is meant to label each Sample differently WITHIN a Study and is not a unique identifier across Studies. The 'Sample_ID' and 'Sample_Study_ID' uniquely identify a Sample within the database.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given 'Sample_Study', etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Sample_Individual_to_Pooled	Non Identifying	Sample_Study	Sample_Pooled	Zero Or More
Sample_Pooled_to_Pooled	Non Identifying	Sample_Study	Sample_Pooled	Zero Or More
Sample_Sample_Study	Non Identifying	Sample	Sample_Study	One Or More
Sample_Study_Method_Platform_Sample	Non Identifying	Sample_Study	Method_Platform_Sample	Zero Or More
Study_Sample_Study	Non Identifying	Study	Sample_Study	One Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Sample_Study_1	Sample_ID
IDX_Sample_Study_2	Study_ID
IDX_Sample_Study_3	Sample_Study_ID	UNIQUE

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Sample_Study_ID
	Column Constraint	Not Null	Sample_ID
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
PK_Sample_Study	Table Constraint	Primary Key	PRIMARY KEY (Sample_Study_ID)
Sample_Sample_Study	Table Constraint	Foreign Key	FOREIGN KEY (Sample_ID) REFERENCES Sample(Sample_ID)
Study_Sample_Study	Table Constraint	Foreign Key	FOREIGN KEY (Study_ID) REFERENCES Study(Study_ID)
TUC_Sample_Study_1	Table Constraint	Unique	UNIQUE (Study_ID, Sample_Number)

Entity: Sample_Xref

Entity details:

Description This table captures external identifiers used to refer to Samples stored in the DGV. For example, a Sample may be assigned an (external) identifier when that Sample is analyzed in a Study. When that Sample is loaded to the DGV, it will receive a DGV database identifier. This table will capture the cross-reference between the external sample identifiers/accession numbers and the DGV identifiers. There could be external identifiers from other systems or projects, such as the HapMap project, the Coriell repository, or the Human Genome Diversity Panel. This table resolves the cross-referencing of these identifiers/accession numbers and the DGV Sample identifiers.

Primary key constraint name PK_Sample_Xref

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Sample_Xref_ID	SERIAL	Yes	This is the unique identifer for one mapping of a DGV Sample identifier to an external identifier. For example, if a Sample has been assigned two external identifiers, there will be two records in this table for that Sample, each having one of the two external identifiers.
FK	Sample_ID	INTEGER	Yes	This is the unique identifier that represents one Sample in the DGV. A Sample should be loaded only once to the database and assigned only one Sample ID. This means that there could be one or more records in this table that have the same Sample ID (one for each external identifier assigned to that Sample).
	External_Sample_ID	CHARACTER VARYING	Yes	This field captures an exernal identifier that is used to describe a Sample that exists in the DGV. The external identifier could be an accession number, e.g. DGVa Sample accession number, or a Common Cohort Sample ID, e.g. the HamMap Sample ID "NA11994".
	External_ID_Source	CHARACTER VARYING	Yes	This is the original source system from which the 'External_Sample_ID' comes. For example, this could be "DGVa" or "dbVar" or "HapMap", or the PubMed ID from a given Study, etc.
	External_ID_Type	CHARACTER VARYING	No	This is a further qualifier for an External Sample Identifier, if any has been assigned to an external identifier. For example, this could say "accession number" or the like.
	Cohort_Name	CHARACTER VARYING	No	This is the Cohort to which a Sample belongs; this information may or may not be provided in a Study. Examples of Cohorts are "HapMap" or "Human Genome Diversity Panel/Project".
	Study_ID	INTEGER	No	This is the unique database identifier that represents one Study in the DGV, from which a given record in this table originated.
	Source_System	CHARACTER VARYING	Yes	This field contains an identifier of the data repository in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Sample_Xref, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Sample_Sample_Xref	Non Identifying	Sample	Sample_Xref	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Sample_Xref_1	Sample_ID

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Sample_Xref_ID
	Column Constraint	Not Null	Sample_ID
	Column Constraint	Not Null	External_Sample_ID
	Column Constraint	Not Null	External_ID_Source
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
PK_Sample_Xref	Table Constraint	Primary Key	PRIMARY KEY (Sample_Xref_ID)
Sample_Sample_Xref	Table Constraint	Foreign Key	FOREIGN KEY (Sample_ID) REFERENCES Sample(Sample_ID)
TUC_Sample_Xref_1	Table Constraint	Unique	UNIQUE (External_Sample_ID, External_ID_Source)

Entity: Study

Entity details:

Description This table captures information about the Study in which structural variants were discovered and validated (not all discovered variants are validated in a Study). The 'universal identifier' of the study, usually the 'PubMed ID' (PMID) is captured in this table; the intention of this ID is to uniquely identify the Study across other databases. The 1000 Genomes Project Studies generate multiple input files per Study, or a "Study group". As each of those input files are loaded to the DGV, each will be assigned its own Study_ID. In these cases, all files from a Study group will have the same 'Study_Accession' value (e.g. "estd59" represents many input files from the 1000 Genomes project; one Study Accession number per multiple files). Sometimes an external database will be referenced in a record in the database. Below is a translation of the abbreviations one might find in a record in this database, and the full names of those abbreviations/external databases: AE = EBI Array Express CLONE = NCBI CLONE dbSNP = NCBI dbSNP dbGaP = NCBI dbGAP GENBANK = NCBI GENBANK GENE = NCBI GENE GEO = NCBI GEO HPO = Human Phenotype Ontology MeSH = Medical Subject Headings OMIM = Online Mendelian Inheritance in Man PROBE = NCBI PROBE SRA = NCBI Sequence Read Archive (to be discontinued in 2011-2012) ERA = EBI SRA TRACE = NCBI TRACE Archive (to be discontinued in 2011-2012)

Primary key constraint name PK_Study

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Study_ID	SERIAL	Yes	This is the unique identifier for a given Study within the DGV. This only uniquely identifies a Study in the DGV and not across other (external) databases.
	PubMed_ID	INTEGER	No	This is also known as 'PMID' and is the unique indentifer for a citation within PubMed. This identifier can be used to uniquely identify a publication, or Study, across many databases and is seen as a "universal" identifer for a given Study.
	Display_Name	CHARACTER VARYING	Yes	This is the common name used to refer to a Study. These names are also used by DGVa and dbVar to refer to a Study. Examples of common names are: "Cooper et al 2008" or "Kidd et al 2008".
	Study_Accession	CHARACTER VARYING	No	The value in this field will be the accession number assigned to a Study by either NCBI-dbVar or EBI-DGVa. This accession number can then be used to cross-reference the Study in either dbVar or DGVa. If the Study was accessioned in dbVar, this value will be prefixed with "nstd"; e.g. "nstd35" If the Study was accessioned in DGVa, this value willl be prefixed with "estd"; e.g. "estd1"
	OneK_Genomes_Project	CHARACTER VARYING	No	This is a flag to indicate whether a Study is part of the 1000 Genomes Project. Multiple "Studies" in the database may actually represent one Study from the 1000 Genomes Project. These Studies generate multiple input files per Study, thus generating multiple Study records in the database. These Studies will have a 1000 Genomes Project flag = "Y". The 'Study_Accession' value can be used to group records in this table (and subsequently on joins to other tables) to see which database Studies are related to one another, thereby representing one 1000 Genomes Project Study.
	Primary_Author_Name	CHARACTER VARYING	No	This is the name of the Author whose name appears first on a publication for a given Study. It is noted that there could be multiple authors on a given Study but these are not captured in the DGV. For the full list of authors of a Study, one should refer to the original publication.
	Primary_Author_ID	CHARACTER VARYING	No	This is a unique identifier that an Author would use to refer to himself/herself across multiple databases and within Studies. Examples of an 'Author_ID' are "ResearcherID", "OpenID", "Scopus ID"
	Project_ID	CHARACTER VARYING	No	This field may capture a Project Identifier if one has been assigned to a Study, or in the case that the Study is not in PubMed (note this is only in rare cases and these Studies will likely at least have a 'Study_Accession' from either dbVar or DGVa).
	Principal_Investigator	CHARACTER VARYING	No	This field will capture the Principal Investigator of a project if this information is provided for a given Study.
	NCBI_Tax_ID	INTEGER	No	This field captures the NCBI Taxonomy ID for the organism being Studied. For humans, the Taxonomy ID is "9606" so users should expect to always see this value in this field for the DGV. NB A very few Studies loaded to the DGV archival database (DGVS) may also contain data for non-humans; these data are filtered out during the automated DGV-curation pipeline and not presented to users.
	Sample_Size	INTEGER	No	This is the number of Samples examined in one Study and will be an integer greater than zero.
	Variant_Count	INTEGER	No	This is the reported number of Variants that were discovered in a given Study. This value is assigned by either DGVa or dbVar. It is possible that the reported Variant_Count (stored in this table) is different than the actual Variant Count. When performing computations using record counts, it is best to count the actual number of reported Variants as opposed to using this value.
	Site	CHARACTER VARYING	No	This refers to the primary or main Site or Organization at which a Study is performed. This information is not always provided in a Study. For example, if a 1000 Genomes Study was conducted at multiple sites, the value in this field would be "1000 Genomes: multiple sites".
	Description	CHARACTER VARYING	No	This field will capture any other descriptors of a Study that need to be stored in the DGV. For example, if there were special circumnstances under which a Study was performed, or if a special Pre-Publication Agreement exists, this information would be captured here.
	Curation_Comments	CHARACTER VARYING	No	This field will capture a summary of DGV curation applied to a given Study. This information will be a summary of the DGV Filtering and Merging process. This information will be made available on the Downloads page of the DGV website under the links to the original data ('Comments' field of the Studies' listing).
	Hold_Until_Publication_Flag	CHARACTER VARYING	No	* this field + its timestamp are only used in the DGVS (staging database for archiving, not for user-access) and not the public version, DGVC - and will infrequently, if at all, be used* This field will capture whether a Study is to be held until publication. If a Variant is to be held until publication, this value will be "Y"; if it is not to be held until publication, this value will be either "N" or "NULL". If the value in this field is "Y", there will also be a timestamp indicating the date/time when the Variant can be published; this will be captured in the 'Publish_Date' field.
	Publish_Date	TIMESTAMP	No	The value in this field will be the date the Study's Author has given us to publish the Study. This field will infrequently, if at all, be used. If the Author does not provide a specific date, this value will be set to a date far into the future which will be reset when the Author has given the go-ahead to publish the Variant.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Study, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Study_Analysis	Non Identifying	Study	Analysis	Zero Or More
Study_Dataset	Non Identifying	Study	Dataset	Zero Or More
Study_Dataset_Analysis	Non Identifying	Study	Dataset_Analysis	Zero Or More
Study_Method_Study	Non Identifying	Study	Method_Study	One Or More
Study_Platform_Study	Non Identifying	Study	Platform_Study	Zero Or More
Study_Reference	Non Identifying	Study	Reference	Zero Or More
Study_Sample_Study	Non Identifying	Study	Sample_Study	One Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Study_1	Study_ID	UNIQUE

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	'NOW()'
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Display_Name
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
PK_Study	Table Constraint	Primary Key	PRIMARY KEY (Study_ID)

Entity: Study_File_Prefix

Entity details:

Description
Primary key constraint name	PK_Study_File_Prefix

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Study_File_Prefix_ID	SERIAL	Yes
	Study_ID	INTEGER	Yes
	Study_File_Prefix	CHARACTER VARYING	Yes
	Source_System	CHARACTER VARYING	Yes
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	'now()'
	Column Constraint	Not Null	Study_File_Prefix_ID
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Study_File_Prefix
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	'Y'
	Column Constraint	Not Null	Load_Date
PK_Study_File_Prefix	Table Constraint	Primary Key	PRIMARY KEY (Study_File_Prefix_ID)

Entity: Supporting_Merged_Variant

Entity details:

Description This table describes which Supporting Variants comprise a given Merged Variant. Each record in this table will include the Merged Variant ID and one of its corresponding Supporting Variant IDs; there will be 'n' number of records for every 'n' number of Supporting Variants that comprise a Merged Variant. Note: The Variant table stores both Supporting and Merged Variant records. One Merged Variant is comprised of one or more Supporting Variants; Supporting Variants are not always reported in a Study (i.e. sometimes only the Merged, or "asserted" Variants are reported). This, the 'Sample_Merged_Variant' table resolves the one-to-many relationship of Supporting to Merged Variants that exist in the 'Variant' table. Both dbVar and DGVa describe what DGV calls a "Merged Variant" as an "asserted Variant" (SV; e.g. esv12345) and all three systems call the Variants the comprise a Merged Variant, "Supporting Variants" (SSV; e.g. nssv12324).

Primary key constraint name PK_Supporting_Merged_Variant

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Supporting_Merged_Variant_ID	SERIAL	Yes	9.10.9 This field represents one instance of a mapping between a Sample Level and a Merged Variant. For each Merged Variant, there will be as many records in this table as there are Sample Level Variants that comprise the Merged Variant (provided these data are available from a published dataset).
FK	Merged_ID	INTEGER	Yes	This field captures the unique identifer of a given Merged Variant in the DGV. This field cross-references one record in the Variant table. The Variant table stores both Supporting and Merged Variants. In the case of DGVa and dbVar, this is the "asserted variant", which is comprised of "supporting variants" (those are referenced in the 'Supporting_Variant_ID' field).
FK	Supporting_Variant_ID	INTEGER	Yes	This field captures the unique identifer of a given Supporting Variant in the DGV. This field cross-references one record in the Variant table. The Variant table stores both Supporting and Merged Variants. In the case of DGVa and dbVar, this is the "supporting variant", which comprises the "asserted variant" (those are referenced in the 'Merged_ID' field).
	Study_ID	INTEGER	Yes	This is the unique database identifier that represents one Study in the DGV.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Supporting_Merged_Variant, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Sample_Level	Non Identifying	Variant	Supporting_Merged_Variant	Zero Or More
Variant_Supporting_Merged_Variant	Non Identifying	Variant	Supporting_Merged_Variant	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Supporting_Merged_Variant_1	Merged_ID
IDX_Supporting_Merged_Variant_2	Supporting_Variant_ID

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Supporting_Merged_Variant_ID
	Column Constraint	Not Null	Merged_ID
	Column Constraint	Not Null	Supporting_Variant_ID
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
PK_Supporting_Merged_Variant	Table Constraint	Primary Key	PRIMARY KEY (Supporting_Merged_Variant_ID)
Sample_Level	Table Constraint	Foreign Key	FOREIGN KEY (Supporting_Variant_ID) REFERENCES Variant(Variant_ID)
Variant_Supporting_Merged_Variant	Table Constraint	Foreign Key	FOREIGN KEY (Merged_ID) REFERENCES Variant(Variant_ID)

Entity: Translocation_Mapping

Entity details:

Description This table captures the mapping coordinates of a Translocation Variant or any other Variant that may have mappings across two Chromosomes. e.g. translocation, transposon A translocation can span more than one Chromosome, creating a one-to-many relationship between a Variant and its respective mapping records. Note: For other Variant Types, their respective mappings are stored in the 'Variant_Mapping' table. Translocation Mapping information is not likely to be provided in a Study until technology is refined enough to detect such events.

Primary key constraint name PK_Translocation_Mapping

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Translocation_Mapping_ID	SERIAL	Yes	This uniquely represents one mapping for one Translocation Variant stored in the DGV. A Translocation Variant may have multiple mappings reported for each part of the Variant. Each part will be represented by this unique identifer. All of the parts comprising one Translocation Variant can be associated in this table by the 'Variant_ID'.
FK	Variant_ID	INTEGER	Yes	9.14.9 This is the unique identifer for a Variant in the DGV. The 'Variant_ID' can appear more than once in this table if that Variant has been mapped to more than one reference assembly of the human genome. This can be used to cross-reference one record in the 'Variant' table.
	Reference_Assembly	CHARACTER VARYING	Yes	This is the Reference Assembly that was used to map a variant's coordinates. Examples of Reference Assemblies are: GRCh37 (build 37), hg18/Build 36, and hg17/Build 35.
	Study_Mapping	CHARACTER VARYING	Yes	This field indicates whether this was the original Variant Mapping that was published as part of a Study's dataset or whether the mapping was a result of mitrating the Study-published mapping to another Reference Assembly (e.g. using the LiftOver tool to migrate from Hg17 to Hg18). If this is the original Variant Mapping as published within a Study, the value in this field is "Y"; if it is not the original Variant Mapping, the value in this field will be "N" or blank.
	Placement_Method	CHARACTER VARYING	No	This is a short term used (by DGVa and dbVar) to describe how a Variant was mapped to the genome. The default value is "Genomic" which describes the mapping provided by the Authors of a Study (i.e. the Submitter of the Study's data to dbVar/DGVa). Other terms include: Artifact, DGV, Clone Registry, Clone ad-hoc, Cytogenetic, Flanking sequences, Remapped auto, Remapped ad-hoc.
	Remap_Status	CHARACTER VARYING	No	This field captures a short descriptor of the status of a given Variant Mapping after it has been forward mapped to a newer Reference Assembly. This is a value used by dbVar and DGVa and may not always be provided in a Study. Values that could be in this field are: Perfect (perfectly mapped), Deletion (a deletion occured in the remapping), Insertion (an insertin occured in the remapping), No map (the placement did not map).
	Recipient	CHARACTER VARYING	No	This field is specific to DGVa and dbVar and is captured here to provide full context of a Variant Mapping. For Variants that span across more than one locus (usually across Chromosomes), the value in this field describes whether that second location is either "known" or "ambiguous".
	Unmapped_Flag	CHARACTER VARYING	No	A Variant might not map to a given Reference Assembly when that Variant is re-mapped, or migrated, to a newer version of a Reference Assembly (e.g. using the UCSC LiftOver tool to re-map/migrate from Hg17 to Hg18). For Variants that do not map to a given Reference Assembly, the value in this field will be "Y". If a Variant was successfully mapped to a Reference Assembly, this fileld will be NULL. The current DGV pipeline may not use this field.
	Span	INTEGER	No	This field captures the Span of a Variant, if available. The Span is be the distance between the mapped start and mapped end of the variant on the reference assembly. For example, in a PEM study, the clone end anchors will delineate the start-end of variant, in aCGH, the first/last probe with signal intensity difference will delineate the variant span.
	Size	INTEGER	No	This field captures the Size of the Variant, if available. The Size is be the number of basepairs contained within the variant. In many cases the size will be equal to the span. For PEM and insertions into the reference, these values may be different, for aCGH etc, they will be the same.
	Estimated_Size	INTEGER	No	This field captures the Estimated Size of a Variant, if available. The Estimated Size will provide an approximate size, depending on the methods and resolution of the various approaches within a study. For example, in the case where an insertion is called using a PEM strategy, the ends map to the reference at a distance less than expected by the known fragment insert size (in some studies less than 3 SD from the mean mapping size). Based on the size of the insert library (can't capture insertion larger than clone) and the SD (ability to resolve a small insertion from the inherent variability of insert sizes within that library), a max-min size can be estimated without sequencing the actual insert.
	DGV_Calculated_Size	INTEGER	No	If no Size information was reported with a Variant, the DGV-pipeline may calculate the Size of the Variant using its Start and End coordinates. This calculation is only done if determined appropriate for a given Study. The Size is calculated as such: 'End' - 'Start' + 1
	Chromosome_Number	CHARACTER VARYING	No	This is the primary Chromosome on which a Translocation Variant has been discovered.
	Start_B	INTEGER	No	This is the Average Start Boundary of a Variant. Alternatively, if only one Start Coordinate has been provided for a Variant, it will be stored here.
	End_B	INTEGER	No	This is the Average End Boundary of a Variant. Alternatively, if only one End Coordinate has been provided for a Variant, it will be stored here.
	Inner_Start_B	INTEGER	No	This is the Minimum Start Boundary of a Variant.
	Inner_End_B	INTEGER	No	This is the Minimum End Boundary of a Variant.
	Outer_Start_B	INTEGER	No	This is the Maximum Start Boundary of a Variant.
	Outer_End_B	INTEGER	No	This is the Maximum End Boundary of a Variant.
	Chromosome_Random_Num	CHARACTER VARYING	No	This field is to capture coordinates for variants that cannot be mapped on a chromosome but for which we know to what chromosome the variant belongs.
	Start_Random	CHARACTER VARYING	No	This is the Start coordinate that is randomly assigned to a variant that cannot be mapped to a chromosome but for which we know what chromosome the variant belongs.
	End_Random	CHARACTER VARYING	No	This is the End coordinate that is randomly assigned to a variant that cannot be mapped to a chromosome but for which we know what chromosome the variant belongs.
	DGV_Start	INTEGER	No	This field is for internal use only. This and the 'DGV_End' field will store start and end coordinates used to determine DGV_Calculated_Size, which will be used for Filter/Merge pipeline.
	DGV_End	INTEGER	No	This field is for internal use only. This and the 'DGV_Start' field will store start and end coordinates used to determine DGV_Calculated_Size, which will be used for Filter/Merge pipeline.
	Variant_Number	INTEGER	Yes	This is a sequential number assigned to a Variant within a reported Study and is not the database identifier for that Variant (which is the 'Variant.Variant_ID'). For a Translocation Variant, the mapping for each part of the Translocation will be assigned the same Variant Number in the input file. In conjunction with the 'Variant_ID', a user will be able to determine which records/mappings group together to represent one Translocation Variant.
	Study_ID	INTEGER	Yes	This is the unique database identifier that represents one Study in the DGV.
	Source_System	CHARACTER VARYING	No	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	No	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Mapping, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Variant_Translocation_Mapping	Non Identifying	Variant	Translocation_Mapping	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Translocation_Mapping_1	Variant_ID

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Translocation_Mapping_ID
	Column Constraint	Not Null	Variant_ID
	Column Constraint	Not Null	Reference_Assembly
	Column Constraint	Not Null	Study_Mapping
	Column Constraint	Default	Y
	Column Constraint	Not Null	Variant_Number
	Column Constraint	Not Null	Study_ID
	Column Constraint	Default	Y
PK_Translocation_Mapping	Table Constraint	Primary Key	PRIMARY KEY (Translocation_Mapping_ID)
TCC_OuterStartA_Translocation_Mapping_2	Table Constraint	Check	Outer_Start_B <= Outer_End_B
TCC_StartA_Translocation_Mapping_1	Table Constraint	Check	Start_B <= End_B
Variant_Translocation_Mapping	Table Constraint	Foreign Key	FOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID)

Entity: Variant

Entity details:

Description This table captures all Variants reported in Studies that have been loaded to the DGV. This table stores both Supporting and Merged Variants. For Studies that were accessioned by DGVa, the Variants are assigned a DGVa accession number that starts with "e"; for Studies that were accessioned by dbVar, the Variants are assigned a dbVar accession number that starts with "n". For example, a Supporting Variant, in DGVa will be assigned an accession number like "essv12345"; merged Variants, or what DGVa and dbVar call "asserted" Variants, would be assigned an accession numer like "esv45678". For dbVar, these examples, instead, would be, respectively, "nssv12345" and "nsv45678".

Primary key constraint name PK_Variant

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Variant_ID	SERIAL	Yes	This is the unique identifier for a Variant in the DGV.
	Variant_Number	INTEGER	Yes	This is a sequential number assigned to each Variant discovered or validated in a Study. This value is not unique in this table or across the database. For example, if five Variants were discovered in a Study, each Variant (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
	Sample_Merged_Flag	CHARACTER VARYING	Yes	This field distinguishes whether a Variant is a Supporting Variant or a Merged Variant. If the Variant is Ssupporting, the value in this field will be "S". If the Variant is Merged, the value in this field will be "M".
	Study_ID	INTEGER	No	This is the unique database identifier that represents one Study in the DGV.
	Sample_Study_ID	INTEGER	No	This is the unique identifier in the DGV for the Sample in which the Variant was found in a given Study. It is the unique identifier from the 'Sample_Study' table. This can be used to cross-reference one record in the 'Sample_Study' table. NB this value represents an instance of a Sample as it is analyzed in one Study. The same Sample could be analyzed in different Studies, each instance having a different 'Sample_Study_ID', but the same 'Sample_ID'.
	Validation_Flag	CHARACTER VARYING	No	This field captures whether a Variant has been Validated or not, and the results of the Validation. A Variant needs to be 'Discovered' in a Sample before it can be 'Validated'. A Variant is Validated by performing further assays. -- Flag Values are as follows: "Y" means that a Validation Method was applied and the Variant was validated "A" means that a Validation Method was attempted but the Variant was not validated "N" means that no Validation Method was applied or no Validation information was provided for a given Variant NB: Multiple Validation Methods can be applied to one Variant; if any of those methods yeilds a positive validation result, the Validation_Flag="Y"
	Validation_Method	CHARACTER VARYING	No	If a Discovered Variant is also Validated, and information about Validation is provided for that Variant, the Validation Method (if available) is captured here. The list of Validation Methods can be found in the 'Exchange.XSD' file as the controlled list "MethodTypeCV"
	Validation_Platform_Name	CHARACTER VARYING	No	If a Discovered Variant is also Validated, and information about Validation is provided for that Variant, the Validation Platform (if available) is captured here.
	Landmark	CHARACTER VARYING	No	This field will capture any 'Landmark' that was used to discover a Variant. Typically the Landmark is a Clone employed to find the location of a Variant on the genome. Thus the most common value in this field will be the Clone Name. E.g. "RP11-420B6"; "AL031643".
	Inheritance	CHARACTER VARYING	No	This is to indicate whether a Variant is inherited or otherwise. The value in this field could be for example: "Maternal", "Paternal", "Germline", "De novo", "Somatic", "Not tested", "Tested - inconclusive", "Biparental", or "Uniparental". This informaiton may not always be provided in a published dataset.
	Sequence	CHARACTER VARYING	No	The sequence of the variant, if available, is captured in this field. Alternatively, this field may not contain the actual sequence but rather may be a pointer to another table or file where the actual sequence is stored.
	Probe_Count	INTEGER	No	This field refers to the number of probes on the array, within the region called as a Variant.
	Merging_Criteria	TEXT	No	If a Variant is a Merged Variant, a brief description of the criteria used to merge Supporting Variants to the Merged Variants will be provided here.
	Ethnicity	CHARACTER VARYING	No	In the case where a Variant is reported with Ethnicity information but with no corresponding Sample information, Ethnicity will be captured here.
	Gender	CHARACTER VARYING	No	In the case where a Variant is reported with Gender information but with no corresponding Sample information, Gender will be captured here. The value in this field can be "Male", "Female", or "Unknown", or NULL.
	Cohort_Name	CHARACTER VARYING	No	This is the Cohort to which a Variant belongs; this information may or may not be provided in a Study. This information is more commonly found in the Sample related tables ('Sample', 'Sample_Xref') but could also be captured here in cases when a Variant is reported with no associated Sample. Examples of Cohorts are "HapMap" or "Human Genome Diversity Project".
	Platform_Name	CHARACTER VARYING	No	If the Platform is known for which a given Variant is associated, this information may be captured here. The value in this field is the name of the Platform used in analyzing a given Sample to discover or validate the Structural Variants described in this table.
	DGV_Merged_Flag	CHARACTER VARYING	No	A value of "Y" in this field indicates that this is a Variant that was merged using the DGV-merging criteria (currently 70% reciprocal overlap). If the value in this field is blank or "N", this Variant is not a DGV-merged variant. If the 'Supporting_Merged_Flag' = "S" and 'DGV_Merged_Flag = "N", this is a Supporting Variant that was reported in a Study. There should be no records in this table with a 'Supporting_Merged_Flag' = "S" and 'DGV_Merged_Flag = "Y".
	DGVA_Inferred_Flag	CHARACTER VARYING	No	Some Studies do not submit sample-level calls (SSVs) to DGVa or dbVar, but rather only merged calls, or "asserted Variants" (SVs). Thus during accessioning of these Studies, DGVa or dbVar will create what DGVa calls "curated" SSVs to support the submitted SVs; DGV calls these "inferred Variants". A typical scenario in which dbVar or DGVa would created these Inferred Variants is to assign Sample IDs to a Merged Variant (one Inferred Variant per Sample ID). In order to distinguish between SSVs that are submitted as part of a Study from SSVs that are created by DGVa/dbVar during the accessioning process, DGV has established this flag (DGVA_Inferred_Flag). If an SSV is created by DGVa/dbVar during the accessioning process, this flag will be set to "Y" (yes); if an SSV was submitted as part of a Study's data submission, this flag will be set to "N" (no). These Variants are filtered out in the DGV pipeline.
	Comments	CHARACTER VARYING	No	If there is any supplemental information about a Variant that cannot be populated into any of the other fields in the 'Variant' table, that information will go here. e.g. "does not validate experimentally" may need to be appended to/annotated for a Variant
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Sample_Level	Non Identifying	Variant	Supporting_Merged_Variant	Zero Or More
Variant_Supporting_Merged_Variant	Non Identifying	Variant	Supporting_Merged_Variant	Zero Or More
Variant_Translocation_Mapping	Non Identifying	Variant	Translocation_Mapping	Zero Or More
Variant_Variant_Analysis	Non Identifying	Variant	Variant_Analysis	Zero Or More
Variant_Variant_Mapping	Non Identifying	Variant	Variant_Mapping	Zero Or More
Variant_Variant_Type	Non Identifying	Variant	Variant_Type	Zero Or More
Variant_Variant_Xref	Non Identifying	Variant	Variant_Xref	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Variant_1	Variant_ID	UNIQUE

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Variant_ID
	Column Constraint	Not Null	Variant_Number
	Column Constraint	Not Null	Sample_Merged_Flag
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
PK_Variant	Table Constraint	Primary Key	PRIMARY KEY (Variant_ID)

Entity: Variant_Analysis

Entity details:

Description One or more Variants can be related to one or more 'Dataset_Analysis' records. This table resolves the M:M relationship between the 'Dataset_Analysis' and 'Variant' tables. This is in order to support the relationships defined in the dbVar/DGVa XSD. For each 'Dataset_Analysis' record to which a Variant is associated, this table will store a record for each instance of this. For example, Variant_1 is associated to Dataset_Analysis 1, 2, and 3. So in the 'Variant_Analysis' table there would be three records, each containing the same 'Variant_ID' and each with a different 'Dataset_Analysis_ID', i.e. 1, 2, or 3.

Primary key constraint name PK_Variant_Analysis

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Variant_Analysis_ID	SERIAL	Yes	This is the unique identifier that represents one record in this table; i.e. one instance of a Dataset-Analysis and associated Variant.
FK	Dataset_Analysis_ID	INTEGER	Yes	This uniquely identifies one Dataset that is compared to one Reference within a Study by one Analysis.
FK	Variant_ID	INTEGER	Yes	This is the unique identifier for a Variant in the DGV.
	Study_ID	INTEGER	Yes	This is the unique database identifier that represents one Study in the DGV.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Analysis, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Dataset_Analysis_Variant_Analysis	Non Identifying	Dataset_Analysis	Variant_Analysis	Zero Or More
Variant_Variant_Analysis	Non Identifying	Variant	Variant_Analysis	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Variant_Analysis_1	Dataset_Analysis_ID
IDX_Variant_Analysis_2	Variant_ID

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Variant_Analysis_ID
	Column Constraint	Not Null	Dataset_Analysis_ID
	Column Constraint	Not Null	Variant_ID
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
Dataset_Analysis_Variant_Analysis	Table Constraint	Foreign Key	FOREIGN KEY (Dataset_Analysis_ID) REFERENCES Dataset_Analysis(Dataset_Analysis_ID)
PK_Variant_Analysis	Table Constraint	Primary Key	PRIMARY KEY (Variant_Analysis_ID)
Variant_Variant_Analysis	Table Constraint	Foreign Key	FOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID)

Entity: Variant_Mapping

Entity details:

Description	This table captures the mapping coordinates of a Variant. The purpose of capturing these data in a separate table is to afford mappings to any of the human genome reference assemblies (e.g. hg17, hg18) for a given Variant
Primary key constraint name	PK_Variant_Mapping

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Variant_Mapping_ID	SERIAL	Yes	This uniquely represents one mapping for one Variant stored in the DGV. This table allows for multiple mappings of one Variant; e.g. to two iterations of the human genome reference assembly
FK	Variant_ID	INTEGER	Yes	This is the unique identifer for a Variant in the DGV. The 'Variant_ID' can appear more than once in this table if that Variant has been mapped to more than one reference assembly of the human genome. This can be used to cross-reference one record in the 'Variant' table.
	Reference_Assembly	CHARACTER VARYING	No	This is the Reference Assembly that was used to map a variant's coordinates. Examples of Reference Assemblies are: GRCh37 (build 37), hg18/Build 36, and hg17/Build 35.
	Study_Mapping	CHARACTER VARYING	Yes	This field indicates whether this was the original Variant Mapping that was published as part of a Study's dataset or whether the mapping was a result of migrating the Study-published mapping to another Reference Assembly (e.g. using the LiftOver tool to migrate from Hg17 to Hg18). If this is the original Variant Mapping as published within a Study, the value in this field is "Y"; if it is not the original Variant Mapping, the value in this field will be "N" or blank.
	Placement_Method	CHARACTER VARYING	No	This is a short term used (by DGVa and dbVar) to describe how a Variant was mapped to the genome. The default value is "Genomic" which describes the mapping provided by the Authors of a Study (i.e. the Submitter of the Study's data to dbVar/DGVa). Other terms include: Artifact, DGV, Clone Registry, Clone ad-hoc, Cytogenetic, Flanking sequences, Remapped auto, Remapped ad-hoc.
	Remap_Status	CHARACTER VARYING	No	This field captures a short descriptor of the status of a given Variant Mapping after it has been forward mapped to a newer Reference Assembly. This is a value used by dbVar and DGVa. Values that could be in this field are: Perfect (perfectly mapped), Deletion (a deletion occured in the remapping), Insertion (an insertin occured in the remapping), No map (the placement did not map).
	Recipient	CHARACTER VARYING	No	This field is specific to DGVa and dbVar and is captured here to provide full context of a Variant Mapping. For Variants that span across more than one locus (usually across Chromosomes), the value in this field describes whether that second location is either "known" or "ambiguous".
	Unmapped_Flag	CHARACTER VARYING	No	A Variant might not map to a given Reference Assembly when that Variant is re-mapped, or migrated, to a newer version of a Reference Assembly (e.g. using the UCSC LiftOver tool to re-map/migrate from Hg17 to Hg18). For Variants that do not map to a given Reference Assembly, the value in this field will be "Y". If a Variant was successfully mapped to a Reference Assembly, this fileld will be NULL. The current DGV pipeline may not use this field.
	Span	INTEGER	No	This field captures the Span of a Variant, if available. The Span is be the distance between the mapped start and mapped end of the variant on the reference assembly. For example, in a PEM study, the clone end anchors will delineate the start-end of variant, in aCGH, the first/last probe with signal intensity difference will delineate the variant span.
	Size	INTEGER	No	This field captures the Size of the Variant, if available. The Size is be the number of basepairs contained within the variant. In many cases the size will be equal to the span. For PEM and insertions into the reference, these values may be different, for aCGH etc, they will be the same.
	Estimated_Size	INTEGER	No	This field captures the Estimated Size of a Variant, if available. The Estimated Size will provide an approximate size, depending on the methods and resolution of the various approaches within a study. For example, in the case where an insertion is called using a PEM strategy, the ends map to the reference at a distance less than expected by the known fragment insert size (in some studies less than 3 SD from the mean mapping size). Based on the size of the insert library (can't capture insertion larger than clone) and the SD (ability to resolve a small insertion from the inherent variability of insert sizes within that library), a max-min size can be estimated without sequencing the actual insert.
	DGV_Calculated_Size	INTEGER	No	If no Size information was reported with a Variant, the DGV-Curation process may calculate the Size of the Variant using its Start and End coordinates. This calculation is only done if determined appropriate for a given Study. The Size is calculated as such: 'End' - 'Start' + 1
	Chromosome_Number	CHARACTER VARYING	No	This is the primary Chromosome on which a Variant has been mapped. The 'Chromosome_B_Number' captures the secondary Chromosome on which a Variant has been discoverd, for example, in the case of a translocation.
	Start_A	INTEGER	No	This is the Average Start Boundary of a Variant. Alternatively, if only one Start Coordinate has been provided for a Variant, it will be stored here.
	End_A	INTEGER	No	This is the Average End Boundary of a Variant. Alternatively, if only one End Coordinate has been provided for a Variant, it will be stored here.
	Inner_Start_A	INTEGER	No	This is the Minimum Start Boundary of a Variant.
	Inner_End_A	INTEGER	No	This is the Minimum End Boundary of a Variant.
	Outer_Start_A	INTEGER	No	This is the Maximum Start Boundary of a Variant.
	Outer_End_A	INTEGER	No	This is the Maximum End Boundary of a Variant.
	Chromosome_Random_Num	CHARACTER VARYING	No	This field is to capture coordinates for variants that cannot be mapped on a chromosome but for which we know to what chromosome the variant belongs.
	Start_Random	CHARACTER VARYING	No	This is the Start coordinate that is randomly assigned to a variant that cannot be mapped to a chromosome but for which we know what chromosome the variant belongs.
	End_Random	CHARACTER VARYING	No	This is the End coordinate that is randomly assigned to a variant that cannot be mapped to a chromosome but for which we know what chromosome the variant belongs.
	DGV_Start	INTEGER	No	This field is for internal use only. This and the 'DGV_End' field will store start and end coordinates used to determine DGV_Calculated_Size, which will be used for Filter/Merge pipeline.
	DGV_End	INTEGER	No	This field is for internal use only. This and the 'DGV_Start' field will store start and end coordinates used to determine DGV_Calculated_Size, which will be used for Filter/Merge pipeline.
	Study_ID	INTEGER	No	This is the unique database identifier that represents one Study in the DGV.
	Variant_Number	INTEGER	No	This is a sequential number assigned to a Variant within a reported Study and is not the database identifier for that Variant (which is the 'Variant.Variant_ID'). For example, if five Variants were discovered in a Study, each Variant (before being loaded to the database) will be assigned a sequential number from "1" to "5" inclusive. This allows records from the input file to be easily cross-referened in the database, especially for quality checks.
	Source_System	CHARACTER VARYING	No	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	No	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Mapping, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Variant_Variant_Mapping	Non Identifying	Variant	Variant_Mapping	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Variant_Mapping_1	Variant_ID

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Variant_Mapping_ID
	Column Constraint	Not Null	Variant_ID
	Column Constraint	Not Null	Study_Mapping
	Column Constraint	Default	Y
	Column Constraint	Default	Y
PK_Variant_Mapping	Table Constraint	Primary Key	PRIMARY KEY (Variant_Mapping_ID)
TCC_OuterStartA_Variant_Mapping_2	Table Constraint	Check	Outer_Start_A <= Outer_End_A
TCC_StartA_Variant_Mapping_1	Table Constraint	Check	Start_A <= End_A
Variant_Variant_Mapping	Table Constraint	Foreign Key	FOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID)

Entity: Variant_Type

Entity details:

Description This table contains a list of all possible Variant Types that can be attributed to a given Variant. A Variant could have more than one type, e.g. Inversion + Deletion. As technologies develop further, more refined Variant Type descriptions could arise. This table will allow for multiple Variant Types to be ascribed to one Variant. NB The ICGC nomenclature already describes Variant Types with sub-categories. This is an example of how a Variant could be described with multiple Variant Types.

Primary key constraint name PK_Variant_Type

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Variant_Type_ID	SERIAL	Yes	This is the unique identifier to represent one instance of a Variant Type in the DGV.
FK	Variant_ID	INTEGER	Yes	This is the unique identifer that represents one Variant in the DGV. It is the Variant that is described by the 'Variant_Type'. This can be used to cross-reference one record in the 'Variant' table.
	Variant_Type	CHARACTER VARYING	Yes	This field will capture the Variant Type ascribed to one Varian. Typically a Variant will only be assigned one Variant Type; i.e. "CNV" or "Other"; a Variant can also be assigned a 'Variant_Sub_Type'; e.g. "Gain" or "Complex". A Variant could be of more than one Type, e.g. CNV + Other. If more than one Variant Type is ascribed to a Variant, that Variant will have as many records in this table as are Types assigned to that Variant, each record having the same 'Variant_ID'.
	Variant_Sub_Type	CHARACTER VARYING	No	This field will capture the Variant Sub-Type ascribed to one Variant. Typically a Variant will be assigned only one 'Variant_Sub_Type'. A Variant could have more than one Type+Sub-Type, e.g. CNV+Duplication and Other+Inversion. If more one Variant Type+Sub-Type is ascribed to a Variant there will be as many records in this table as there are for that number of combinations of Type+Sub-Type assigned to that Variant, each record having the same 'Variant_ID'.
	Copy_Number_Count	INTEGER	No	For Variants that are of type "CNV", this field indicates the number of copies of a Variant sequence. If information about each copy of a CNV is available, that information will be captured in the 'Copy_Number' table (1:M relationship with the 'Variant_Type' table). This information may not be provided in a published dataset. If this information is available, the value in this field willl typically be an absolute copy number state.
	Variant_Type_Description	CHARACTER VARYING	No	This field provides further information about a Variant Type by way of a short description of the Variant Type. An example of a Variant Type description for an Insertion is: "Insertion of DNA sequence of known position relative to a reference sequence, detected by sequence/paired-end-mapping technology".
	Genotype	CHARACTER VARYING	No	This is the genotype of a particular Variant as described for a given individual. This information may not be provided in a published dataset.
	Zygosity	CHARACTER VARYING	No	This value is related to the Variant Sub-Type. The allowable values in this field are "Heterozygous", "Homozygous", or "Hemizygous". For example, if a Variant Type = "CNV" and Sub-Type = "Deletion", the 'Zygosity' could thus be included as either "Heterozygous" or "Homozygous".
	Number_of_Gains	INTEGER	No	This value applies only to Merged Variant calls, aka Regions or Events. This refers to the number of individuals that have a copy number gain within a merged CNV event or region.
	Number_of_Losses	INTEGER	No	This value applies only to Merged Variant calls, aka Regions or Events. This refers to the number of individuals that have a copy number loss within a merged CNV event or region.
	Count	INTEGER	No	This is a value related to the Frequency of a Variant. This value is captured during DGVa/dbVar archiving and is described by DGVa/dbVar as: "Number of samples that reported the structural variant".
	Total_Reporting	INTEGER	No	This is a value related to the Frequency of a Variant. This value is captured during DGVa/dbVar archiving and is described by DGVa/dbVar as: "Number of samples assessed for structural variant".
	Reported_Frequency	NUMERIC	No	This is a value related to the Frequency of a Variant. This value is captured during DGVa/dbVar archiving and is described by DGVa/dbVar as: "Frequency of variant" This information may not be provided in a published dataset.
	Calculated_Frequency	INTEGER	No	This field will serve as a placeholder in the case that DGV develops a frequency calculation step in the automated DGV-curation pipeline. This is a frequency that would be calculated during the DGV data curation process. In cases where no 'Reported_Frequency' is available, a Calculated Frequency may be derived from other information that was reported in a Study. The 'Calculated_Frequency' has an additional descriptor ('Calc_Frequency_Qualifier') to distinguish between the frequency as being "type" (number of Gains or Losses within a Study or Cohort) or "allele" (frequency of the allele in a Variant set).
	Calc_Frequency_Qualifier	CHARACTER VARYING	No	This field will serve as a placeholder in the case that DGV develops a frequency calculation step in the automated DGV-curation pipeline. The 'Calc_Frequency_Qualifier' is an additional descriptor of 'Calculated_Frequency' to distinguish between the frequency as being either: "type" = number of Gains or Losses within a Study or Cohort OR "allele" = frequency of the allele in a Variant set This is a value that is assigned during the DGV data curation process.
	Number_of_Gains_and_Losses	INTEGER	No	This value applies only to Merged Variant calls, aka Regions or Events. This refers to the total number of individuals that have a copy number change within a merged CNV event or region.
	Comments	CHARACTER VARYING	No	If further details about the Variant Type are provided, those details could be captured in this field.
	Study_ID	INTEGER	Yes	This is the unique database identifier that represents one Study in the DGV.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Type, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Variant_Type_Copy_Number	Non Identifying	Variant_Type	Copy_Number	Zero Or More
Variant_Variant_Type	Non Identifying	Variant	Variant_Type	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Variant_Type_1	Variant_ID
IDX_Variant_Type_2	Variant_Type_ID	UNIQUE

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Variant_Type_ID
	Column Constraint	Not Null	Variant_ID
	Column Constraint	Not Null	Variant_Type
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
PK_Variant_Type	Table Constraint	Primary Key	PRIMARY KEY (Variant_Type_ID)
Variant_Variant_Type	Table Constraint	Foreign Key	FOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID)

Entity: Variant_Xref

Entity details:

Description This table captures any external identifers used to reference a Variant that exists in the DGV. For example, DGVa 'esv' (asserted variant) and 'essv' (supporting variant) accession numbers would be recorded here. Those identifiers would cross-referecnce Variants in the DGVa. This table can also capture IDs from external databases for records that relate to a given Variant in this table. For example, if a Variant's sequence has been deposited to NCBI's Trace Archive (formally discontinued in February, 2011), that sequence will receive one or more TEMPLATE_IDs, e.g. TEMPLATE_ID = G248P82926C7. In this example, the 'External_ID_Source' = "Trace Archive" and the 'External_Variant_ID' = "G248P82926C7", with the corresponding 'Variant_ID' for that Variant from the 'Variant' table.

Primary key constraint name PK_Variant_Xref

Attributes:

Key	Attribute name	Data type	Not null	Description
PK	Variant_Xref_ID	SERIAL	Yes	This is the unique identifer for one mapping of a Variant in the DGV to an external identifier.
FK	Variant_ID	INTEGER	Yes	This is the unique identifer from the 'Variant' table that represents one Variant in the DGV. This can be used to cross-reference one record in the 'Variant' table.
	External_Variant_ID	CHARACTER VARYING	Yes	This field captures an exernal identifier that is used to describe a Variant that exists in the DGV. The external identifier could be an accession number, e.g. DGVa 'esv12345', an ID, or a Variant name, or the Variant ID assigned to a Variant within a Study (i.e. the Study's Variant ID).
	External_ID_Source	CHARACTER VARYING	Yes	This is the original source system from which the 'External_Variant_ID' comes. For example, this could be "DGVa" or "dbVar" or a PubMed ID.
	External_ID_Type	CHARACTER VARYING	No	This field will capture additional information about the external identifier, if it is provided in a Study. For example, DGVa may have a 'Variant Name'/'Submitter ID' and a 'Variant Accession Number'; this field will capture which label has been assigned to the identifier.
	Study_ID	INTEGER	Yes	This is the unique database identifier that represents one Study in the DGV.
	Source_System	CHARACTER VARYING	Yes	This field contains an indicator of the data store in which these data were originally stored. For example: 1) "DGVa" for data originally archived in the Database of Genomic Variants Archive (DGVa) at EBI 2) "dbVar" for data originally archived in dbVar at NCBI
	Current_Record_Flag	CHARACTER VARYING	No	This value in this field indicates whether a given record in this table is Current or not. If the value in this field is "Y", then this is the current record. If the value in this field is "N", then there is another record in this table that has more recently been added that supersedes the previous record, or the record failed one of the quality checks in the DGV pipeline. For example, if an error is discovered in a record and that record needs to be "overwritten" with a new, corrected record, there would be two similar, related records in the table, one with a 'Current_Record_Flag'="N" and one with 'Current_Record_Flag'="Y".
	Load_Date	TIMESTAMP	Yes	This field captures the load date of a record in this table. This field can be used for auditing purposes, e.g. to see when a record was loaded to the database/which version of a record existis if there are more than one for a given Variant_Xref, etc.

Relationships:

Relationship name	Type	Parent	Child	Cardinality
Variant_Variant_Xref	Non Identifying	Variant	Variant_Xref	Zero Or More

Indexes:

Index name	Index columns	Type	Sort order
IDX_Variant_Xref_1	Variant_ID

Constraints:

Constraint name	Type	Level	Constraint
	Column Constraint	Default	now()
	Column Constraint	Not Null	Variant_Xref_ID
	Column Constraint	Not Null	Variant_ID
	Column Constraint	Not Null	External_Variant_ID
	Column Constraint	Not Null	External_ID_Source
	Column Constraint	Not Null	Study_ID
	Column Constraint	Not Null	Source_System
	Column Constraint	Default	Y
	Column Constraint	Not Null	Load_Date
PK_Variant_Xref	Table Constraint	Primary Key	PRIMARY KEY (Variant_Xref_ID)
TUC_Variant_Xref_1	Table Constraint	Unique	UNIQUE (External_Variant_ID, External_ID_Source)
Variant_Variant_Xref	Table Constraint	Foreign Key	FOREIGN KEY (Variant_ID) REFERENCES Variant(Variant_ID)