A curated catalogue of human genomic structural variation




Variant Details

Variant: dgv248n6



Internal ID: 20148112
Landmark
Location Information
Type   Coordinates               Assembly   Other Links
       chr8:65724921..65726877   hg38       UCSC, Ensembl
       chr8:66637156..66639112   hg19       UCSC, Ensembl
       chr8:66799710..66801666   hg18       UCSC, Ensembl
       chr8:66799710..66801666   hg17       UCSC, Ensembl
Cytoband: 8q13.1
Allele length
Assembly   Allele length (bp)
hg38       1957
hg19       1957
hg18       1957
hg17       1957
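The allele lengths above are consistent with the listed coordinates if both endpoints are counted as part of the interval. The following is a minimal sketch of that arithmetic, not DGV's own code: the inclusive-endpoint convention is an assumption inferred from the numbers, and `parse_location` and `allele_length` are hypothetical helper names.

```python
import re

# Coordinates copied from the Location Information table of this record.
LOCATIONS = {
    "hg38": "chr8:65724921..65726877",
    "hg19": "chr8:66637156..66639112",
    "hg18": "chr8:66799710..66801666",
    "hg17": "chr8:66799710..66801666",
}

# Matches strings of the form 'chr8:65724921..65726877'.
COORD_RE = re.compile(r"^(chr\w+):(\d+)\.\.(\d+)$")


def parse_location(text):
    """Split 'chrN:start..end' into (chromosome, start, end)."""
    match = COORD_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate string: {text!r}")
    chrom, start, end = match.groups()
    return chrom, int(start), int(end)


def allele_length(text):
    """Interval length, ASSUMING both endpoints are included."""
    _, start, end = parse_location(text)
    return end - start + 1


lengths = {assembly: allele_length(loc) for assembly, loc in LOCATIONS.items()}
print(lengths)
```

Under that assumption every assembly gives 1957, which matches the table.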
Variant Type: CNV deletion
Copy Number
Allele State
Allele Origin
Probe Count
Validation Flag
Merged Status: M
Merged Variants
Supporting Variants: nsv395821, nsv395858
Samples
Known Genes: PDE7A
Method: Sequencing
Analysis: Traces obtained from TSC or NCBI were first trimmed to remove vector sequences using the VecScreen system from NCBI. Low-quality regions containing at least five consecutive bases with Phred scores below 25 (Ewing and Green 1998) were then trimmed using a custom Perl script. The longest high-quality (LHQ) region from each trace was selected for further evaluation, and the remaining trimmed regions of the traces were set aside. The LHQ regions were further required to have an average Phred score of at least 25 and to be longer than 100 bases. Repeats were identified and masked within the LHQ region of each trimmed trace using RepeatMasker and RepBase. The longest unmasked "anchor" region, which had to be at least 50 bases long, was then used to assign each trace to a unique genomic location in build hg17 of the human genome using BLAST. Successfully mapped anchor sequences were required to have a single 100% match to a unique genomic location. Traces containing anchor sequences with more than one perfect match were set aside to avoid traces that mapped to segmental duplications (Bailey et al. 2002).
The LHQ regions of successfully mapped traces were then unmasked and aligned to their assigned genomic locations using BLAST2seq (NCBI). Polymorphisms were mined from these alignments using custom Perl scripts. The five bases on each side of a polymorphism candidate were required to have Phred scores of 25 or higher. For SNP discovery, the SNP base itself was also required to have a Phred score of 25 or higher. Single-base-pair INDELs were screened to identify double-hit INDELs, and only these were included in the final collections.
Because BLAST allows gaps of at most 16 bases in the alignments, a custom Perl script was developed to identify INDELs longer than 16 bp. Upon encountering a region in the alignment that no longer matched the query, this program split the trace data into two blocks. The first block (which matched the query) was kept at its original position, whereas the second block (which did not match the query) was moved over one base at a time until a perfect match was obtained or a distance of 10,000 bases (the maximum distance allowed by the program) was reached. The variant count here differs from the published data by two, because two INDELs that were later found to be false positives have been excluded.
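Two of the steps described in the Analysis field can be sketched in a few lines: selecting the LHQ region from per-base Phred scores, and the split-and-slide search for deletions too large for a BLAST gap. This is an illustrative sketch only, not the original Perl scripts. The function names, the 0-based half-open indexing, and the restriction to deletions in the trace relative to the reference are all assumptions made here.

```python
def longest_high_quality_region(quals, min_phred=25, min_run=5, min_length=100):
    """Return (start, end) of the LHQ region as 0-based half-open indices, or None.

    Runs of at least `min_run` consecutive bases below `min_phred` are cut out;
    the longest remaining segment must be longer than `min_length` bases and
    have an average Phred score of at least `min_phred`.
    """
    n = len(quals)
    segments, seg_start, i = [], 0, 0
    while i < n:
        if quals[i] < min_phred:
            j = i
            while j < n and quals[j] < min_phred:
                j += 1
            if j - i >= min_run:  # low-quality run long enough to trim
                if i > seg_start:
                    segments.append((seg_start, i))
                seg_start = j
            i = j
        else:
            i += 1
    if seg_start < n:
        segments.append((seg_start, n))
    if not segments:
        return None
    start, end = max(segments, key=lambda seg: seg[1] - seg[0])
    region = quals[start:end]
    if len(region) <= min_length or sum(region) / len(region) < min_phred:
        return None
    return start, end


def find_large_deletion(trace, reference, ref_start, max_shift=10_000):
    """Split-and-slide search; returns (deletion_start, deletion_length) or None.

    The trace is laid on the reference at `ref_start`. At the first mismatch
    it is split in two; the second block is slid along the reference one base
    at a time until it matches perfectly or `max_shift` is reached.
    """
    window = reference[ref_start:ref_start + len(trace)]
    split = next(
        (i for i, (t, r) in enumerate(zip(trace, window)) if t != r), None
    )
    if split is None:  # the trace matches without any split
        return None
    second = trace[split:]
    for shift in range(1, max_shift + 1):
        begin = ref_start + split + shift
        if begin + len(second) > len(reference):
            break  # second block would run off the end of the reference
        if reference[begin:begin + len(second)] == second:
            return ref_start + split, shift
    return None


# Toy example: the trace lacks the 8 bases 'GGGGTTTT' present in the reference.
reference = "AAAACCCCGGGGTTTTACGTACGTAGCTAGCT"
trace = reference[:8] + reference[16:]
print(find_large_deletion(trace, reference, ref_start=0))
```

In the toy example the second block first matches after a shift of 8 bases, so the sketch reports a deletion of length 8 starting at reference offset 8. An insertion in the trace would need the mirror-image search, which is omitted here.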
Platform: Not reported
Comments
Reference: Mills_et_al_2006
Pubmed ID: 16902084
Accession Number(s): dgv248n6
Frequency
Sample Size: 24
Observed Gain: n/a
Observed Loss: n/a
Observed Complex: n/a
Frequency: n/a


Hosted by The Centre for Applied Genomics