A curated catalogue of human genomic structural variation




Variant Details

Variant: nssv434134



Internal ID: 16460025
Landmark
Location Information
Type  Coordinates                Assembly  Other Links
      chr9:137879161..137882744  hg38      UCSC, Ensembl
      chr9:140773613..140777196  hg19      UCSC, Ensembl
      chr9:139893434..139897017  hg18      UCSC, Ensembl
      chr9:138049450..138053033  hg17      UCSC, Ensembl
Cytoband: 9q34.3
Allele length
Assembly  Allele length
hg38      3584
hg19      3584
hg18      3584
hg17      3584
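The allele lengths listed above can be cross-checked against the coordinates in the Location Information table. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive (so length = end - start + 1), which is consistent with the reported value of 3584. The function names `parse_region` and `allele_length` are hypothetical and are not part of any DGV tool.

```python
import re

def parse_region(region):
    """Parse a 'chrN:start..end' string into (chrom, start, end)."""
    match = re.fullmatch(r"(\w+):(\d+)\.\.(\d+)", region)
    if match is None:
        raise ValueError(f"unrecognised region string: {region!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))

def allele_length(region):
    """Length of the region, assuming 1-based inclusive coordinates."""
    _, start, end = parse_region(region)
    return end - start + 1

# Coordinates copied from the Location Information table of this record.
regions = {
    "hg38": "chr9:137879161..137882744",
    "hg19": "chr9:140773613..140777196",
    "hg18": "chr9:139893434..139897017",
    "hg17": "chr9:138049450..138053033",
}

lengths = {assembly: allele_length(r) for assembly, r in regions.items()}
for assembly, length in lengths.items():
    print(assembly, length)
```

Under that assumption, every assembly yields the same length, which matches the Allele length table.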
Variant Type: CNV deletion
Copy Number
Allele State
Allele Origin
Probe Count
Validation Flag
Merged Status: S
Merged Variants: nsv415556
Supporting Variants
Samples
Known Genes: CACNA1B
Method: Sequencing
Analysis: Traces obtained from TSC or NCBI were first trimmed to remove vector sequences using the VecScreen system from NCBI. Low-quality regions containing at least five bases in a row with Phred scores below 25 (Ewing and Green 1998) were then trimmed using a custom Perl script. The longest high-quality (LHQ) region from each trace was selected for further evaluation, and the remaining trimmed regions of the traces were set aside. The LHQ regions were further required to have an average Phred score of at least 25 and to be longer than 100 bases. Repeats were identified and masked within the LHQ region of each trimmed trace using RepeatMasker and RepBase.

The longest unmasked "anchor" region, which had to be at least 50 bases long, was then used to assign each trace to a unique genomic location in build hg17 of the human genome using BLAST. Successfully mapped anchor sequences were required to have a single 100% match to a unique genomic location. Traces containing anchor sequences with more than one perfect match were set aside to avoid traces that mapped to segmental duplications (Bailey et al. 2002). The LHQ regions of successfully mapped traces were then unmasked and aligned to their assigned genomic locations using BLAST2seq (NCBI).

Polymorphisms were mined from these alignments using custom Perl scripts. We required the five bases on each side of a polymorphism candidate to have Phred scores of 25 or higher. For SNP discovery, the SNP base was also required to have a Phred score of 25 or higher. Single-base-pair INDELs were screened to identify double-hit INDELs, and only these were included in our final collections.

Since BLAST only allows for up to a 16-base gap in the alignments, a custom Perl script was developed to identify INDELs larger than 16 bp. Upon encountering a region in the alignment that no longer matched the query, this program split the trace data into two blocks. The first block (which matched the query) was maintained at the original position, whereas the second block (which did not match the query) was moved over one base at a time until a perfect match was obtained or a distance of 10,000 bases (the maximum distance allowed by the program) was reached. The variant count differs from the published data by two because two INDELs that were later found to be false positives were excluded.
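Two steps of the analysis described above lend themselves to a short illustration: selecting the longest high-quality (LHQ) region after removing runs of at least five bases with Phred scores below 25, and the split-and-slide search for gaps larger than BLAST allows. The sketch below is not the original Perl code, which is not reproduced in this record. It assumes plain Python strings and lists, exact matching only, and a deletion in the trace relative to the reference. The function names `longest_high_quality_region` and `find_large_deletion` are hypothetical.

```python
def longest_high_quality_region(quals, min_q=25, min_run=5):
    """Return (start, end), half-open, of the longest segment that remains
    after cutting out every run of >= min_run consecutive bases whose
    Phred score is below min_q. Returns None if nothing remains."""
    segments = []
    seg_start = 0
    i, n = 0, len(quals)
    while i < n:
        if quals[i] < min_q:
            j = i
            while j < n and quals[j] < min_q:
                j += 1
            if j - i >= min_run:          # long low-quality run: cut here
                if i > seg_start:
                    segments.append((seg_start, i))
                seg_start = j
            i = j
        else:
            i += 1
    if n > seg_start:
        segments.append((seg_start, n))
    if not segments:
        return None
    return max(segments, key=lambda s: s[1] - s[0])


def find_large_deletion(ref, trace, ref_start, max_shift=10_000):
    """Split-and-slide search (simplified, exact matching only).

    The first block is the prefix of the trace that matches the reference
    at ref_start. The second block is the remainder, which is shifted one
    base at a time along the reference until it matches perfectly or
    max_shift is reached. Returns (breakpoint, deletion_length) or None."""
    k = 0
    while (k < len(trace) and ref_start + k < len(ref)
           and trace[k] == ref[ref_start + k]):
        k += 1
    if k == len(trace):
        return None                       # whole trace matches, no gap
    second = trace[k:]
    base = ref_start + k
    for shift in range(1, max_shift + 1):
        lo, hi = base + shift, base + shift + len(second)
        if hi > len(ref):
            break                         # ran off the end of the reference
        if ref[lo:hi] == second:
            return base, shift
    return None


# Toy data (invented for illustration, not taken from the study).
quals = [30] * 10 + [10] * 5 + [30] * 20 + [10] * 2 + [30] * 5
lhq = longest_high_quality_region(quals)

ref = "AAAACCCCGGGGTTTT"
trace = "AAAAGGGGTTTT"                    # "CCCC" is missing from the trace
hit = find_large_deletion(ref, trace, ref_start=0)
print(lhq, hit)
```

A production version would also have to handle insertions, sequencing errors in the second block, and ambiguous breakpoints in repetitive sequence, none of which this sketch attempts.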
Platform: Not reported
Comments
Reference: Mills_et_al_2006
Pubmed ID: 16902084
Accession Number(s): nssv434134
Frequency
Sample Size: 24
Observed Gain: n/a
Observed Loss: n/a
Observed Complex: n/a
Frequency: n/a


Hosted by The Centre for Applied Genomics