A curated catalogue of human genomic structural variation




Variant Details

Variant: nssv43983



Internal ID: 16466290
Landmark
Location Information
Type  Coordinates                Assembly  Other Links
      chr10:95923269..95926815   hg38      UCSC, Ensembl
      chr10:97683026..97686572   hg19      UCSC, Ensembl
      chr10:97673016..97676562   hg18      UCSC, Ensembl
      chr10:97673016..97676562   hg17      UCSC, Ensembl
Cytoband: 10q23.33
Allele length
Assembly  Allele length
hg38      3547
hg19      3547
hg18      3547
hg17      3547
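The allele lengths above agree with the listed coordinates if each region is read as a closed interval, so that length = end - start + 1. The following is a minimal sketch of that check, not part of the DGV record; the function name `allele_length` and the inclusive-coordinate reading are assumptions made for illustration.

```python
import re


def allele_length(region: str) -> int:
    """Length of a 'chrom:start..end' region, assuming inclusive coordinates."""
    match = re.fullmatch(r"(\w+):(\d+)\.\.(\d+)", region)
    if match is None:
        raise ValueError(f"unrecognised region string: {region!r}")
    start, end = int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError(f"end precedes start in {region!r}")
    # Assumption: both endpoints belong to the variant (closed interval).
    return end - start + 1


# Coordinates copied from the Location Information table of this record.
regions = {
    "hg38": "chr10:95923269..95926815",
    "hg19": "chr10:97683026..97686572",
    "hg18": "chr10:97673016..97676562",
}

for assembly, region in regions.items():
    print(assembly, allele_length(region))
```

Under this reading all three distinct coordinate pairs give the same length as the table reports.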
Variant Type: CNV deletion
Copy Number
Allele State
Allele Origin
Probe Count
Validation Flag
Merged Status: S
Merged Variants: nsv25405
Supporting Variants
Samples
Known Genes: C10orf131, ENTPD1-AS1
Method: Sequencing
Analysis: Traces obtained from TSC or NCBI were first trimmed to remove vector sequences using the VecScreen system from NCBI. Low-quality regions containing at least five consecutive bases with Phred scores below 25 (Ewing and Green 1998) were then trimmed using a custom Perl script. The longest high-quality (LHQ) region from each trace was selected for further evaluation, and the remaining trimmed regions were set aside. Each LHQ region was further required to have an average Phred score of at least 25 and to be longer than 100 bases. Repeats were identified and masked within the LHQ region of each trimmed trace using RepeatMasker and RepBase. The longest unmasked "anchor" region, which had to be at least 50 bases long, was then used to assign each trace to a unique genomic location in build hg17 of the human genome using BLAST. Successfully mapped anchor sequences were required to have a single 100% match to a unique genomic location. Traces whose anchor sequences had more than one perfect match were set aside to avoid traces that mapped to segmental duplications (Bailey et al. 2002). The LHQ regions of successfully mapped traces were then unmasked and aligned to their assigned genomic locations using BLAST2seq (NCBI). Polymorphisms were mined from these alignments using custom Perl scripts. The five bases on each side of a polymorphism candidate were required to have Phred scores of 25 or higher. For SNP discovery, the SNP base itself was also required to have a Phred score of 25 or higher. Single-base-pair INDELs were screened to identify double-hit INDELs, and only these were included in the final collections. Because BLAST allows gaps of at most 16 bases in its alignments, a custom Perl script was developed to identify INDELs longer than 16 bp. Upon encountering a region in the alignment that no longer matched the query, this program split the trace data into two blocks. The first block (which matched the query) was kept at its original position, whereas the second block (which did not match the query) was shifted one base at a time until a perfect match was obtained or a distance of 10,000 bases (the maximum allowed by the program) was reached. The variant count differs from the published data by two because two INDELs later found to be false positives were excluded.
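Two steps of the analysis above lend themselves to a short illustration: selecting the longest high-quality region from per-base Phred scores, and the split-and-shift search for gaps too large for BLAST. The sketch below is not the authors' Perl code. The function names, the half-open index convention, and the restriction to deletions relative to the reference are all assumptions made for this example.

```python
def longest_high_quality_region(phred, threshold=25, min_run=5):
    """Return (start, end), half-open, of the longest segment that remains
    after cutting out every run of >= min_run bases scoring below threshold."""
    segments = []
    seg_start = 0
    i, n = 0, len(phred)
    while i < n:
        if phred[i] < threshold:
            j = i
            while j < n and phred[j] < threshold:
                j += 1
            if j - i >= min_run:  # a low-quality run long enough to trim
                if i > seg_start:
                    segments.append((seg_start, i))
                seg_start = j
            i = j
        else:
            i += 1
    if n > seg_start:
        segments.append((seg_start, n))
    if not segments:
        return None
    return max(segments, key=lambda s: s[1] - s[0])


def find_large_deletion(reference, trace, start, max_shift=10_000):
    """Hypothetical split-and-shift search (deletion case only).

    The trace is compared with the reference from `start`. At the first
    mismatch the trace is split; the second block is shifted along the
    reference one base at a time until it matches perfectly or max_shift
    is exceeded. Returns (breakpoint, deleted_length) or None.
    """
    k = 0
    while (k < len(trace) and start + k < len(reference)
           and reference[start + k] == trace[k]):
        k += 1
    if k == len(trace):
        return None  # the whole trace matches; nothing to split
    second_block = trace[k:]
    for shift in range(1, max_shift + 1):
        pos = start + k + shift
        if pos + len(second_block) > len(reference):
            break  # ran off the end of the reference
        if reference.startswith(second_block, pos):
            return (start + k, shift)
    return None


# Toy data, invented for illustration only.
scores = [30] * 10 + [10] * 5 + [30] * 20 + [10] * 3 + [30] * 4
print(longest_high_quality_region(scores))

ref = "ACGTACGTAC" + "T" * 20 + "GGCCAAGGTT"
read = "ACGTACGTAC" + "GGCCAAGGTT"
print(find_large_deletion(ref, read, start=0))
```

The additional filters described in the text (average Phred score of at least 25, length over 100 bases, repeat masking, unique anchor mapping) are left out to keep the sketch short. An insertion relative to the reference would need the symmetric search, shifting along the trace instead.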
Platform: Not reported
Comments
Reference: Mills_et_al_2006
Pubmed ID: 16902084
Accession Number(s): nssv43983
Frequency
Sample Size: 24
Observed Gain: n/a
Observed Loss: n/a
Observed Complex: n/a
Frequency: n/a


Hosted by The Centre for Applied Genomics