Variant Details

Variant: nsv425475

Internal ID

17231198

Landmark

Location Information

Type	Coordinates	Assembly	Other Links
	chrX:3691433..3691487	hg38	UCSC Ensembl
	chrX:3609474..3609528	hg19	UCSC Ensembl
	chrX:3619474..3619528	hg18	UCSC Ensembl
	chrX:3602835..3602889	hg17	UCSC Ensembl

Cytoband

Xp22.33

Allele length

Assembly	Allele length
hg38	55
hg19	55
hg18	55
hg17	55
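
The allele lengths above match the listed coordinates if the start and end positions are treated as inclusive (end - start + 1). A minimal Python sketch of that check follows; the function name `allele_length` and the coordinate-parsing pattern are illustrative assumptions, not part of the database.

```python
import re

# Coordinates copied from the Location Information table above.
COORDINATES = {
    "hg38": "chrX:3691433..3691487",
    "hg19": "chrX:3609474..3609528",
    "hg18": "chrX:3619474..3619528",
    "hg17": "chrX:3602835..3602889",
}


def allele_length(region: str) -> int:
    """Return the span of a 'chrom:start..end' region.

    Assumption: both endpoints are inclusive, so the length is
    end - start + 1.
    """
    match = re.fullmatch(r"(\w+):(\d+)\.\.(\d+)", region)
    if match is None:
        raise ValueError(f"unrecognised region format: {region!r}")
    start, end = int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError(f"end precedes start in {region!r}")
    return end - start + 1


# Length per assembly, computed from the coordinates.
LENGTHS = {assembly: allele_length(region) for assembly, region in COORDINATES.items()}
```

Under the inclusive-endpoint assumption, every assembly yields the same length as the table.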

Variant Type

CNV deletion

Copy Number

Allele State

Allele Origin

Probe Count

Validation Flag

Merged Status

Merged Variants

Supporting Variants

nssv444053

Samples

Known Genes

PRKX

Method

Sequencing

Analysis

Traces obtained from TSC or NCBI were first trimmed to remove vector sequences using the VecScreen system from NCBI. Low-quality regions containing at least five bases in a row with Phred scores below 25 (Ewing and Green 1998) were then trimmed using a custom Perl script. The longest high-quality (LHQ) region from each trace was selected for further evaluation, and the remaining trimmed regions of the traces were set aside. The LHQ regions were further required to have an average Phred score of at least 25 and to be longer than 100 bases. Repeats were identified and masked within the LHQ region of each trimmed trace using RepeatMasker and RepBase. The longest unmasked "anchor" region, which had to be at least 50 bases long, was then used to assign each trace to a unique genomic location in build hg17 of the human genome using BLAST. Successfully mapped anchor sequences were required to have a single 100% match to a unique genomic location. Traces containing anchor sequences with more than one perfect match were set aside to avoid traces that mapped to segmental duplications (Bailey et al. 2002). The LHQ regions of successfully mapped traces were then unmasked and aligned to their assigned genomic locations using BLAST2seq (NCBI).
Polymorphisms were mined from these alignments using custom Perl scripts. The five bases on each side of a polymorphism candidate were required to have Phred scores of 25 or higher. For SNP discovery, the SNP base was also required to have a Phred score of 25 or higher. Single-base-pair INDELs were screened to identify double-hit INDELs, and only these were included in the final collections.
Because BLAST allows gaps of at most 16 bases in an alignment, a custom Perl script was developed to identify INDELs longer than 16 bp. Upon encountering a region in the alignment that no longer matched the query, this program split the trace data into two blocks. The first block (which matched the query) was kept at its original position, whereas the second block (which did not match the query) was moved over one base at a time until a perfect match was obtained or a distance of 10,000 bases (the maximum distance allowed by the program) was reached. The variant count differs from the published data by two because two INDELs that were later found to be false positives were excluded.
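
Two steps of the analysis described above can be sketched in Python: selecting the longest high-quality (LHQ) region of a trace, and the block-shifting search for INDELs too long for a gapped BLAST alignment. This is only an illustration under stated assumptions, not the authors' Perl code, which is not reproduced in this record. The function names, the list-of-integers quality input, the ungapped alignment at a known start, and the restriction to deletions relative to the reference are all assumptions made here.

```python
from typing import Optional, Sequence, Tuple


def longest_high_quality_region(
    quals: Sequence[int], min_q: int = 25, min_run: int = 5, min_len: int = 100
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the LHQ region as a half-open interval, or None.

    Runs of at least `min_run` consecutive bases below `min_q` are trimmed.
    The longest remaining stretch must be longer than `min_len` bases and
    have an average Phred score of at least `min_q`.
    """
    n = len(quals)
    low = [False] * n
    i = 0
    while i < n:
        if quals[i] < min_q:
            j = i
            while j < n and quals[j] < min_q:
                j += 1
            if j - i >= min_run:  # only long low-quality runs are trimmed
                for k in range(i, j):
                    low[k] = True
            i = j
        else:
            i += 1

    best = (0, 0)
    start = None
    for idx in range(n + 1):
        boundary = idx == n or low[idx]
        if not boundary and start is None:
            start = idx
        elif boundary and start is not None:
            if idx - start > best[1] - best[0]:
                best = (start, idx)
            start = None

    length = best[1] - best[0]
    if length <= min_len or sum(quals[best[0]:best[1]]) / length < min_q:
        return None
    return best


def find_large_deletion(
    reference: str, read: str, start: int, max_shift: int = 10_000
) -> Optional[Tuple[int, int]]:
    """Return (reference_position, deletion_length), or None if nothing is found.

    Assumption: `read` is aligned without gaps at reference[start]. The
    matching first block stays in place; the remaining second block is
    shifted one base at a time until it matches the reference perfectly
    or `max_shift` is reached.
    """
    i = 0
    while i < len(read) and start + i < len(reference) and read[i] == reference[start + i]:
        i += 1
    if i == len(read):
        return None  # the whole read matches, so there is no break point
    second_block = read[i:]
    for shift in range(1, max_shift + 1):
        pos = start + i + shift
        if pos + len(second_block) > len(reference):
            break  # the second block would run off the end of the reference
        if reference[pos:pos + len(second_block)] == second_block:
            return (start + i, shift)
    return None
```

In this sketch the deletion length is simply the shift at which the second block first matches perfectly, so a 55-base event such as this variant would be reported with a shift of 55. Insertions in the trace relative to the reference would need the complementary search (shifting along the read instead of the reference), which is omitted here.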

Platform

Not reported

Comments

Reference

Mills_et_al_2006

Pubmed ID

16902084

Accession Number(s)

nsv425475

Frequency

Sample Size	24
Observed Gain	n/a
Observed Loss	n/a
Observed Complex	n/a
Frequency	n/a