| Analysis | Normalised microarray signal intensity data for both cohorts were analysed with PennCNV (version 2009Aug27) and QuantiSNP (v.2) to call putative CNVs in each individual. Both algorithms were run with the settings and parameters suggested by their authors, together with the 'genomic wave' adjustment of the signal intensity data. Additionally, with PennCNV we used separate B allele frequency files (PFB files); for HapMap YRI we used PennCNV's default PFB file based on HapMap YRI samples. As a quality control measure, we checked that all samples met the following quality criteria calculated by PennCNV: LRR_SD <= 0.25, BAF_SD <= 0.05, BAF_DRIFT <= 0.002 and |GCWF| <= 0.04. Raw CNV calls from PennCNV and QuantiSNP were then merged (as an intersection, separately for each individual) with a custom Perl script, and only CNVs called concordantly by both algorithms (overlapping calls with the same type of copy number change, gain or loss) were retained. From the resulting list we filtered out CNVs that were i) called on the X or Y chromosome; ii) shorter than 1000 bp; or iii) assigned a QuantiSNP log Bayes Factor (LBF) below 5. To ensure the high quality of the EGCUT CNV dataset, CNVs detected by PennCNV and QuantiSNP were further visually confirmed with Illumina Genome Viewer. For each CNV locus, signal intensity data for all corresponding family members were loaded simultaneously and visually inspected to confirm both the CNV calls and the absence of a CNV in family members without a call. CNV regions in which no CNV was visually detectable, or in which a CNV was visually distinguishable but had not been called, were excluded. Throughout this study we used the NCBI Build 36/hg18 assembly coordinates of the human reference sequence. |
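
The sample QC, intersection merge and filtering steps described above can be illustrated with a minimal Python sketch. This is not the custom Perl script used in the study. The `Cnv` record, the function names, the inclusive coordinate convention and the choice to carry the QuantiSNP LBF onto the merged call are all assumptions made for illustration; only the thresholds are taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cnv:
    """Hypothetical CNV record; coordinates are assumed to be inclusive bp."""
    sample: str
    chrom: str
    start: int
    end: int
    kind: str                    # "gain" or "loss"
    lbf: Optional[float] = None  # QuantiSNP log Bayes Factor, if available


def passes_sample_qc(lrr_sd: float, baf_sd: float,
                     baf_drift: float, gcwf: float) -> bool:
    """Sample-level thresholds quoted in the text."""
    return (lrr_sd <= 0.25 and baf_sd <= 0.05
            and baf_drift <= 0.002 and abs(gcwf) <= 0.04)


def intersect_calls(penncnv: List[Cnv], quantisnp: List[Cnv]) -> List[Cnv]:
    """Keep the overlapping part of calls made by both algorithms for the
    same individual, chromosome and type of copy number change."""
    merged = []
    for p in penncnv:
        for q in quantisnp:
            if (p.sample, p.chrom, p.kind) != (q.sample, q.chrom, q.kind):
                continue
            start, end = max(p.start, q.start), min(p.end, q.end)
            if start <= end:  # the two calls overlap
                # Assumption: the merged call inherits the QuantiSNP LBF.
                merged.append(Cnv(p.sample, p.chrom, start, end, p.kind, q.lbf))
    return merged


def filter_calls(calls: List[Cnv], min_len: int = 1000,
                 min_lbf: float = 5.0) -> List[Cnv]:
    """Drop calls on X/Y, calls shorter than min_len bp, and calls with LBF < min_lbf."""
    kept = []
    for c in calls:
        if c.chrom.lower().replace("chr", "") in {"x", "y"}:
            continue
        if c.end - c.start + 1 < min_len:
            continue
        if c.lbf is None or c.lbf < min_lbf:
            continue
        kept.append(c)
    return kept


# Toy input, invented purely to exercise the functions.
penn = [Cnv("s1", "1", 1000, 5000, "loss"),
        Cnv("s1", "X", 100, 9000, "gain"),
        Cnv("s1", "2", 100, 500, "gain")]
quanti = [Cnv("s1", "1", 2000, 6000, "loss", 12.0),
          Cnv("s1", "X", 100, 9000, "gain", 20.0),
          Cnv("s1", "2", 100, 500, "gain", 3.0),
          Cnv("s1", "1", 2000, 6000, "gain", 12.0)]
final_calls = filter_calls(intersect_calls(penn, quanti))
```

In this toy input only the chromosome 1 loss survives: the X call is removed by the sex-chromosome filter, the chromosome 2 call is both too short and below the LBF threshold, and the discordant gain on chromosome 1 never enters the intersection.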