Objective To investigate the role of deduplication in next-generation sequencing (NGS) data analysis by comparing quality indicators obtained with the deduplication method and with the duplication method, in which duplicate reads are retained.
Methods Tumor samples were collected from a cohort of 58 patients with non-small cell lung cancer (NSCLC). A panel of 286 genes was sequenced by NGS. The NGS data were analyzed with the deduplication method and with the common duplication method, and four indicators were compared between the two: Mapped Reads, On Target, Mean Depth, and Uniformity.
Results Mapped Reads, On Target, Mean Depth, and Uniformity each differed significantly between the two methods (P < 0.001). With the deduplication method, Mapped Reads, On Target, and Mean Depth differed significantly between plasma samples and the other three sample types, i.e., formalin-fixed paraffin-embedded (FFPE), puncture biopsy, and surgical tissue samples, whereas Uniformity did not differ significantly among the four sample types. The duplication method gave the opposite pattern.
Conclusion The deduplication step plays an important role in NGS data analysis: it can improve Uniformity and better reflect the actual amount of DNA template and the allele frequency of genomic alterations. Deduplicated results are helpful for clinical decision-making.
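
As an illustration of why the two analyses can report different Mean Depth values, the following is a minimal sketch and not the pipeline used in this study. It assumes toy reads represented as hypothetical (chromosome, start, strand) tuples on a single chromosome, a fixed hypothetical read length, and the simplifying rule that reads sharing all three values are duplicates of one original DNA template.

```python
from collections import Counter

READ_LEN = 4  # hypothetical fixed read length, for illustration only


def deduplicate(reads):
    """Keep the first read seen for each (chrom, start, strand) key."""
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique


def mean_depth(reads, target_start, target_end, read_len=READ_LEN):
    """Mean per-base depth over the half-open target [target_start, target_end).

    Assumes all reads lie on the same chromosome as the target.
    """
    depth = Counter()
    for _chrom, start, _strand in reads:
        lo = max(start, target_start)
        hi = min(start + read_len, target_end)
        for pos in range(lo, hi):
            depth[pos] += 1
    return sum(depth.values()) / (target_end - target_start)


# Toy data: three identical reads stand in for copies of a single template.
reads = [
    ("chr7", 100, "+"),
    ("chr7", 100, "+"),
    ("chr7", 100, "+"),
    ("chr7", 102, "-"),
    ("chr7", 104, "+"),
]

unique_reads = deduplicate(reads)
raw_depth = mean_depth(reads, 100, 108)
dedup_depth = mean_depth(unique_reads, 100, 108)

print(f"reads: {len(reads)} raw, {len(unique_reads)} after deduplication")
print(f"mean depth: {raw_depth} raw, {dedup_depth} after deduplication")
```

In this toy example the raw mean depth is inflated by the repeated copies, while the deduplicated value counts each assumed template once. Real pipelines work on aligned BAM files and use more elaborate duplicate criteria, such as mate positions or molecular barcodes.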