TY - JOUR
T1 - Platanus-allee is a de novo haplotype assembler enabling a comprehensive access to divergent heterozygous regions
AU - Kajitani, Rei
AU - Yoshimura, Dai
AU - Okuno, Miki
AU - Minakuchi, Yohei
AU - Kagoshima, Hiroshi
AU - Fujiyama, Asao
AU - Kubokawa, Kaoru
AU - Kohara, Yuji
AU - Toyoda, Atsushi
AU - Itoh, Takehiko
N1 - Funding Information:
We would like to thank the staff of the Comparative Genomics Laboratory at NIG and the Itoh Laboratory at Tokyo Tech for supporting genome sequencing. The experiment using the TruSeq Synthetic Long Read Kit (Moleculo) is supported by Illumina, Inc. (San Diego, CA). This work was supported by MEXT KAKENHI Grant Numbers JP16H06279, JP16H04719, JP15H05979 and AMED Grant Number JP16gm6010003.
Publisher Copyright:
© 2019, The Author(s).
PY - 2019/12/1
Y1 - 2019/12/1
N2 - The ultimate goal for diploid genome determination is to completely decode homologous chromosomes independently, and several phasing programs from consensus sequences have been developed. These methods work well for lowly heterozygous genomes, but the manifold species have high heterozygosity. Additionally, there are highly divergent regions (HDRs), where the haplotype sequences differ considerably. Because HDRs are likely to direct various interesting biological phenomena, many genomic analysis targets fall within these regions. However, they cannot be accessed by existing phasing methods, and we have to adopt costly traditional methods. Here, we develop a de novo haplotype assembler, Platanus-allee (http://platanus.bio.titech.ac.jp/platanus2), which initially constructs each haplotype sequence and then untangles the assembly graphs utilizing sequence links and synteny information. A comprehensive benchmark analysis reveals that Platanus-allee exhibits high recall and precision, particularly for HDRs. Using this approach, previously unknown HDRs are detected in the human genome, which may uncover novel aspects of genome variability.
UR - http://www.scopus.com/inward/record.url?scp=85064348804&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064348804&partnerID=8YFLogxK
U2 - 10.1038/s41467-019-09575-2
DO - 10.1038/s41467-019-09575-2
M3 - Article
C2 - 30979905
AN - SCOPUS:85064348804
SN - 2041-1723
VL - 10
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 1702
ER -