The Lotus japonicus genome assembly ver. 3.0 was constructed as a hybrid assembly that integrates Sanger sequencing data from TAC/BAC clones and shotgun approaches with Illumina sequencing data to 40x genome coverage. A total of 132 scaffolds covering 232 Mbp were aligned to the six L. japonicus chromosomes, up from 195 Mbp in build 2.5. In addition, 23,572 contigs, corresponding to 162 Mbp of the genome, were assigned to a virtual chromosome 0. The N50 values for anchored and unanchored contigs were 118,094 bp and 13,969 bp, respectively.
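For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of the common definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembled length. The function name `n50` and the toy lengths are illustrative assumptions; this is not the pipeline used to produce the reported values.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (in bp).

    Illustrative helper, not part of the assembly pipeline.
    """
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to >= 50% of the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example with made-up contig lengths (not Lotus data).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Under this definition, a higher N50 means that more of the assembly sits in long contigs, which is consistent with the anchored contigs having a much larger N50 than the unanchored ones.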
To annotate genes and evaluate the gene space coverage, gene models were generated from ab initio predictions and from RNA-seq data. The fraction of de novo assembled RNA-seq contigs mapping to the genome sequence increased from 91% (ver. 2.5) to 98% (ver. 3.0), confirming a substantial improvement in gene space coverage. A total of 39,734 protein-coding loci were detected in ver. 3.0.
In L. japonicus ver. 3.0, we adopted gene identifiers based on the pseudomolecules. Each protein-coding gene locus is assigned a unique identifier of the form "LjXg3vYYYYYYY.Z", where:
Component | Meaning |
---|---|
Lj | Lj indicates Lotus japonicus. |
Xg | X is a single-digit chromosome identifier (1-6), or 0 for the tentative pseudomolecule composed of unanchored contigs; g is a literal character. |
3v | 3v indicates version 3.0. |
YYYYYYY | YYYYYYY is a unique seven-digit numerical code. |
Z | Z is a single-digit code that distinguishes alternative splicing forms. |
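
The identifier scheme described above can be checked and split into its parts with a short regular expression. The following is a minimal sketch in Python: the names `LotusGeneId` and `parse_gene_id` and the example identifiers are hypothetical, chosen only for illustration, and are not taken from the annotation itself.

```python
import re
from typing import NamedTuple, Optional

# Lj + chromosome digit (0-6) + "g3v" + seven digits + "." + one splice-form digit.
_ID_PATTERN = re.compile(
    r"Lj(?P<chrom>[0-6])g3v(?P<serial>[0-9]{7})\.(?P<isoform>[0-9])"
)


class LotusGeneId(NamedTuple):
    """Parsed parts of a ver. 3.0 identifier (hypothetical helper type)."""
    chromosome: int  # 1-6, or 0 for the unanchored pseudomolecule
    serial: str      # seven-digit code, kept as text to preserve leading zeros
    isoform: int     # single-digit splice-form code
    anchored: bool   # False when the locus sits on chromosome 0


def parse_gene_id(identifier: str) -> Optional[LotusGeneId]:
    """Return the parsed identifier, or None if it does not fit the scheme."""
    match = _ID_PATTERN.fullmatch(identifier)
    if match is None:
        return None
    chrom = int(match.group("chrom"))
    return LotusGeneId(
        chromosome=chrom,
        serial=match.group("serial"),
        isoform=int(match.group("isoform")),
        anchored=chrom != 0,
    )


# Made-up identifiers that merely follow the documented pattern.
print(parse_gene_id("Lj1g3v0000010.1"))
print(parse_gene_id("Lj7g3v0000010.1"))  # chromosome 7 does not exist, so None
```

Keeping the seven-digit code as a string avoids losing leading zeros, and the `anchored` flag makes it easy to separate loci on chromosomes 1-6 from those on the virtual chromosome 0.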