About the site
Arabidopsis thaliana is a small dicotyledonous plant that has been a useful model organism for plant genetic and developmental studies. The genome size of A. thaliana is estimated to be 100-150 Mb, the smallest plant genome known.
Kazusa DNA Research Institute launched the A. thaliana genome sequencing project in December 1995. The project is now being coordinated with European and US groups as part of the Arabidopsis Genome Initiative (AGI). We are currently focusing on the long arm of chromosome V, and on the short arm of chromosome V in collaboration with the European and the CSHL - Wash.U - ABI consortia. The clones used for DNA sequencing are Mitsui P1 clones [M----] (available through ABRC or KDRI) and Mitsui TAC clones [K----] (available from KDRI).
The finished sequences are subjected to similarity searches against databases of Arabidopsis ESTs, RNA genes and protein sequences. A series of computer-aided analyses is then performed to predict protein coding regions, exon-intron boundaries and transfer RNA genes. All the output files obtained from these analyses are processed and combined into an HTML-based form by an annotation composing system integ, for use in similarity-based gene modeling. Finally, for each candidate, the alignment with its most similar sequence is re-examined with the Smith-Waterman algorithm, and the candidate is annotated under one of four categories: (1) a potential protein gene: exon(s) that show similarity to a single reported gene throughout the alignment; (2) a potential exon: exon(s) that match only portions of a reported protein gene; (3) a transcribed region: a region that matches only ESTs; and (4) an RNA gene.
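As an illustration of the re-examination step, the following is a minimal sketch of Smith-Waterman local alignment. It is not the code used in our annotation pipeline: the function name smith_waterman, the scoring values (match, mismatch, gap) and the simple linear gap penalty are all assumptions chosen for brevity, and a production tool would normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions, not the pipeline's settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # Clamping at 0 is what makes the alignment local rather than global.
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these assumed scores, smith_waterman("ACGT", "ACGT") returns a score of 8 with both sequences aligned in full, while two sequences sharing no residues return a score of 0 and empty alignment strings.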
All the sequence data and annotations are made available through the international DNA databases and our database on the web, Kazusa Status, in the Kazusa Arabidopsis data Opening Site.