The Codon Usage Database is an extended WWW version of CUTG (Codon Usage Tabulated from GenBank). Codon usage frequencies for each organism are searchable through this World Wide Web site.
CUTG was originally developed by Professor Toshimichi Ikemura at the Laboratory of Evolutionary Genetics, National Institute of Genetics. (Currently, Dr. Ikemura is a professor at the Hayama Center of Advanced Research, The Graduate University for Advanced Studies.)
The Codon Usage Database is developed and maintained by Yasukazu Nakamura at The First Laboratory for Plant Gene Research, Kazusa DNA Research Institute.
NCBI-GenBank Flat File Release 160.0 [June 15 2007].
3,027,973 complete protein coding genes (CDS's)
Files pri (primate sequence entries), rod (rodent sequence entries), mam (other mammalian sequence entries), vrt (other vertebrate sequence entries), inv (invertebrate sequence entries), pln (plant sequence entries), bct (bacterial sequence entries), vrl (viral sequence entries) and phg (phage sequence entries) were used.
Files est (EST: expressed sequence tag sequence entries), pat (patent sequence entries), rna (structural RNA sequence entries), sts (STS: sequence tagged site sequence entries), syn (synthetic and chimeric sequence entries) and una (unannotated sequence entries) were not used.
All completely sequenced protein coding genes (CDS's) were used. Codons containing an ambiguous base were excluded from the count.
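The counting rule above can be sketched as follows. This is a minimal illustration, not the code used to build the database: the function name `count_codons` is hypothetical, and it assumes the CDS is given as a plain string whose reading frame starts at the first base, with any trailing partial codon ignored.

```python
from collections import Counter

# Bases treated as unambiguous (assumption: RNA "U" is normalized to "T").
UNAMBIGUOUS = set("ACGT")


def count_codons(cds: str) -> Counter:
    """Count codons in one CDS, skipping codons that contain an ambiguous base."""
    seq = cds.upper().replace("U", "T")
    counts = Counter()
    usable_length = len(seq) - len(seq) % 3  # drop a trailing partial codon
    for i in range(0, usable_length, 3):
        codon = seq[i:i + 3]
        if set(codon) <= UNAMBIGUOUS:  # e.g. "AAN" is excluded
            counts[codon] += 1
    return counts
```

For example, `count_codons("ATGAANGCTGCTTAA")` counts ATG, GCT (twice) and TAA, and leaves out the codon AAN.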
A query box is provided for searching the codon usage table of an organism. You can search by the Latin name of the organism or by any sub-string of it. By default the search is case sensitive; a case insensitive option can be selected. An ambiguous query that hits more than 100 organisms returns no answer.
Each organism name shown in the answer list for a query, or in the alphabetical list, is followed by the name of its GenBank division [gbbct, gbinv etc.], a colon and the number of compiled CDS's. For example:
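The search behaviour described above can be illustrated with a short sketch. It only mirrors the stated rules (sub-string match, case sensitive by default, no answer above 100 hits); the function `search_organisms` and its parameters are hypothetical and do not reflect how the site is implemented.

```python
def search_organisms(names, query, case_sensitive=True, max_hits=100):
    """Return organism names containing `query`, or [] if the query is too ambiguous."""
    if case_sensitive:
        hits = [name for name in names if query in name]
    else:
        lowered = query.lower()
        hits = [name for name in names if lowered in name.lower()]
    # A query that hits more than `max_hits` organisms returns no answer.
    return hits if len(hits) <= max_hits else []
```

With the names `["Arabidopsis thaliana", "Arabidopsis lyrata", "Homo sapiens"]`, the query `"arabidopsis"` finds nothing by default and two organisms when `case_sensitive=False`.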
Arabidopsis thaliana [gbpln]: 80395
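If you want to process such list entries programmatically, a line in this form can be split into its three parts. The sketch below assumes the exact layout shown in the example (name, division in square brackets starting with "gb", colon, integer); `parse_entry` is a hypothetical helper, not part of CUTG.

```python
import re

# Pattern for lines such as "Arabidopsis thaliana [gbpln]: 80395" (assumed layout).
ENTRY_RE = re.compile(r"^(?P<name>.+?)\s*\[(?P<division>gb[a-z]+)\]:\s*(?P<n_cds>\d+)$")


def parse_entry(line):
    """Split an answer-list line into (organism name, GenBank division, CDS count)."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized entry: %r" % line)
    return match["name"], match["division"], int(match["n_cds"])
```

Applied to the example line, this yields the name "Arabidopsis thaliana", the division "gbpln" and the count 80395.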
If you select the link for an organism, the codon usage table for that organism will be shown. The table shows the frequency (per thousand) and the count for each codon, summed over all CDS's of the organism. Tables that include amino acids, or that are formatted in GCG style (under construction), are also shown when a genetic code system is selected. A back translation program, which is useful for designing PCR primers in protein coding regions, is also available (under construction).
By selecting the link "Codon usage of each CDS" under the table, you can browse or download the codon usage tables of all CDS's of the organism. The tables are in CUTG format (see the link below for the "CODON_LABEL" file in CUTG).
Codon usage tables of all CDS's for each GenBank division (pri, rod, mam, vrt, inv, pln, bct, vrl and phg) can be downloaded from the "FTP links for CUTG files" link on the top page.
The README document contains the latest information on the database in plain text format. The CODON_LABEL and SPSUM_LABEL files describe the file formats.
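The per-thousand frequency is the count of a codon divided by the total number of counted codons, multiplied by 1000. A minimal sketch, assuming the counts have already been summed over all CDS's of the organism (the function name `per_thousand` is hypothetical):

```python
def per_thousand(counts):
    """Convert pooled codon counts into frequencies per thousand codons."""
    total = sum(counts.values())
    if total == 0:
        return {}  # no codons counted, so no frequencies to report
    return {codon: 1000.0 * n / total for codon, n in counts.items()}
```

For instance, pooled counts of 1 for ATG and 3 for GCT give 250.0 and 750.0 per thousand.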
We wish to thank Dr. Ugawa * at The DNA Information and Stock Center, National Institute of Agrobiological Resources for help in constructing and distributing the database from 1996 to 1999.
[* Present address of Dr. Ugawa is Environmental Education Center, Miyagi University of Education]
This work was supported by a Grant-in-Aid for Publication of Scientific Research Results from the Japan Society for the Promotion of Science (JSPS).
Codon usage tabulated from the international DNA sequence databases: status for the year 2000. Nakamura, Y., Gojobori, T. and Ikemura, T. (2000) Nucl. Acids Res. 28, 292.
(NAR Database Issue page)
This article gives references to earlier papers.