Identification of Unknown Human Genes and Functional Analysis of their Gene Products
The human genome consists of approximately 3 billion base pairs and is estimated to encode some 100,000 genes, many of which are thought to intimately affect our health. Until recently, only 7,000 to 8,000 genes had been subjected to thorough analysis, leaving the vast majority of human genes unexplored or undiscovered. The Human Genome Project was launched in the United States in 1990 in an attempt to systematically and comprehensively analyze the entire human genome. Thanks to the enormous efforts of international human genome sequencing teams, the complete sequence of the human genome is expected to become available at the beginning of the 21st century. However, correctly assigning the coding regions from the nucleotide sequence alone remains technically difficult, and cDNA information is likely to be required because the coding regions of human genes are thought to be dispersed widely throughout the genome. Although determining the entire sequences of cDNAs still represents a daunting challenge, it has strongly attracted our interest because the accumulation of sequence-confirmed, expression-ready cDNA clones would be a critical step toward comprehensive analysis of human genes. In fact, projects focusing on the sequencing of full-insert human cDNA clones have recently been initiated worldwide. Our own cDNA-sequencing project took the lead among them and has been in operation for the past five years.
Our project is distinct from others in that we focus our sequencing efforts on large cDNA inserts (> 4 kb). In general, large cDNAs are technically difficult to isolate and thus have been explored less extensively than small ones. On the other hand, the proteins encoded by the longer cDNA species are thought to play important roles in cellular functions such as signal transduction, cell-cell interaction, construction of structural elements, intracellular transport, and regulation of protein and nucleic acid metabolism. Analysis of positionally cloned genes has also shown that genes associated with various genetic diseases tend to encode larger proteins. More recently, comparative analyses of several prokaryotic and yeast genomes have shown that many of the larger proteins are found specifically in eukaryotic organisms. Considering these facts, we at the Kazusa DNA Research Institute have focused our efforts on systematically searching for previously undiscovered cDNA species with longer sequences. Our efforts include determining the chromosomal location and expression pattern of each gene, and predicting amino acid sequences and gene functions through comprehensive DNA sequence analysis. Earlier studies at the Kazusa DNA Research Institute were directed toward the myeloid cell line KG-1, whereas current research centers on genes expressed in brain tissue.
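As a rough illustration of two steps mentioned above, selecting inserts longer than 4 kb and predicting an amino acid sequence from a cDNA sequence, the following minimal Python sketch may help. It is not the pipeline used in the project: the function names (`is_long_insert`, `longest_orf`) are hypothetical, only the three forward reading frames are scanned, the standard genetic code is assumed, and an open reading frame is counted only if it starts with ATG and ends at an in-frame stop codon.

```python
# Illustrative sketch only; names and thresholds are assumptions, not the
# project's actual software. Uses the standard genetic code.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Map each codon (TTT, TTC, ...) to its one-letter amino acid; '*' is a stop.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_long_insert(cdna: str, min_length: int = 4000) -> bool:
    """Return True if the insert is longer than min_length bases (> 4 kb)."""
    return len(cdna) > min_length


def longest_orf(cdna: str) -> str:
    """Return the longest ATG-to-stop protein found in the three forward frames.

    Codons containing unknown bases are translated as 'X'. An ORF that runs
    off the end of the sequence without a stop codon is ignored.
    """
    seq = cdna.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        protein = None  # None means no ORF is currently open in this frame
        for i in range(frame, len(seq) - 2, 3):
            aa = CODON_TABLE.get(seq[i:i + 3], "X")
            if protein is None:
                if aa == "M":  # ATG opens a new ORF
                    protein = ["M"]
            elif aa == "*":  # stop codon closes the ORF
                if len(protein) > len(best):
                    best = "".join(protein)
                protein = None
            else:
                protein.append(aa)
    return best


if __name__ == "__main__":
    example = "CCATGGCTTAAGG"  # toy sequence, not real data
    print(is_long_insert(example), longest_orf(example))
```

A real analysis would also consider the reverse strand, partial ORFs, and sequence quality; this sketch only shows the basic idea of deriving a predicted protein from a cDNA sequence.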
We have determined the complete base sequences of approximately 2,000 species of previously undiscovered cDNA from KG-1 cells and brain tissue, with an average length of 5 kb. The isolated cDNAs were frequently found to encode important gene products (proteins), which clearly indicates the advantages of the "Kazusa approach". The data obtained through the Kazusa cDNA project are accessible through our database, called HUGE (Human Unidentified Gene-Encoded protein database, http://www.kazusa.or.jp/huge). Future efforts will be directed toward developing more comprehensive long cDNA libraries and discovering and functionally analyzing as many new genes as possible.
Besides characterizing unknown human cDNAs, we also analyze the functions of newly identified human genes in the rodent system after isolating the rodent counterparts of the human genes. In addition, we aim to develop new methods for functional analysis of genes and for rapid identification of genes associated with human genetic diseases. Our current targets are genes expressed in the central nervous system, but this will be extended to other biological systems. Our ultimate goal is to make a large contribution to the resolution of human health problems through analysis of human genes.