Description of the Gene/Protein Characteristic Table

Features of the cloned DNA sequence

This section describes features of the nucleotide sequences of the cDNA clones that were actually characterized. Although the actual clones contained an oligo(dT)-NotI adapter primer sequence and a SalI adapter sequence at their 3'- and 5'-extremities, respectively, the nucleotide sequences of these adapters are not shown here. This section is intended to provide clone users with detailed information on the clones that is not available from the public databases.

(1) Physical map

The physical maps were constructed on the basis of the sequence data of the cDNA clones. The horizontal scale represents the cDNA length in kb. The ORFs and untranslated regions are shown by solid and open boxes, respectively. The positions of the first ATG codons are marked by triangles: solid triangles for those that conform to Kozak's rule and open triangles for those that do not. RepeatMasker, a program that screens DNA sequences for interspersed repeats known to exist in mammalian genomes, was applied to detect repeat sequences in the cDNA sequences (Smit, A. F. A. and Green, P., RepeatMasker at http://ftp.genome.washington.edu/RM/RepeatMasker.html ). Alu sequences and other repetitive sequences detected in this way are displayed by dotted and hatched boxes, respectively.
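The distinction between solid and open triangles can be illustrated with a small sketch. The exact form of Kozak's rule applied to the clones is not stated here, so the criterion below is an assumption: a purine (A or G) at position -3 or a G at position +4, counting the A of the ATG as +1. The function name `first_atg_kozak` is hypothetical, and for simplicity the sketch looks at the first ATG in the whole sequence rather than the first ATG of a predicted ORF.

```python
def first_atg_kozak(seq):
    """Return (position, in_kozak_context) for the first ATG, or None.

    Assumed criterion (not taken from the source): a purine at -3
    or a G at +4, where the A of the ATG is position +1.
    The returned position is a 0-based index into seq.
    """
    seq = seq.upper()
    pos = seq.find("ATG")
    if pos == -1:
        return None
    # Base three positions upstream of the A; empty if the ATG is too close to the 5' end.
    minus3 = seq[pos - 3] if pos >= 3 else ""
    # Base immediately after the ATG; empty if the sequence ends there.
    plus4 = seq[pos + 3] if pos + 3 < len(seq) else ""
    return pos, (minus3 in ("A", "G")) or (plus4 == "G")


print(first_atg_kozak("GCCACCATGGCG"))
print(first_atg_kozak("TTTTTTATGCTT"))
```

Under this assumed criterion, the first call would correspond to a solid triangle and the second to an open triangle.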

(2) Restriction map

Commercially available restriction enzymes (REBASE; Roberts, R. J., Macelis, D. "REBASE - restriction enzymes and methylases" Nucleic Acids Res. 1998; 26: 338-350) are sorted according to the number of restriction sites present in the cDNA insert.
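As a rough illustration of this sorting, the sketch below counts recognition sites in an insert and orders the enzymes by that count. It is not the procedure used to build the table: the enzyme list is a tiny hand-picked subset rather than REBASE, the insert is a made-up sequence, ascending order is an assumption, and the function names are hypothetical. All four recognition sequences are palindromic, so scanning one strand is sufficient for them.

```python
def count_sites(insert, site):
    """Count occurrences of a recognition site in the insert, allowing overlaps."""
    insert, site = insert.upper(), site.upper()
    count = 0
    start = insert.find(site)
    while start != -1:
        count += 1
        start = insert.find(site, start + 1)
    return count


def sort_enzymes_by_site_count(insert, enzymes):
    """Return (enzyme, count) pairs sorted by count, then by enzyme name."""
    counts = {name: count_sites(insert, site) for name, site in enzymes.items()}
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Small illustrative subset of enzymes with their recognition sequences.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "NotI": "GCGGCCGC",
    "SalI": "GTCGAC",
}

# Made-up insert sequence, for illustration only.
insert = "GAATTCAAGGATCCTTGAATTC"
for name, n in sort_enzymes_by_site_count(insert, ENZYMES):
    print(name, n)
```

Enzymes with zero sites sort first, which makes non-cutters for a given insert easy to pick out.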

(3) Prediction of the protein coding region (GeneMark analysis)

The graphic outputs of the GeneMark-RC analysis are displayed. Vertical lines given in the graphs indicate the positions of termination codons. If you would like to know more about the GeneMark-RC analysis, please read the paper by Hirosawa et al. (Hirosawa, M., Isono, K., Hayes, W., Borodovsky, M. "Gene identification and classification in the Synechocystis genomic sequence by recursive gene mark analysis" DNA Seq. 1997; 8(1-2): 17-29).
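The vertical lines in the graphs correspond to termination codons in each reading frame. The sketch below shows one way such positions could be located on the forward strand; it does not reproduce GeneMark-RC itself, and the function name and example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_positions(seq, frame):
    """Return 0-based start positions of stop codons in the given frame (0, 1 or 2)."""
    seq = seq.upper()
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


# Made-up sequence, for illustration only.
seq = "ATGAAATAGCCTGA"
for frame in range(3):
    print(frame, stop_codon_positions(seq, frame))
```

A long stretch without stop codons in one frame is what an open reading frame looks like in this representation.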

The GeneMark analysis gives the following warnings: (a) Warning for N-terminal truncation of the coding region; (b) Warning for spurious interruption of the coding region.

(4) Prediction of the genomic structure of the cDNA

The cDNA sequence was subjected to a BLAST search (Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs" Nucleic Acids Res. 1997; 25: 3389-3402) against the human genome draft sequences at NCBI. When a genomic fragment was found to be considerably similar to the cDNA sequence (an E-value of 0.0 and a sequence identity of 90% or greater), the genomic structure of the cDNA was assigned on that genomic fragment by SIM4 (Florea, L., Hartzell, G., Zhang, Z., Rubin, G. M., and Miller, W. "A computer program for aligning a cDNA sequence with a genomic DNA sequence" Genome Res. 1998; 8: 967-974).
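The similarity criterion above amounts to a simple filter on BLAST hits. The following sketch assumes that each hit has already been parsed into a record with an E-value and a percent identity; the `Hit` class, its field names, and the subject identifiers are hypothetical and do not correspond to any particular BLAST output format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str          # identifier of the genomic fragment (made up below)
    evalue: float            # BLAST expectation value
    percent_identity: float  # sequence identity in percent


def is_considerably_similar(hit, min_identity=90.0):
    """Apply the stated criterion: E-value of 0.0 and identity of 90% or greater."""
    return hit.evalue == 0.0 and hit.percent_identity >= min_identity


hits = [
    Hit("fragment_A", 0.0, 99.2),
    Hit("fragment_B", 0.0, 85.0),
    Hit("fragment_C", 1e-120, 97.5),
]
candidates = [h.subject_id for h in hits if is_considerably_similar(h)]
print(candidates)
```

Only fragments passing this filter would be handed on to a cDNA-to-genome aligner such as SIM4.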

GENSCAN (Burge, C. and Karlin, S. "Prediction of complete gene structures in human genomic DNA" J. Mol. Biol. 1997; 268: 78-94) was also applied to predict a plausible gene structure on the genomic fragment. The gene structure deduced from the cDNA and that predicted by GENSCAN are compared graphically.

Features of the predicted protein sequence

This section describes the features of the predicted protein sequence.

(1) FASTA homology searches against the nr database and Kazusa human cDNA database

The top five entries with an expectation value smaller than 0.001 in the nr database and in the Kazusa human cDNA databases (HUGE & NEDO) are shown. nr is a non-redundant amino acid sequence database constructed at NCBI.

The numbers on the left and right sides of a black line in the graphical overview indicate the lengths (in amino acid residues) of the non-homologous N-terminal and C-terminal portions flanking the homologous region (indicated by the black line), respectively. The FASTA output and the multiple alignment of these entries can be obtained by clicking.
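The selection of entries and the flanking lengths shown in the graphical overview can be sketched as follows. This is only an illustration under stated assumptions: hits are represented as dictionaries with hypothetical `id` and `evalue` keys, and alignment coordinates are taken to be 1-based and inclusive on the query protein.

```python
def top_hits(hits, max_evalue=1e-3, limit=5):
    """Keep hits with an E-value below the cutoff and return the best `limit` of them."""
    kept = [hit for hit in hits if hit["evalue"] < max_evalue]
    return sorted(kept, key=lambda hit: hit["evalue"])[:limit]


def flanking_lengths(query_length, align_start, align_end):
    """Return the lengths of the non-homologous N- and C-terminal portions.

    align_start and align_end are assumed to be 1-based, inclusive
    residue positions of the homologous region on the query.
    """
    return align_start - 1, query_length - align_end


# A 500-residue protein whose homologous region spans residues 101-450.
print(flanking_lengths(500, 101, 450))
```

In this example the two numbers correspond to those displayed on the left and right sides of the black line, respectively.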

(2) Analysis of Motifs, Profiles, and Membrane-spanning regions

The predicted protein sequences were examined for motifs present in the PROSITE database. Because weakly defined sequence motifs occur too frequently in the HUGE database and are thus unlikely to be informative, the following motifs were excluded from the analysis: amidation site; N-glycosylation site; cAMP- and cGMP-dependent protein kinase phosphorylation site; casein kinase II phosphorylation site; N-myristoylation site; protein kinase C phosphorylation site; and tyrosine kinase phosphorylation site.

Profile entries in the PROSITE database were also searched using pftools, a program developed by Philipp Bucher at the Swiss Institute for Experimental Cancer Research.

Domains in the Pfam database (Sonnhammer, E. L. L., Eddy, S. R., Birney, E., Bateman, A., and Durbin, R. "Pfam: multiple sequence alignments and HMM-profiles of protein domains" Nucleic Acids Res. 1998; 26: 320-322) were searched using hmmer 2.1.

Membrane-spanning regions were predicted by SOSUI (Hirokawa, T., Boon-Chieng, S., Mitaku, S. "SOSUI: classification and secondary structure prediction system for membrane proteins" Bioinformatics 1998; 14: 378-379).

Expression profile

RT-PCR ELISA
