HUGE

Description of the Gene/Protein Characteristic Table

Features of the cloned DNA sequence

This section describes features of the nucleotide sequences of the cDNA clones actually characterized. Although the clones contain an oligo(dT)-NotI adapter primer sequence at their 3' ends and a SalI adapter sequence at their 5' ends, the nucleotide sequences of these adapters are not shown here. This section is intended to provide clone users with detailed information on the clones that is not available from the public databases.

(1) Physical map

The physical maps were constructed on the basis of the sequence data of the cDNA clones. The horizontal scale represents the cDNA length in kb. The ORFs and untranslated regions are shown by solid and open boxes, respectively. The positions of the first ATG codons are marked by triangles: a solid triangle indicates an ATG whose sequence context conforms to Kozak's rule, and an open triangle indicates one whose context does not. Repeat sequences in the cDNA sequences were detected with RepeatMasker, a program that screens DNA sequences for interspersed repeats known to exist in mammalian genomes (Smit, A. F. A. and Green, P., RepeatMasker). Alu sequences and other repetitive sequences detected in this way are displayed as dotted and hatched boxes, respectively.
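The solid/open triangle convention depends on whether the context of the first ATG conforms to Kozak's rule. The exact criterion used for the maps is not stated here; the sketch below assumes the commonly cited simplified form (a purine, A or G, at position -3 relative to the A of ATG, and G at position +4). The function name `kozak_first_atg` is hypothetical.

```python
def kozak_first_atg(cdna: str):
    """Locate the first ATG and test it against a simplified Kozak rule.

    Assumption: the context "conforms" when position -3 is a purine (A/G)
    and position +4 is G (the A of ATG is +1). Returns (index, conforms)
    with a 0-based index, or None if the sequence has no ATG.
    """
    seq = cdna.upper()
    pos = seq.find("ATG")
    if pos == -1:
        return None
    # -3 is three bases upstream of the A; +4 is the base right after ATG.
    minus3 = seq[pos - 3] if pos >= 3 else ""
    plus4 = seq[pos + 3] if pos + 3 < len(seq) else ""
    conforms = minus3 in ("A", "G") and plus4 == "G"
    return pos, conforms


# Solid triangle (conforms) vs. open triangle (does not conform)
print(kozak_first_atg("GCCACCATGG"))  # (6, True)
print(kozak_first_atg("GCCTCCATGC"))  # (6, False)
```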

(2) Restriction map

Commercially available restriction enzymes (REBASE; Roberts, R. J. and Macelis, D. "REBASE - restriction enzymes and methylases" Nucleic Acids Res. 1998; 26: 338-350) are sorted according to the number of restriction sites present in the cDNA insert.
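Sorting enzymes by site count amounts to counting each enzyme's recognition sequence in the insert. A minimal sketch, assuming a small hand-picked enzyme table (the maps draw on the full REBASE list) and palindromic recognition sites, so that scanning one strand is enough; non-palindromic sites would also need the reverse complement.

```python
import re

# A few palindromic recognition sequences (illustrative subset, not the
# full REBASE list used for the maps).
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
    "SalI": "GTCGAC",
}


def count_sites(insert: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of a recognition site."""
    # A zero-width lookahead lets overlapping matches be counted.
    return len(re.findall(f"(?={site})", insert.upper()))


def sort_by_site_count(insert: str):
    """Return (enzyme, count) pairs sorted by count, then by name."""
    counts = [(name, count_sites(insert, site)) for name, site in ENZYMES.items()]
    return sorted(counts, key=lambda item: (item[1], item[0]))


insert = "GAATTCAAGCTTGGATCCGAATTC"
for name, n in sort_by_site_count(insert):
    print(name, n)
```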

(3) Prediction of the protein coding region (GeneMark analysis)

The graphic outputs of the GeneMark-RC analysis are displayed. Vertical lines in the graphs indicate the positions of termination codons. For more details on the GeneMark-RC analysis, please see the paper by Hirosawa et al. (Hirosawa, M., Isono, K., Hayes, W., and Borodovsky, M. "Gene identification and classification in the Synechocystis genomic sequence by recursive gene mark analysis" DNA Seq. 1997; 8(1-2): 17-29).

The GeneMark analysis gives the following warnings: (a) Warning for N-terminal truncation of the coding region; (b) Warning for spurious interruption of the coding region.
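The vertical lines in the GeneMark-RC graphs mark termination codons in each reading frame. The sketch below only locates stop codons in the three forward frames and applies a simple heuristic related to warning (a): if no in-frame stop codon lies upstream of the ORF start, the ORF may extend past the 5' end of the clone. This heuristic is an assumption for illustration, not the GeneMark-RC algorithm, and the function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def stop_positions(seq: str):
    """Map each forward frame (0, 1, 2) to the 0-based positions of its stop codons."""
    seq = seq.upper()
    return {
        frame: [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOPS]
        for frame in range(3)
    }


def possibly_n_truncated(seq: str, orf_start: int) -> bool:
    """Heuristic for warning (a): True if no stop codon lies upstream of
    orf_start in the same frame, i.e. the ORF could continue 5' of the clone."""
    frame = orf_start % 3
    upstream = [p for p in stop_positions(seq)[frame] if p < orf_start]
    return not upstream


seq = "TAGCCCATGGCCCCCTAA"
print(stop_positions(seq))           # {0: [0, 15], 1: [], 2: []}
print(possibly_n_truncated(seq, 6))  # False: the TAG at position 0 is in frame
```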

(4) Prediction of the genomic structure of the cDNA

The cDNA sequence was subjected to a BLAST search (Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs" Nucleic Acids Res. 1997; 25: 3389-3402) against the human genome draft sequences at NCBI. When a genomic fragment was found to be highly similar to the cDNA sequence (E-value = 0.0 and sequence identity of 90% or greater), the genomic structure of the cDNA was assigned by aligning the cDNA with that genomic fragment using SIM4 (Florea, L., Hartzell, G., Zhang, Z., Rubin, G. M., and Miller, W. "A computer program for aligning a cDNA sequence with a genomic DNA sequence" Genome Res. 1998; 8: 967-974).
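The acceptance criterion for genomic fragments can be written as a filter over BLAST hits. A minimal sketch, assuming the hits are in BLAST's tabular format (`-outfmt 6` in BLAST+, `-m 8` in legacy blastall), whose 12 default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The example lines and identifiers (`KIAA_X`, `contig_A`, ...) are made up.

```python
def accepted_fragments(tabular_lines, min_identity=90.0):
    """Yield subject IDs of hits with E-value == 0.0 and identity >= min_identity.

    Each line is assumed to be a tab-separated BLAST tabular record with the
    12 default columns (qseqid ... evalue bitscore).
    """
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])
        evalue = float(fields[10])
        if evalue == 0.0 and pident >= min_identity:
            yield fields[1]


hits = [
    "KIAA_X\tcontig_A\t99.5\t2000\t10\t0\t1\t2000\t5001\t7000\t0.0\t3800",
    "KIAA_X\tcontig_B\t88.0\t1500\t180\t2\t1\t1500\t100\t1600\t0.0\t2100",
    "KIAA_X\tcontig_C\t95.0\t300\t15\t0\t1\t300\t10\t310\t1e-150\t520",
]
print(list(accepted_fragments(hits)))  # ['contig_A']
```

Fragments passing the filter would then be handed to SIM4 for the exon/intron alignment.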

(5) DAS source for Ensembl ContigView

We have set up a DAS server that displays the predicted genomic structures of KIAA cDNAs in Ensembl ContigView.
The URL of our DAS server is "http://ensembl.kazusa.or.jp:8888/cgi-bin/das". Please attach this DAS source at your site, following the instructions.
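A DAS 1.x client retrieves annotations by appending a data source name (DSN) and a `features` command to the server URL, with the region given as `segment=reference:start,stop`. The sketch below only builds such a request URL and fetches nothing. The DSN `kiaa_genomic` and the coordinates are hypothetical, and the server may no longer be reachable.

```python
from urllib.parse import urlencode

DAS_SERVER = "http://ensembl.kazusa.or.jp:8888/cgi-bin/das"


def das_features_url(server: str, dsn: str, ref: str, start: int, stop: int) -> str:
    """Build a DAS 1.x 'features' request URL for one genomic segment."""
    # DAS encodes a segment as reference:start,stop; keep ':' and ',' unescaped.
    query = urlencode({"segment": f"{ref}:{start},{stop}"}, safe=":,")
    return f"{server.rstrip('/')}/{dsn}/features?{query}"


# Hypothetical data source name and coordinates, for illustration only
print(das_features_url(DAS_SERVER, "kiaa_genomic", "1", 100000, 200000))
```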

Features of the predicted protein sequence

This section describes the features of the predicted protein sequence.

(1) Revision of the cloned DNA sequence before prediction of the protein sequence

When necessary, the cDNA sequences of the isolated clones were revised by direct RT-PCR/sequencing experiments in order to eliminate artifacts signaled by the GeneMark-RC analysis (see "Features of the cloned DNA sequence"). Therefore, some of the predicted protein sequences are derived not from the cloned cDNA sequence but from the revised cDNA sequence. Note that the sequences of some cDNA clones are not identical to those deposited in the GenBank/EMBL/DDBJ databases, because the deposited sequences are the experimentally revised ones. To avoid any confusion, this section states whether or not the predicted protein sequence was obtained by conceptual translation of the revised cDNA sequence. The sequences revised by the RT-PCR/sequencing experiments can be accessed by clicking. Where possible, the nature of the revision is also described (deletion, insertion, nonsense mutation, or removal of retained intronic sequence).
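Conceptual translation reads the (revised) cDNA codon by codon using the standard genetic code. A minimal sketch; the example "clone" carries one extra T that creates a premature stop, and the revision deletes it. Both sequences are hypothetical and only show how removing a single-base insertion restores the reading frame.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cdna: str, start: int) -> str:
    """Translate from a 0-based start position up to, not including, the first stop."""
    seq = cdna.upper()
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


cloned = "ATGGCTTAAAGGTTGA"        # hypothetical clone with an extra T at index 6
revised = cloned[:6] + cloned[7:]  # revision: delete the extra base
print(translate(cloned, 0))   # MA   (premature TAA stop)
print(translate(revised, 0))  # MAKG
```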

(2) FASTA homology searches against the nr database and HUGE database

The top five entries with expectation values smaller than 0.001 in the nr database and the HUGE database are shown. "nr" stands for the non-redundant amino acid sequence database constructed by NCBI.

The numbers on the left and right sides of a black line in the graphical overview indicate the lengths (in amino acid residues) of the non-homologous N-terminal and C-terminal portions flanking the homologous region (indicated by the black line), respectively. The FASTA output and the multiple alignment of these entries can be obtained by clicking.
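The selection rule and the flanking numbers in the graphical overview are simple arithmetic. In the sketch below, each hit is a hypothetical `Hit(name, evalue, start, end)` record, assuming `start` and `end` are 1-based, inclusive coordinates of the homologous region on the sequence being drawn; the N-terminal number is the residues before that region and the C-terminal number the residues after it.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    name: str
    evalue: float
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def top_hits(hits: List[Hit], limit: int = 5, cutoff: float = 0.001) -> List[Hit]:
    """Keep hits with E-value < cutoff, best (smallest) E-value first."""
    return sorted((h for h in hits if h.evalue < cutoff), key=lambda h: h.evalue)[:limit]


def flanks(hit: Hit, length: int) -> Tuple[int, int]:
    """Non-homologous residues N-terminal and C-terminal to the homologous region."""
    return hit.start - 1, length - hit.end


hits = [
    Hit("entry_1", 1e-80, 21, 400),
    Hit("entry_2", 0.5, 1, 50),
    Hit("entry_3", 2e-10, 150, 450),
]
for h in top_hits(hits):
    print(h.name, flanks(h, length=450))
# entry_1 (20, 50)
# entry_3 (149, 0)
```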

(3) Analysis of Motifs, Domains, and Membrane-spanning regions

The predicted protein sequences were examined for motifs present in the InterPro database. Because weakly defined sequence motifs occur very frequently in the HUGE database and are therefore unlikely to be informative, the following motifs were excluded from the analysis: amidation site; N-glycosylation site; cAMP- and cGMP-dependent protein kinase phosphorylation site; casein kinase II phosphorylation site; N-myristoylation site; protein kinase C phosphorylation site; and tyrosine kinase phosphorylation site.
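How loosely these motifs are defined is easy to see by counting matches. The sketch below rewrites the PROSITE-style patterns N-{P}-[ST]-{P} (N-glycosylation), [ST]-x-[RK] (protein kinase C), and [ST]-x(2)-[DE] (casein kinase II) as regular expressions, counts them in a short made-up sequence, and then drops hits whose names are on the exclusion list. The `(name, position)` hit records are hypothetical.

```python
import re

# PROSITE-style patterns as regular expressions; lookaheads count overlapping matches.
WEAK_PATTERNS = {
    "N-glycosylation site": r"(?=N[^P][ST][^P])",
    "protein kinase C phosphorylation site": r"(?=[ST].[RK])",
    "casein kinase II phosphorylation site": r"(?=[ST]..[DE])",
}

EXCLUDED = {
    "amidation site",
    "N-glycosylation site",
    "cAMP- and cGMP-dependent protein kinase phosphorylation site",
    "casein kinase II phosphorylation site",
    "N-myristoylation site",
    "protein kinase C phosphorylation site",
    "tyrosine kinase phosphorylation site",
}


def count_weak_motifs(protein: str) -> dict:
    """Number of matches of each weak pattern in the protein sequence."""
    return {name: len(re.findall(rx, protein)) for name, rx in WEAK_PATTERNS.items()}


def informative(motif_hits):
    """Drop (name, position) hits whose motif name is on the exclusion list."""
    return [hit for hit in motif_hits if hit[0] not in EXCLUDED]


protein = "MNSSAKSLRDETNGTSEEKSPRD"  # 23 residues, made up
print(count_weak_motifs(protein))   # 2, 3 and 3 matches respectively
print(informative([("N-glycosylation site", 2), ("SH3 domain", 60)]))
```

Eight weak-motif matches in 23 residues illustrates why such hits say little about function.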

Motifs/domains in the InterPro database were searched for with InterProScan (Zdobnov, E. M. and Apweiler, R. "InterProScan--an integration platform for the signature-recognition methods in InterPro" Bioinformatics 2001; 17: 847-848).

Membrane-spanning regions were predicted by SOSUI (Hirokawa, T., Boon-Chieng, S., and Mitaku, S. "SOSUI: classification and secondary structure prediction system for membrane proteins" Bioinformatics 1998; 14: 378-379).
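SOSUI's own algorithm is not reproduced here. To illustrate the underlying idea of spotting long hydrophobic stretches, the sketch below uses a different and much simpler technique, a Kyte-Doolittle hydropathy scan: windows of 19 residues whose mean hydropathy exceeds 1.6 are flagged and overlapping windows merged. The window size and threshold are values commonly used with that method, not SOSUI parameters, and the test sequence is made up.

```python
# Kyte-Doolittle hydropathy values
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein: str, window: int = 19, threshold: float = 1.6):
    """Return merged (start, end) spans, 0-based and end-exclusive, covering all
    windows whose mean Kyte-Doolittle score exceeds the threshold."""
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]
    spans = []
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window > threshold:
            if spans and i <= spans[-1][1]:
                spans[-1] = (spans[-1][0], i + window)  # extend an overlapping span
            else:
                spans.append((i, i + window))
    return spans


protein = "MKDERKSE" + "L" * 20 + "KDERSKE"
print(hydrophobic_segments(protein))  # one candidate segment around the Leu stretch
```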

Expression profile

Expression profiles of KIAA genes were examined by Northern blot analysis (KIAA0001-KIAA0280), by reverse transcription followed by the polymerase chain reaction (RT-PCR; KIAA0294-KIAA0710), or by semi-quantitative RT-PCR combined with an enzyme-linked immunosorbent assay (RT-PCR ELISA; KIAA0711 onward).
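The method used for a given gene follows from its KIAA number. A minimal sketch with a hypothetical helper `expression_method`; it treats KIAA0711 itself as RT-PCR ELISA, since the RT-PCR range ends at KIAA0710, and returns None for KIAA0281-KIAA0293, which no stated range covers.

```python
from typing import Optional


def expression_method(kiaa_id: str) -> Optional[str]:
    """Return the expression-profiling method for an ID such as 'KIAA0711'."""
    kiaa_id = kiaa_id.upper()
    if not kiaa_id.startswith("KIAA"):
        raise ValueError(f"not a KIAA ID: {kiaa_id}")
    number = int(kiaa_id[4:])
    if 1 <= number <= 280:
        return "Northern blot"
    if 294 <= number <= 710:
        return "RT-PCR"
    if number >= 711:
        return "RT-PCR ELISA"
    return None  # e.g. KIAA0281-KIAA0293: no method stated


print(expression_method("KIAA0500"))  # RT-PCR
```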

(1) Northern blot (KIAA0001-KIAA0280)

The steady-state levels and sizes of individual mRNAs were examined by Northern blot hybridization in 16 different human tissues (human multiple tissue northern blot and blot II; Clontech, USA). The filters were hybridized with gene-specific probes at 42 °C in a solution containing 50% formamide, 5x SSC, 5x Denhardt's solution, 0.25% SDS, and 100 µg/ml herring sperm DNA; washed twice with 1x SSC/0.1% SDS for 15 min at room temperature; and finally washed twice with 0.1x SSC/0.1% SDS for 30 min at 53 °C. The hybridization signals were detected by autoradiography on X-ray film or with a BAS-2000 imaging system (Fuji Film, Japan).
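Preparing the hybridization solution is a series of C1V1 = C2V2 dilutions. The sketch below computes stock volumes for a hypothetical 20 ml batch; the stock concentrations (100% formamide, 20x SSC, 50x Denhardt's, 10% SDS, 10 mg/ml herring sperm DNA) are common laboratory stocks assumed for illustration, not stated in the protocol.

```python
def stock_volume(final_conc: float, stock_conc: float, total_ml: float) -> float:
    """C1*V1 = C2*V2 solved for V1, the volume of stock in ml."""
    return final_conc * total_ml / stock_conc


TOTAL_ML = 20.0  # hypothetical batch size
# component: (final concentration, assumed stock concentration), same units per row
RECIPE = {
    "formamide (%)": (50.0, 100.0),
    "SSC (x)": (5.0, 20.0),
    "Denhardt's (x)": (5.0, 50.0),
    "SDS (%)": (0.25, 10.0),
    "herring sperm DNA (ug/ml)": (100.0, 10000.0),
}

volumes = {name: stock_volume(final, stock, TOTAL_ML) for name, (final, stock) in RECIPE.items()}
water = TOTAL_ML - sum(volumes.values())
for name, ml in volumes.items():
    print(f"{name}: {ml:.2f} ml")
print(f"water: {water:.2f} ml")
```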

(2) RT-PCR (KIAA0294-KIAA0710)

cDNA templates for RT-PCR were synthesized from 1 µg of human poly(A)⁺ RNA (Clontech, USA) using Superscript II reverse transcriptase (GibcoBRL, USA) and random hexamer primers. After first-strand cDNA synthesis, the remaining RNA in the reaction mixture was degraded with RNase A. Unless otherwise stated, RT-PCR was performed as follows. An aliquot of the cDNA template mixture (2 µl, corresponding to 1 ng of the starting poly(A)⁺ RNA) was subjected to PCR in a 10-µl reaction containing LA Taq DNA polymerase (0.5 units; Takara, Japan) and a set of primers specific to the gene of interest, on a DNA Thermal Cycler PJ9600 (Perkin Elmer, USA). The thermal cycling conditions were: initial denaturation at 95 °C for 1 min; 30 cycles of denaturation at 95 °C for 0.5 min, primer annealing at 55 °C for 0.5 min, and extension at 72 °C for 1 min; and a final extension at 72 °C for 6 min. To check for differences in PCR amplification efficiency among genes, external control reactions were conducted in which PCR products were generated from serial dilutions of each authentic cDNA plasmid (0.1 fg to 1 pg in 10 µl of PCR mixture). The entire PCR products were run on a 2.5% NuSieve GTG agarose gel (FMC BioProducts, USA) and detected by fluorescent staining with ethidium bromide. The gel images were recorded with a gel print 2000i/VGA (BioImage, USA).
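Several quantities in this protocol follow from simple arithmetic. If a 2-µl aliquot corresponds to 1 ng of poly(A)⁺ RNA and the cDNA mixture was diluted uniformly (an assumption), the 1-µg synthesis must have been brought to about 2 ml. The external-control range of 0.1 fg to 1 pg (1000 fg) spans four orders of magnitude; the sketch assumes 10-fold steps, which the text does not specify. It also totals the programmed hold times of the cycling program, ignoring ramp times.

```python
# 1) Implied total volume of the diluted first-strand cDNA mixture
rna_ng_total = 1000.0   # 1 ug of poly(A)+ RNA
ng_per_aliquot = 1.0    # a 2 ul aliquot corresponds to 1 ng
aliquot_ul = 2.0
total_ul = rna_ng_total / ng_per_aliquot * aliquot_ul  # 2000 ul, i.e. 2 ml

# 2) External-control series from 0.1 fg to 1 pg (1000 fg), assuming 10-fold steps
controls_fg = [10.0 ** k / 10 for k in range(5)]

# 3) Total programmed hold time in minutes (ramping between temperatures ignored)
initial_denaturation = 1.0
cycle = 0.5 + 0.5 + 1.0  # 95 C denature / 55 C anneal / 72 C extend
final_extension = 6.0
total_min = initial_denaturation + 30 * cycle + final_extension

print(total_ul, controls_fg, total_min)
```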

(3) RT-PCR ELISA (KIAA0711 onward)

RT-PCR ELISA combines the RT-PCR described above with subsequent quantification of the products by ELISA. RT-PCR ELISA was carried out essentially as described in the instructions for the PCR ELISA (DIG labeling) kit (cat. no. 1 636 120; Boehringer Mannheim Biochemica). External control reactions using the authentic plasmid allowed mRNA levels to be expressed as equivalent amounts of the authentic plasmid DNA (fg) per ng of poly(A)⁺ RNA (the starting material of RT-PCR). For at-a-glance screening, the mRNA levels are displayed as color codes using the digit-color conversion panel shown in the figure (unit: fg of equivalent plasmid DNA/ng of poly(A)⁺ RNA). Because this expression pattern was obtained from a single run of RT-PCR ELISA, the expression profiles may include significant run-to-run variation. Accordingly, they should be used only for screening genes on the basis of tissue specificity. If more accurate quantitative expression profiles are required, more statistically reliable approaches should be employed (e.g., replicate RT-PCR ELISA measurements, DNA chip analysis, or RNA blot analysis).
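Expressing mRNA levels as plasmid-equivalent fg per ng of poly(A)⁺ RNA amounts to reading each tissue's ELISA signal off a standard curve built from the external controls. The sketch below fits a straight line to log10(signal) versus log10(fg) by least squares and inverts it. The log-log linear model, the absorbance values, and the color thresholds in the hypothetical `color_code` helper are assumptions for illustration (the actual digit-color panel is the one in the figure); the 1 ng of poly(A)⁺ RNA per reaction follows the RT-PCR protocol.

```python
import math


def fit_log_log(amounts_fg, signals):
    """Least-squares fit of log10(signal) = slope * log10(amount) + intercept."""
    xs = [math.log10(a) for a in amounts_fg]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def plasmid_equivalent_fg(signal, slope, intercept):
    """Invert the standard curve: ELISA signal -> fg of equivalent plasmid DNA."""
    return 10 ** ((math.log10(signal) - intercept) / slope)


def color_code(fg_per_ng, thresholds=(0.1, 1.0, 10.0, 100.0)):
    """Map a level to a hypothetical color bin index (0 = lowest)."""
    return sum(fg_per_ng >= t for t in thresholds)


# Hypothetical external-control data: fg of plasmid vs. ELISA absorbance
controls_fg = [0.1, 1.0, 10.0, 100.0, 1000.0]
absorbance = [0.02, 0.06, 0.2, 0.6, 2.0]
slope, intercept = fit_log_log(controls_fg, absorbance)

ng_polyA_per_reaction = 1.0  # template amount per reaction, from the RT-PCR protocol
level = plasmid_equivalent_fg(0.2, slope, intercept) / ng_polyA_per_reaction
print(round(level, 2), color_code(level))
```

Working in log space keeps the many-fold control range from being dominated by the highest standards.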

Mapping information

Chromosomal locations of individual cDNA clones were determined using the human-rodent hybrid panels GeneBridge 4, Stanford_G3 (Research Genetics Inc., USA), or CCR Coriell2 (Coriell Cell Repositories, USA). Numerals in parentheses give the actual lengths of PCR products that were larger than predicted; these discrepancies may be caused by intronic sequence(s) in the genome. PCR conditions are indicated, except for the initial denaturation at 95 °C for 2 min. More recently, chromosome numbers have been retrieved from the UniGene database (http://www.ncbi.nlm.nih.gov/UniGene) when mapping data for the cDNA clones were available there. For genes whose chromosomal locations were retrieved from UniGene, we confirmed that the primer sets used to determine the chromosomal location in UniGene were consistent with the gene sequences we determined. When genomic sequences corresponding to the cDNA sequences were available in the GenBank database, the chromosome numbers were taken from that database.
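For hybrid panels in which each hybrid retains a known subset of human chromosomes, a chromosome can be assigned by concordance: the chromosome whose presence/absence across the hybrids best matches the PCR results. The sketch below shows this on a made-up four-hybrid panel; radiation hybrid panels such as GeneBridge 4 are analyzed with a different, statistical approach, and the panel contents and function names are hypothetical. It also computes the extra length of a larger-than-predicted product, assuming the whole difference comes from intervening intron(s).

```python
def best_chromosome(pcr_results, panel):
    """Return (chromosome, concordance) with the highest fraction of hybrids whose
    PCR result matches the presence/absence of that chromosome.

    pcr_results: {hybrid: bool}; panel: {hybrid: set of retained human chromosomes}
    """
    chromosomes = set().union(*panel.values())
    scores = {}
    for chrom in chromosomes:
        matches = sum((chrom in panel[h]) == positive for h, positive in pcr_results.items())
        scores[chrom] = matches / len(pcr_results)
    best = max(scores, key=scores.get)
    return best, scores[best]


def intron_estimate(observed_bp, predicted_bp):
    """Extra length of a genomic PCR product over the cDNA-based prediction."""
    return observed_bp - predicted_bp


# Hypothetical four-hybrid panel and PCR results
panel = {"H1": {"3", "7"}, "H2": {"7", "12"}, "H3": {"3", "12"}, "H4": {"1"}}
pcr = {"H1": True, "H2": True, "H3": False, "H4": False}
print(best_chromosome(pcr, panel))  # ('7', 1.0)
print(intron_estimate(450, 210))    # 240
```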