ORFeome
Description of the Gene/Protein Characteristic Table
Features of the cloned DNA sequence
This section describes features of the ORF sequence cloned into the Flexi Vector.
- (1) Restriction map
- Commercially available restriction enzymes
(REBASE; Roberts, R. J. and Macelis, D. "REBASE - restriction enzymes and
methylases." Nucleic Acids Res. 1998; 26: 338-350)
are sorted according to the number of restriction sites present in the cDNA insert.
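As a rough illustration of the sorting step in (1), the Python sketch below counts how often each enzyme's recognition sequence occurs in an insert (on either strand) and orders the enzymes by that count. The four-enzyme table is a hypothetical subset chosen for illustration; real recognition sequences, including degenerate ones written with IUPAC codes such as N or R/Y, would come from REBASE and need pattern matching that this sketch does not handle. Ascending order (non-cutters first) is also an assumption; the page does not say which direction is used.

```python
# Minimal sketch: rank restriction enzymes by the number of sites in an insert.
# ENZYMES is a small hypothetical subset; a real table would be built from REBASE.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT", "NotI": "GCGGCCGC"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_sites(insert: str, site: str) -> int:
    """Count (possibly overlapping) exact matches of site on either strand."""
    insert, site = insert.upper(), site.upper()
    total = 0
    # A set avoids counting palindromic sites twice (site == its reverse complement).
    for pattern in {site, reverse_complement(site)}:
        start = insert.find(pattern)
        while start != -1:
            total += 1
            start = insert.find(pattern, start + 1)
    return total


def rank_enzymes(insert: str, enzymes: dict = ENZYMES) -> list:
    """Return (enzyme, site count) pairs, fewest sites first, ties broken by name."""
    counts = {name: count_sites(insert, site) for name, site in enzymes.items()}
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))
```

For example, `rank_enzymes("GAATTCAAAGGATCCTTTGAATTC")` returns `[('HindIII', 0), ('NotI', 0), ('BamHI', 1), ('EcoRI', 2)]`, so enzymes that do not cut the insert come first.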
- (2) Prediction of the genomic structure of the cDNA
- The ORF sequence was subjected to a BLAST search
(Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z.,
Miller, W., and Lipman, D. J. "Gapped BLAST and PSI-BLAST: a new generation
of protein database search programs." Nucleic Acids Res. 1997; 25: 3389-3402)
against the human genome draft sequences at NCBI.
When a genomic fragment was found to be highly similar to the cDNA
sequence (E-value = 0.0 and sequence identity of 90% or greater), the genomic
structure of the cDNA was assigned by aligning the cDNA to that fragment with SIM4
(Florea, L., Hartzell, G., Zhang, Z., Rubin, G. M., and Miller, W. "A computer
program for aligning a cDNA sequence with a genomic DNA sequence."
Genome Res. 1998; 8: 967-974).
- GENSCAN
(Burge, C. and Karlin, S. "Prediction of complete gene structures in human
genomic DNA." J. Mol. Biol. 1997; 268: 78-94)
was also applied to predict a plausible gene structure on the genomic
fragment. The gene structure deduced from the cDNA and the one predicted by
GENSCAN are compared in a graphical display.
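The rule in (2) for deciding which genomic fragments go on to SIM4 can be sketched as a filter over BLAST hits. This sketch assumes NCBI BLAST+ tabular output (`-outfmt 6`, default 12 columns) and treats the per-alignment percent identity (`pident`) as the "sequence identity"; the original pipeline may have measured identity differently, and the SIM4 and GENSCAN steps themselves are not reproduced here.

```python
# Minimal sketch: select BLAST hits that meet the E-value = 0.0 and >= 90% identity rule.
# Assumes BLAST+ tabular output (-outfmt 6) with the 12 default columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    subject: str   # genomic fragment identifier (sseqid)
    pident: float  # percent identity of the alignment
    evalue: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tabular BLAST lines, skipping blanks and comment lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        hits.append(Hit(subject=cols[1], pident=float(cols[2]), evalue=float(cols[10])))
    return hits


def sim4_candidates(hits: Iterable[Hit], min_identity: float = 90.0) -> List[Hit]:
    """Keep hits with an E-value of exactly 0.0 and identity >= min_identity."""
    return [h for h in hits if h.evalue == 0.0 and h.pident >= min_identity]
```

BLAST+ prints vanishingly small E-values as `0.0`, which `float()` parses to exactly zero, so the equality test matches the criterion as stated.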
Features of the predicted protein sequence
This section describes the features of the predicted protein sequence.
- (1) FASTA homology searches against the nr database and the ORFeome database on our site
- The top 5 entries with an expectation value smaller than 0.001 in the nr
database and in the ORFeome database are shown.
"nr" stands for the non-redundant amino acid sequence database
constructed by NCBI. The ORFeome database on our site refers to the
collection of ORFeome clones available from our site.
The numbers on the left and right sides of a black line in the graphical
overview indicate the lengths (in amino acid residues) of the non-homologous
N-terminal and C-terminal portions, respectively, that flank the homologous
region (indicated by the black line). The FASTA output and the multiple alignment
of these entries can be viewed by clicking on the entries.
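The hit selection in (1) and the flank numbers in the graphical overview can be sketched in Python. The `FastaHit` record and its fields are hypothetical stand-ins rather than a parser for real FASTA output, and the flank lengths are assumed to be measured on the query ORF; the page does not state which sequence they refer to.

```python
# Minimal sketch: keep the 5 best hits with E < 0.001 and compute N-/C-terminal flanks.
# FastaHit is a hypothetical record type, not part of any FASTA package.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FastaHit:
    accession: str  # illustrative identifier
    evalue: float
    qstart: int     # 1-based start of the homologous region in the query
    qend: int       # 1-based, inclusive end of the homologous region


def top_hits(hits: List[FastaHit], max_evalue: float = 1e-3, n: int = 5) -> List[FastaHit]:
    """Return up to n hits with evalue strictly below max_evalue, best first."""
    kept = [h for h in hits if h.evalue < max_evalue]
    return sorted(kept, key=lambda h: h.evalue)[:n]


def flanks(hit: FastaHit, query_length: int) -> Tuple[int, int]:
    """Residues before (N-terminal) and after (C-terminal) the homologous region."""
    return hit.qstart - 1, query_length - hit.qend
```

For a 400-residue ORF whose homologous region spans residues 21 to 380, `flanks` gives `(20, 20)`, the two numbers that would sit on either side of the black line.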
- (2) Analysis of Motifs, Domains, and Membrane-spanning regions
- The predicted protein sequences were examined for motifs present
in the InterPro database.
Because weakly defined sequence motifs occur very frequently in
the ORFeome database and are therefore unlikely to be informative,
the following motifs were excluded from the
analysis: amidation site; N-glycosylation site; cAMP- and
cGMP-dependent protein kinase phosphorylation site; casein kinase II
phosphorylation site; N-myristoylation site; protein kinase C
phosphorylation site; and tyrosine kinase phosphorylation site.
Motifs and domains in the InterPro database were searched for with InterProScan
(Zdobnov, E. M. and Apweiler, R. "InterProScan - an integration platform for the
signature-recognition methods in InterPro." Bioinformatics 2001; 17: 847-848).
Membrane-spanning regions were predicted by SOSUI
(Hirokawa, T., Boon-Chieng, S., and Mitaku, S. "SOSUI: classification and
secondary structure prediction system for membrane proteins."
Bioinformatics 1998; 14: 378-379).
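The motif exclusion in (2) amounts to dropping a fixed set of signatures from the InterProScan results. In this sketch the excluded motifs are mapped to PROSITE pattern accessions (an assumption; the page names the motifs but not their accessions), and the input is assumed to be InterProScan 5 tab-separated output, in which the 5th column holds the signature accession. The 2001 InterProScan cited above likely produced a different format.

```python
# Minimal sketch: drop frequent, weakly defined PROSITE motifs from InterProScan matches.
# Assumes InterProScan 5 TSV columns: protein, MD5, length, analysis, signature accession,
# signature description, start, end, ... (0-based indices 0 to 7 used below).
from typing import Iterable, List, Tuple

# PROSITE accessions assumed to correspond to the motifs excluded in the text.
EXCLUDED = {
    "PS00001",  # N-glycosylation site
    "PS00004",  # cAMP- and cGMP-dependent protein kinase phosphorylation site
    "PS00005",  # protein kinase C phosphorylation site
    "PS00006",  # casein kinase II phosphorylation site
    "PS00007",  # tyrosine kinase phosphorylation site
    "PS00008",  # N-myristoylation site
    "PS00009",  # amidation site
}


def informative_matches(tsv_lines: Iterable[str]) -> List[Tuple[str, str, str, int, int]]:
    """Return (protein, signature, description, start, end) for non-excluded matches."""
    kept = []
    for line in tsv_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # skip blank or truncated lines
        protein, signature, description = cols[0], cols[4], cols[5]
        if signature in EXCLUDED:
            continue
        kept.append((protein, signature, description, int(cols[6]), int(cols[7])))
    return kept
```

Filtering after the search, rather than removing patterns from the search itself, keeps the exclusion list in one place and easy to audit.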