Miyakogusa Predicted Gene

Lj0g3v0356229.1

BLASTP 2.2.25 [Feb-01-2011]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Reference for compositional score matrix adjustment: Altschul, Stephen F., 
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.

Query= Lj0g3v0356229.1 Non Characterized Hit- tr|C5XFJ9|C5XFJ9_SORBI
Putative uncharacterized protein Sb03g043330
OS=Sorghu,28.35,7e-19,seg,NULL; LEA_2,Late embryogenesis abundant
protein, LEA-14,CUFF.24522.1
         (247 letters)

Database: TAIR10_pep 
           35,386 sequences; 14,482,855 total letters

Searching..................................................done



                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

AT4G26490.1 | Symbols:  | Late embryogenesis abundant (LEA) hydr...   240   8e-64
AT5G56050.1 | Symbols:  | FUNCTIONS IN: molecular_function unkno...   226   8e-60
AT3G26350.1 | Symbols:  | LOCATED IN: chloroplast; EXPRESSED IN:...   139   1e-33
AT1G13050.1 | Symbols:  | unknown protein; BEST Arabidopsis thal...   124   7e-29
AT1G13050.2 | Symbols:  | unknown protein; FUNCTIONS IN: molecul...   119   1e-27
AT5G22870.1 | Symbols:  | Late embryogenesis abundant (LEA) hydr...    56   2e-08
AT5G56070.1 | Symbols:  | unknown protein; BEST Arabidopsis thal...    56   2e-08
AT3G11650.1 | Symbols: NHL2 | NDR1/HIN1-like 2 | chr3:3676264-36...    53   2e-07
AT2G35460.1 | Symbols:  | Late embryogenesis abundant (LEA) hydr...    52   3e-07
AT3G52470.1 | Symbols:  | Late embryogenesis abundant (LEA) hydr...    50   2e-06
AT3G11660.1 | Symbols: NHL1 | NDR1/HIN1-like 1 | chr3:3679031-36...    47   8e-06

>AT4G26490.1 | Symbols:  | Late embryogenesis abundant (LEA)
           hydroxyproline-rich glycoprotein family |
           chr4:13380425-13381231 FORWARD LENGTH=268
          Length = 268

 Score =  240 bits (612), Expect = 8e-64,   Method: Compositional matrix adjust.
 Identities = 115/233 (49%), Positives = 162/233 (69%), Gaps = 9/233 (3%)

Query: 18  TKPLSLDQIVISKQPT---NHLSLEPNSSNSAKTKLSRPPALRFQRTNPIIWFASVLCLI 74
           T+   + Q+V++K  T   N L  EP      +  L +P   R  RT+  IW  +  C +
Sbjct: 42  TQSTPVGQMVLTKPATVRFNGLDAEPRKD---RVILRQP---RSSRTSLWIWCVAGFCFV 95

Query: 75  FSLVLIFFGVATLTIFLGIKPRNPTFDIPNANLNALYFDSPQYFNGDFTLLANFTNPNTK 134
           FSL+LIFF +ATL +FL I+PR P FDIPNANL+ +YFD+P++FNGD ++L NFTNPN K
Sbjct: 96  FSLLLIFFAIATLIVFLAIRPRIPVFDIPNANLHTIYFDTPEFFNGDLSMLVNFTNPNKK 155

Query: 135 IDVSFESLDIELFFSDRIISSQSIEPFTQRRRESRLQSLHFISSLVFLPKDLGVMLEKQV 194
           I+V FE L IELFF +R+I++Q ++PF Q++ E+RL+ +  ISSLV LP +  V L +Q+
Sbjct: 156 IEVKFEKLRIELFFFNRLIAAQVVQPFLQKKHETRLEPIRLISSLVGLPVNHAVELRRQL 215

Query: 195 QSNLVNYNVRGTFKVRVTLGLIHLSYLLHSRCQIEMTSPPTGGLVARKCITKR 247
           ++N + Y +RGTFKV+   G+IH SY LH RCQ++MT PPTG L++R C TK+
Sbjct: 216 ENNKIEYEIRGTFKVKAHFGMIHYSYQLHGRCQLQMTGPPTGILISRNCTTKK 268


>AT5G56050.1 | Symbols:  | FUNCTIONS IN: molecular_function unknown;
           INVOLVED IN: biological_process unknown; LOCATED IN:
           chloroplast; BEST Arabidopsis thaliana protein match is:
           Late embryogenesis abundant (LEA) hydroxyproline-rich
           glycoprotein family (TAIR:AT4G26490.1); Has 1807 Blast
           hits to 1807 proteins in 277 species: Archae - 0;
           Bacteria - 0; Metazoa - 736; Fungi - 347; Plants - 385;
           Viruses - 0; Other Eukaryotes - 339 (source: NCBI
           BLink). | chr5:22701167-22702018 REVERSE LENGTH=283
          Length = 283

 Score =  226 bits (577), Expect = 8e-60,   Method: Compositional matrix adjust.
 Identities = 110/238 (46%), Positives = 155/238 (65%), Gaps = 17/238 (7%)

Query: 21  LSLDQIVISKQPTNHLSLEPNSSNSAKTKLSRPPA-----------LRFQRTNPIIWFAS 69
           ++L ++++SK P +      N  + A  KL    A           LR  RTNP IW  +
Sbjct: 51  IALTEVIVSKSPLS------NQKSPATPKLDSMEAHPLHETMVLLQLRTSRTNPWIWCGA 104

Query: 70  VLCLIFSLVLIFFGVATLTIFLGIKPRNPTFDIPNANLNALYFDSPQYFNGDFTLLANFT 129
            LC IFS++LI FG+ATL ++L +KPR P FDI NA LN + F+SP YFNGD  L  NFT
Sbjct: 105 ALCFIFSILLIVFGIATLILYLAVKPRTPVFDISNAKLNTILFESPVYFNGDMLLQLNFT 164

Query: 130 NPNTKIDVSFESLDIELFFSDRIISSQSIEPFTQRRRESRLQSLHFISSLVFLPKDLGVM 189
           NPN K++V FE+L +EL+F+D  I++Q + PF+QR  ++RL+ +  IS+LVFLP +  + 
Sbjct: 165 NPNKKLNVRFENLMVELWFADTKIATQGVLPFSQRNGKTRLEPIRLISNLVFLPVNHILE 224

Query: 190 LEKQVQSNLVNYNVRGTFKVRVTLGLIHLSYLLHSRCQIEMTSPPTGGLVARKCITKR 247
           L +QV SN + Y +R  F+V+   G+IH SY+LH  CQ++++SPP GGLV R C TKR
Sbjct: 225 LRRQVTSNRIAYEIRSNFRVKAIFGMIHYSYMLHGICQLQLSSPPAGGLVYRNCTTKR 282


>AT3G26350.1 | Symbols:  | LOCATED IN: chloroplast; EXPRESSED IN:
           root, pedicel, carpel, stamen; EXPRESSED DURING: 4
           anthesis, petal differentiation and expansion stage;
           CONTAINS InterPro DOMAIN/s: Late embryogenesis abundant
           protein, group 2 (InterPro:IPR004864); BEST Arabidopsis
           thaliana protein match is: unknown protein
           (TAIR:AT1G13050.1); Has 3534 Blast hits to 2704 proteins
           in 342 species: Archae - 6; Bacteria - 192; Metazoa -
           1076; Fungi - 505; Plants - 1162; Viruses - 224; Other
           Eukaryotes - 369 (source: NCBI BLink). |
           chr3:9653660-9654730 REVERSE LENGTH=356
          Length = 356

 Score =  139 bits (351), Expect = 1e-33,   Method: Compositional matrix adjust.
 Identities = 78/196 (39%), Positives = 113/196 (57%), Gaps = 3/196 (1%)

Query: 53  PPALRFQRTNPIIWFASVLCLIFSLVLIFFGVATLTIFLGIKPRNPTFDIPNANLNALYF 112
           PP  R   TN + W A+  C IF ++LI  G+  L ++L  +PR+P  DI  ANLNA Y 
Sbjct: 163 PPPSR--ETNAMTWSAAFCCAIFWVILILGGLIILIVYLVYRPRSPYVDISAANLNAAYL 220

Query: 113 DSPQYFNGDFTLLANFTNPNTKIDVSFESLDIELFFSDRIISSQSIEPFTQRRRESRLQS 172
           D     NGD T+LAN TNP+ K  V F  +  EL++ + +I++Q IEPF   ++ S   +
Sbjct: 221 DMGFLLNGDLTILANVTNPSKKSSVEFSYVTFELYYYNTLIATQYIEPFKVPKKTSMFAN 280

Query: 173 LHFISSLVFLPKDLGVMLEKQVQSNLVNYNVRGTFKVRVTLG-LIHLSYLLHSRCQIEMT 231
           +H +SS V L       L++Q+++  V  N+RG F  R  +G L   SY LH+ C + + 
Sbjct: 281 VHLVSSQVQLQATQSRELQRQIETGPVLLNLRGMFHARSHIGPLFRYSYKLHTHCSVSLN 340

Query: 232 SPPTGGLVARKCITKR 247
            PP G + AR+C TKR
Sbjct: 341 GPPLGAMRARRCNTKR 356


>AT1G13050.1 | Symbols:  | unknown protein; BEST Arabidopsis
           thaliana protein match is: unknown protein
           (TAIR:AT3G26350.1); Has 538 Blast hits to 510 proteins
           in 88 species: Archae - 0; Bacteria - 23; Metazoa - 81;
           Fungi - 36; Plants - 361; Viruses - 8; Other Eukaryotes
           - 29 (source: NCBI BLink). | chr1:4450568-4451521
           FORWARD LENGTH=317
          Length = 317

 Score =  124 bits (310), Expect = 7e-29,   Method: Compositional matrix adjust.
 Identities = 85/235 (36%), Positives = 123/235 (52%), Gaps = 8/235 (3%)

Query: 16  NQTKPLSLDQIVISKQPTNHLSLEPNSSNSAKTKLSRPPALRF--QRTNPIIWFASVLCL 73
           N  +PL L      + P      EP     A T+    PA +   +RT P+   A++ C 
Sbjct: 88  NSARPLQLSPEE-QRPPHRGYGSEPTPWRRAPTR----PAYQQGPKRTKPMTLPATICCA 142

Query: 74  IFSLVLIFFGVATLTIFLGIKPRNPTFDIPNANLNALYFDSPQYFNGDFTLLANFTNPNT 133
           I  +VLI  G+  L ++L  +PR+P FDI  A LN    D     NGD  ++ NFTNP+ 
Sbjct: 143 ILLIVLILSGLILLLVYLANRPRSPYFDISAATLNTANLDMGYVLNGDLAVVVNFTNPSK 202

Query: 134 KIDVSFESLDIELFFSDRIISSQSIEPFTQRRRESRLQSLHFISSLVFLPKDLGVMLEKQ 193
           K  V F  +  EL+F + +I+++ IEPF   +  S   S H +SS V +       L+ Q
Sbjct: 203 KSSVDFSYVMFELYFYNTLIATEHIEPFIVPKGMSMFTSFHLVSSQVQIQMIQSQDLQLQ 262

Query: 194 VQSNLVNYNVRGTFKVRVTLG-LIHLSYLLHSRCQIEMTSPPTGGLVARKCITKR 247
           + +  V  N+RGTF  R  LG L+  SY LH++C I + +PP G + AR+C TKR
Sbjct: 263 LGTGPVLLNLRGTFHARSNLGSLMRYSYWLHTQCSISLNTPPAGTMRARRCNTKR 317


>AT1G13050.2 | Symbols:  | unknown protein; FUNCTIONS IN:
           molecular_function unknown; INVOLVED IN:
           biological_process unknown; LOCATED IN: endomembrane
           system; EXPRESSED IN: 14 plant structures; EXPRESSED
           DURING: 9 growth stages; BEST Arabidopsis thaliana
           protein match is: unknown protein (TAIR:AT3G26350.1);
           Has 260 Blast hits to 259 proteins in 20 species: Archae
           - 0; Bacteria - 0; Metazoa - 0; Fungi - 0; Plants - 260;
           Viruses - 0; Other Eukaryotes - 0 (source: NCBI BLink).
           | chr1:4450964-4451521 FORWARD LENGTH=185
          Length = 185

 Score =  119 bits (299), Expect = 1e-27,   Method: Compositional matrix adjust.
 Identities = 71/181 (39%), Positives = 103/181 (56%), Gaps = 1/181 (0%)

Query: 68  ASVLCLIFSLVLIFFGVATLTIFLGIKPRNPTFDIPNANLNALYFDSPQYFNGDFTLLAN 127
           A++ C I  +VLI  G+  L ++L  +PR+P FDI  A LN    D     NGD  ++ N
Sbjct: 5   ATICCAILLIVLILSGLILLLVYLANRPRSPYFDISAATLNTANLDMGYVLNGDLAVVVN 64

Query: 128 FTNPNTKIDVSFESLDIELFFSDRIISSQSIEPFTQRRRESRLQSLHFISSLVFLPKDLG 187
           FTNP+ K  V F  +  EL+F + +I+++ IEPF   +  S   S H +SS V +     
Sbjct: 65  FTNPSKKSSVDFSYVMFELYFYNTLIATEHIEPFIVPKGMSMFTSFHLVSSQVQIQMIQS 124

Query: 188 VMLEKQVQSNLVNYNVRGTFKVRVTLG-LIHLSYLLHSRCQIEMTSPPTGGLVARKCITK 246
             L+ Q+ +  V  N+RGTF  R  LG L+  SY LH++C I + +PP G + AR+C TK
Sbjct: 125 QDLQLQLGTGPVLLNLRGTFHARSNLGSLMRYSYWLHTQCSISLNTPPAGTMRARRCNTK 184

Query: 247 R 247
           R
Sbjct: 185 R 185


>AT5G22870.1 | Symbols:  | Late embryogenesis abundant (LEA)
           hydroxyproline-rich glycoprotein family |
           chr5:7647056-7647679 REVERSE LENGTH=207
          Length = 207

 Score = 56.2 bits (134), Expect = 2e-08,   Method: Compositional matrix adjust.
 Identities = 41/155 (26%), Positives = 71/155 (45%), Gaps = 3/155 (1%)

Query: 69  SVLCLIF--SLVLIFFG-VATLTIFLGIKPRNPTFDIPNANLNALYFDSPQYFNGDFTLL 125
           S++C IF   L LIF   V  L  +L  KP+   + + NA++      +  + +  F   
Sbjct: 24  SLICYIFLVILTLIFMAAVGFLITWLETKPKKLRYTVENASVQNFNLTNDNHMSATFQFT 83

Query: 126 ANFTNPNTKIDVSFESLDIELFFSDRIISSQSIEPFTQRRRESRLQSLHFISSLVFLPKD 185
               NPN +I V + S++I + F D+ ++  ++EPF Q R   +      I+  V + K 
Sbjct: 84  IQSHNPNHRISVYYSSVEIFVKFKDQTLAFDTVEPFHQPRMNVKQIDETLIAENVAVSKS 143

Query: 186 LGVMLEKQVQSNLVNYNVRGTFKVRVTLGLIHLSY 220
            G  L  Q     + + V    +VR  +G+   S+
Sbjct: 144 NGKDLRSQNSLGKIGFEVFVKARVRFKVGIWKSSH 178


>AT5G56070.1 | Symbols:  | unknown protein; BEST Arabidopsis
           thaliana protein match is: unknown protein
           (TAIR:AT5G56050.1); Has 1807 Blast hits to 1807 proteins
           in 277 species: Archae - 0; Bacteria - 0; Metazoa - 736;
           Fungi - 347; Plants - 385; Viruses - 0; Other Eukaryotes
           - 339 (source: NCBI BLink). | chr5:22708679-22709204
           FORWARD LENGTH=119
          Length = 119

 Score = 56.2 bits (134), Expect = 2e-08,   Method: Compositional matrix adjust.
 Identities = 22/52 (42%), Positives = 36/52 (69%)

Query: 191 EKQVQSNLVNYNVRGTFKVRVTLGLIHLSYLLHSRCQIEMTSPPTGGLVARK 242
           ++QV SN++ Y +   F+V+V +G I+ SY L   CQ+++TSPP   L++RK
Sbjct: 47  QRQVTSNMIEYEIISRFRVKVVIGYINYSYWLKGSCQLQLTSPPADDLLSRK 98


>AT3G11650.1 | Symbols: NHL2 | NDR1/HIN1-like 2 |
           chr3:3676264-3676986 REVERSE LENGTH=240
          Length = 240

 Score = 52.8 bits (125), Expect = 2e-07,   Method: Compositional matrix adjust.
 Identities = 43/171 (25%), Positives = 75/171 (43%), Gaps = 6/171 (3%)

Query: 69  SVLCLIFSLVLIFFGVATLTIFLGIKPRNPTFDIPNANLNALYFDSPQYFNGDFTLLANF 128
           S++C I   V +  GVA L ++L  +P    F + +ANLN   FD     N  ++L  NF
Sbjct: 53  SLICNILIAVAVILGVAALILWLIFRPNAVKFYVADANLNRFSFDPNN--NLHYSLDLNF 110

Query: 129 T--NPNTKIDVSFESLDIELFFSDRIISSQSIEPFTQRRRESRLQSLHFIS-SLVFLPKD 185
           T  NPN ++ V ++   +  ++ D+   S ++  F Q  + + +        +LV L   
Sbjct: 111 TIRNPNQRVGVYYDEFSVSGYYGDQRFGSANVSSFYQGHKNTTVILTKIEGQNLVVLGDG 170

Query: 186 LGVMLEKQVQSNLVNYNVRGTFKVRVTLGLIHLSYLLHSRCQIEMTSPPTG 236
               L+   +S +   N +    VR     I  S+ L  + + +    P G
Sbjct: 171 ARTDLKDDEKSGIYRINAKLRLSVRFKFWFIK-SWKLKPKIKCDDLKIPLG 220


>AT2G35460.1 | Symbols:  | Late embryogenesis abundant (LEA)
           hydroxyproline-rich glycoprotein family |
           chr2:14905788-14906504 FORWARD LENGTH=238
          Length = 238

 Score = 52.4 bits (124), Expect = 3e-07,   Method: Compositional matrix adjust.
 Identities = 36/168 (21%), Positives = 79/168 (47%), Gaps = 5/168 (2%)

Query: 69  SVLCLIFSLVLIFFGVATLTIFLGIKPRNPTFDIPNANLNALYFDSPQYFNGDFTLLANF 128
           +++C I   VL+  GV  L ++  ++P    F +  A+L    FD P+  N  + +  NF
Sbjct: 52  NIICNILIGVLVCLGVVALILWFILRPNVVKFQVTEADLTRFEFD-PRSHNLHYNISLNF 110

Query: 129 T--NPNTKIDVSFESLDIELFFSDRIISSQSIEPFTQRRRESRLQSLHFIS-SLVFLPKD 185
           +  NPN ++ + ++ L++  ++ D+  S+ ++  F Q  + + +         LV L   
Sbjct: 111 SIRNPNQRLGIHYDQLEVRGYYGDQRFSAANMTSFYQGHKNTTVVGTELNGQKLVLLGAG 170

Query: 186 LGVMLEKQVQSNLVNYNVRGTFKVRVTLGLIHLSYLLHSRCQIEMTSP 233
                 +  +S +   +V+  FK+R   G ++ S+ +  + +  +  P
Sbjct: 171 GRRDFREDRRSGVYRIDVKLRFKLRFKFGFLN-SWAVRPKIKCHLKVP 217


>AT3G52470.1 | Symbols:  | Late embryogenesis abundant (LEA)
           hydroxyproline-rich glycoprotein family |
           chr3:19450750-19451376 FORWARD LENGTH=208
          Length = 208

 Score = 49.7 bits (117), Expect = 2e-06,   Method: Compositional matrix adjust.
 Identities = 44/181 (24%), Positives = 77/181 (42%), Gaps = 10/181 (5%)

Query: 71  LCLIFSLVLIFFGVATLTIFLG---IKPRNPTFDIPNANLNALYFDSPQYFNGDFTLLAN 127
           LC   + ++ F  +  +TIFL    ++P  P F + +A + A     P     +F +   
Sbjct: 19  LC---AAIIAFIVIVLITIFLVWVILRPTKPRFVLQDATVYAFNLSQPNLLTSNFQVTIA 75

Query: 128 FTNPNTKIDVSFESLDI-ELFFSDRIISSQSIEPFTQRRRESRLQSLHFISSLVFLPKDL 186
             NPN+KI + ++ L +   + + +I    +I P  Q  +E  + S     + V +    
Sbjct: 76  SRNPNSKIGIYYDRLHVYATYMNQQITLRTAIPPTYQGHKEVNVWSPFVYGTAVPIAPYN 135

Query: 187 GVMLEKQVQSNLVNYNVRGTFKVRVTL-GLIHLSYLLHSRCQ--IEMTSPPTGGLVARKC 243
            V L ++     V   +R    VR  +  LI   Y +H RCQ  I + +   G LV    
Sbjct: 136 SVALGEEKDRGFVGLMIRADGTVRWKVRTLITGKYHIHVRCQAFINLGNKAAGVLVGDNA 195

Query: 244 I 244
           +
Sbjct: 196 V 196


>AT3G11660.1 | Symbols: NHL1 | NDR1/HIN1-like 1 |
           chr3:3679031-3679660 REVERSE LENGTH=209
          Length = 209

 Score = 47.4 bits (111), Expect = 8e-06,   Method: Compositional matrix adjust.
 Identities = 42/158 (26%), Positives = 72/158 (45%), Gaps = 6/158 (3%)

Query: 73  LIFSLVLIFFGVATLTIFLGIKPRNPTFDIPNANLNALYFDS--PQYFNGDFTLLANFTN 130
           +IF L +IF  +  L I+  ++P  P F + +A + A       P     +F +  +  N
Sbjct: 22  IIFVLFIIFLTI--LLIWAILQPSKPRFILQDATVYAFNVSGNPPNLLTSNFQITLSSRN 79

Query: 131 PNTKIDVSFESLDI-ELFFSDRIISSQSIEPFTQRRRESRLQSLHFISSLVFLPKDLGVM 189
           PN KI + ++ LD+   + S +I    SI P  Q  ++  + S     + V +    GV 
Sbjct: 80  PNNKIGIYYDRLDVYATYRSQQITFPTSIPPTYQGHKDVDIWSPFVYGTSVPIAPFNGVS 139

Query: 190 LEKQVQSNLVNYNVRGTFKVRVTLG-LIHLSYLLHSRC 226
           L+    + +V   +R   +VR  +G  I   Y LH +C
Sbjct: 140 LDTDKDNGVVLLIIRADGRVRWKVGTFITGKYHLHVKC 177