Miyakogusa Predicted Gene
- Lj4g3v2226740.1
BLASTP 2.2.25 [Feb-01-2011]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Reference for compositional score matrix adjustment: Altschul, Stephen F.,
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.
Query= Lj4g3v2226740.1 Non Characterized Hit- tr|I1LYF0|I1LYF0_SOYBN
Uncharacterized protein OS=Glycine max PE=4 SV=1,73.15,0,seg,NULL;
coiled-coil,NULL; SUBFAMILY NOT NAMED,NULL; FAMILY NOT
NAMED,NULL,CUFF.50573.1
(583 letters)
Database: TAIR10_pep
35,386 sequences; 14,482,855 total letters
Searching..................................................done
                                                                   Score  E
Sequences producing significant alignments:                       (bits)  Value
AT1G48280.1 | Symbols: | hydroxyproline-rich glycoprotein famil...   446  e-125
AT3G25690.3 | Symbols: CHUP1 | Hydroxyproline-rich glycoprotein ...  272  6e-73
AT3G25690.2 | Symbols: CHUP1 | Hydroxyproline-rich glycoprotein ...  271  7e-73
AT3G25690.1 | Symbols: CHUP1 | Hydroxyproline-rich glycoprotein ...  271  7e-73
AT4G18570.1 | Symbols: | Tetratricopeptide repeat (TPR)-like su...   252  4e-67
AT1G07120.1 | Symbols: | FUNCTIONS IN: molecular_function unkno...   214  9e-56
AT4G04980.1 | Symbols: | unknown protein; BEST Arabidopsis thal...    68  2e-11
AT1G61080.1 | Symbols: | Hydroxyproline-rich glycoprotein famil...    53  7e-07
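The summary rows above can be read programmatically. What follows is a minimal Python sketch and not part of the BLAST output; the names HIT_LINE and parse_hit are hypothetical. It assumes each row has the form shown above (identifier, description, bit score, E-value) and that a bare exponent such as e-125 stands for 1e-125.

```python
import re

# Hypothetical pattern for one summary row: identifier, free-text
# description, bit score, E-value. Whitespace width is not significant.
HIT_LINE = re.compile(
    r"^(?P<id>\S+)\s+\|.*\s(?P<bits>\d+(?:\.\d+)?)\s+(?P<evalue>\S+)$"
)

def parse_hit(line):
    """Return (identifier, bit_score, e_value), or None if the row does not match."""
    m = HIT_LINE.match(line.strip())
    if m is None:
        return None
    evalue = m.group("evalue")
    # Assumption: a value printed as "e-125" means 1e-125, so restore the
    # mantissa before converting to float.
    if evalue.startswith("e"):
        evalue = "1" + evalue
    return m.group("id"), float(m.group("bits")), float(evalue)

if __name__ == "__main__":
    row = ("AT1G48280.1 | Symbols: | hydroxyproline-rich "
           "glycoprotein famil... 446 e-125")
    print(parse_hit(row))
```

Normalizing the bare exponent before calling float() matters because float("e-125") raises ValueError in Python.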
>AT1G48280.1 | Symbols: | hydroxyproline-rich glycoprotein family
protein | chr1:17835196-17837553 FORWARD LENGTH=558
Length = 558
Score = 446 bits (1147), Expect = e-125, Method: Compositional matrix adjust.
Identities = 255/530 (48%), Positives = 340/530 (64%), Gaps = 27/530 (5%)
Query: 68 ISSTRAKSVPPDLKNSSKAKRGLVLNKAKSIEE---VEGSQKGREGEEAKVVVLSAAAR- 123
++ + KS D+KN +R ++L +AKS EE V Q+ R VV R
Sbjct: 35 LTGGKPKSSGYDVKNDPAKRRSILLKRAKSAEEEMAVLAPQRARSVNRPAVVEQFGCPRR 94
Query: 124 -IRRRVGD--FGLRRGEDDPDGXXXXXXXXXXXXXXXXXXXXXXLIKNLQSEMLELKVEL 180
I R+ + ED+ LIK+LQ ++L LK EL
Sbjct: 95 PISRKSEETVMATAAAEDE--------KRKRMEELEEKLVVNESLIKDLQLQVLNLKTEL 146
Query: 181 DKARSVNMELESQNRKLTQELSAAEAKIAALGNSVKEPIGEHQSPKFKDIQKLIADKLER 240
++AR+ N+ELE NRKL+Q+L +AEAKI++L ++ K P EHQ+ +FKDIQ+LIA KLE+
Sbjct: 147 EEARNSNVELELNNRKLSQDLVSAEAKISSLSSNDK-PAKEHQNSRFKDIQRLIASKLEQ 205
Query: 241 SKVKKETVPEAIFVKASIPAPTPSRAV---------PETNIGRK--SXXXXXXXXXXXXX 289
KVKKE E+ + P+P+ P +++G++ +
Sbjct: 206 PKVKKEVAVESSRLSPPSPSPSRLPPTPPLPKFLVSPASSLGKRDENSSPFAPPTPPPPP 265
Query: 290 XXXXXXXXAKLANIQKPPAIVELFHSLKNQDGKKDSKGMVNHQRPVASSAHSSIVGEIQN 349
AK A QK P + +LF L QD ++ VN + +SAH+SIVGEIQN
Sbjct: 266 PPPPPRPLAKAARAQKSPPVSQLFQLLNKQDNSRNLSQSVNGNKSQVNSAHNSIVGEIQN 325
Query: 350 RSAHLLAIRADIETKGEFINDLIKKVVDAAYVDIEEVLKFVDWLDGELSTLADERAVLKH 409
RSAHL+AI+ADIETKGEFINDLI+KV+ + D+E+V+KFVDWLD EL+TLADERAVLKH
Sbjct: 326 RSAHLIAIKADIETKGEFINDLIQKVLTTCFSDMEDVMKFVDWLDKELATLADERAVLKH 385
Query: 410 FKWPEKKADAMREAAVEYRELKMLEQEISSFKDDLDIPCGASLRKMACLLDKSERSIQRL 469
FKWPEKKAD ++EAAVEYRELK LE+E+SS+ DD +I G +L+KMA LLDKSE+ I+RL
Sbjct: 386 FKWPEKKADTLQEAAVEYRELKKLEKELSSYSDDPNIHYGVALKKMANLLDKSEQRIRRL 445
Query: 470 IKLRSSAIRSYQVYNIPTAWMLDSGMMSKIKQASMTLVKMYMKRLTIELESIRNSDRESS 529
++LR S++RSYQ + IP WMLDSGM+ KIK+AS+ L K YM R+ EL+S RN DRES+
Sbjct: 446 VRLRGSSMRSYQDFKIPVEWMLDSGMICKIKRASIKLAKTYMNRVANELQSARNLDREST 505
Query: 530 QDSLLLQGVHFAYKAHQFAGGLDSETLCAFEEIRQRVPGNLAGSRELLAG 579
+++LLLQGV FAY+ HQFAGGLD ETLCA EEI+QRVP +L +R +AG
Sbjct: 506 KEALLLQGVRFAYRTHQFAGGLDPETLCALEEIKQRVPSHLRLARGNMAG 555
>AT3G25690.3 | Symbols: CHUP1 | Hydroxyproline-rich glycoprotein
family protein | chr3:9354061-9357757 FORWARD LENGTH=863
Length = 863
Score = 272 bits (695), Expect = 6e-73, Method: Compositional matrix adjust.
Identities = 124/264 (46%), Positives = 192/264 (72%), Gaps = 1/264 (0%)
Query: 303 IQKPPAIVELFHSLKNQDGKKD-SKGMVNHQRPVASSAHSSIVGEIQNRSAHLLAIRADI 361
+ + P +VE + SL ++ KK+ + +++ +S+A ++++GEI+NRS LLA++AD+
Sbjct: 577 VHRAPELVEFYQSLMKRESKKEGAPSLISSGTGNSSAARNNMIGEIENRSTFLLAVKADV 636
Query: 362 ETKGEFINDLIKKVVDAAYVDIEEVLKFVDWLDGELSTLADERAVLKHFKWPEKKADAMR 421
ET+G+F+ L +V +++ DIE++L FV WLD ELS L DERAVLKHF WPE KADA+R
Sbjct: 637 ETQGDFVQSLATEVRASSFTDIEDLLAFVSWLDEELSFLVDERAVLKHFDWPEGKADALR 696
Query: 422 EAAVEYRELKMLEQEISSFKDDLDIPCGASLRKMACLLDKSERSIQRLIKLRSSAIRSYQ 481
EAA EY++L LE++++SF DD ++ C +L+KM LL+K E+S+ L++ R AI Y+
Sbjct: 697 EAAFEYQDLMKLEKQVTSFVDDPNLSCEPALKKMYKLLEKVEQSVYALLRTRDMAISRYK 756
Query: 482 VYNIPTAWMLDSGMMSKIKQASMTLVKMYMKRLTIELESIRNSDRESSQDSLLLQGVHFA 541
+ IP W+ D+G++ KIK +S+ L K YMKR+ EL+S+ SD++ +++ LLLQGV FA
Sbjct: 757 EFGIPVDWLSDTGVVGKIKLSSVQLAKKYMKRVAYELDSVSGSDKDPNREFLLLQGVRFA 816
Query: 542 YKAHQFAGGLDSETLCAFEEIRQR 565
++ HQFAGG D+E++ AFEE+R R
Sbjct: 817 FRVHQFAGGFDAESMKAFEELRSR 840
>AT3G25690.2 | Symbols: CHUP1 | Hydroxyproline-rich glycoprotein
family protein | chr3:9354061-9357757 FORWARD
LENGTH=1004
Length = 1004
Score = 271 bits (694), Expect = 7e-73, Method: Compositional matrix adjust.
Identities = 124/264 (46%), Positives = 192/264 (72%), Gaps = 1/264 (0%)
Query: 303 IQKPPAIVELFHSLKNQDGKKD-SKGMVNHQRPVASSAHSSIVGEIQNRSAHLLAIRADI 361
+ + P +VE + SL ++ KK+ + +++ +S+A ++++GEI+NRS LLA++AD+
Sbjct: 718 VHRAPELVEFYQSLMKRESKKEGAPSLISSGTGNSSAARNNMIGEIENRSTFLLAVKADV 777
Query: 362 ETKGEFINDLIKKVVDAAYVDIEEVLKFVDWLDGELSTLADERAVLKHFKWPEKKADAMR 421
ET+G+F+ L +V +++ DIE++L FV WLD ELS L DERAVLKHF WPE KADA+R
Sbjct: 778 ETQGDFVQSLATEVRASSFTDIEDLLAFVSWLDEELSFLVDERAVLKHFDWPEGKADALR 837
Query: 422 EAAVEYRELKMLEQEISSFKDDLDIPCGASLRKMACLLDKSERSIQRLIKLRSSAIRSYQ 481
EAA EY++L LE++++SF DD ++ C +L+KM LL+K E+S+ L++ R AI Y+
Sbjct: 838 EAAFEYQDLMKLEKQVTSFVDDPNLSCEPALKKMYKLLEKVEQSVYALLRTRDMAISRYK 897
Query: 482 VYNIPTAWMLDSGMMSKIKQASMTLVKMYMKRLTIELESIRNSDRESSQDSLLLQGVHFA 541
+ IP W+ D+G++ KIK +S+ L K YMKR+ EL+S+ SD++ +++ LLLQGV FA
Sbjct: 898 EFGIPVDWLSDTGVVGKIKLSSVQLAKKYMKRVAYELDSVSGSDKDPNREFLLLQGVRFA 957
Query: 542 YKAHQFAGGLDSETLCAFEEIRQR 565
++ HQFAGG D+E++ AFEE+R R
Sbjct: 958 FRVHQFAGGFDAESMKAFEELRSR 981
>AT3G25690.1 | Symbols: CHUP1 | Hydroxyproline-rich glycoprotein
family protein | chr3:9354061-9357757 FORWARD
LENGTH=1004
Length = 1004
Score = 271 bits (694), Expect = 7e-73, Method: Compositional matrix adjust.
Identities = 124/264 (46%), Positives = 192/264 (72%), Gaps = 1/264 (0%)
Query: 303 IQKPPAIVELFHSLKNQDGKKD-SKGMVNHQRPVASSAHSSIVGEIQNRSAHLLAIRADI 361
+ + P +VE + SL ++ KK+ + +++ +S+A ++++GEI+NRS LLA++AD+
Sbjct: 718 VHRAPELVEFYQSLMKRESKKEGAPSLISSGTGNSSAARNNMIGEIENRSTFLLAVKADV 777
Query: 362 ETKGEFINDLIKKVVDAAYVDIEEVLKFVDWLDGELSTLADERAVLKHFKWPEKKADAMR 421
ET+G+F+ L +V +++ DIE++L FV WLD ELS L DERAVLKHF WPE KADA+R
Sbjct: 778 ETQGDFVQSLATEVRASSFTDIEDLLAFVSWLDEELSFLVDERAVLKHFDWPEGKADALR 837
Query: 422 EAAVEYRELKMLEQEISSFKDDLDIPCGASLRKMACLLDKSERSIQRLIKLRSSAIRSYQ 481
EAA EY++L LE++++SF DD ++ C +L+KM LL+K E+S+ L++ R AI Y+
Sbjct: 838 EAAFEYQDLMKLEKQVTSFVDDPNLSCEPALKKMYKLLEKVEQSVYALLRTRDMAISRYK 897
Query: 482 VYNIPTAWMLDSGMMSKIKQASMTLVKMYMKRLTIELESIRNSDRESSQDSLLLQGVHFA 541
+ IP W+ D+G++ KIK +S+ L K YMKR+ EL+S+ SD++ +++ LLLQGV FA
Sbjct: 898 EFGIPVDWLSDTGVVGKIKLSSVQLAKKYMKRVAYELDSVSGSDKDPNREFLLLQGVRFA 957
Query: 542 YKAHQFAGGLDSETLCAFEEIRQR 565
++ HQFAGG D+E++ AFEE+R R
Sbjct: 958 FRVHQFAGGFDAESMKAFEELRSR 981
>AT4G18570.1 | Symbols: | Tetratricopeptide repeat (TPR)-like
superfamily protein | chr4:10231439-10234534 FORWARD
LENGTH=642
Length = 642
Score = 252 bits (644), Expect = 4e-67, Method: Compositional matrix adjust.
Identities = 127/270 (47%), Positives = 185/270 (68%), Gaps = 7/270 (2%)
Query: 301 ANIQKPPAIVELFHSLKNQDG---KKDSKGMVNH--QRPVASSAHSSIVGEIQNRSAHLL 355
A +++ P +VE +HSL +D ++DS G N + +A+S ++GEI+NRS +LL
Sbjct: 351 AKVRRVPEVVEFYHSLMRRDSTNSRRDSTGGGNAAAEAILANSNARDMIGEIENRSVYLL 410
Query: 356 AIRADIETKGEFINDLIKKVVDAAYVDIEEVLKFVDWLDGELSTLADERAVLKHFKWPEK 415
AI+ D+ET+G+FI LIK+V +AA+ DIE+V+ FV WLD ELS L DERAVLKHF+WPE+
Sbjct: 411 AIKTDVETQGDFIRFLIKEVGNAAFSDIEDVVPFVKWLDDELSYLVDERAVLKHFEWPEQ 470
Query: 416 KADAMREAAVEYRELKMLEQEISSFKDDLDIPCGASLRKMACLLDKSERSIQRLIKLRSS 475
KADA+REAA Y +LK L E S F++D ++L+KM L +K E + L ++R S
Sbjct: 471 KADALREAAFCYFDLKKLISEASRFREDPRQSSSSALKKMQALFEKLEHGVYSLSRMRES 530
Query: 476 AIRSYQVYNIPTAWMLDSGMMSKIKQASMTLVKMYMKRLTIELESIRNSDRESSQDSLLL 535
A ++ + IP WML++G+ S+IK AS+ L YMKR++ ELE+I ++ L++
Sbjct: 531 AATKFKSFQIPVDWMLETGITSQIKLASVKLAMKYMKRVSAELEAIEGG--GPEEEELIV 588
Query: 536 QGVHFAYKAHQFAGGLDSETLCAFEEIRQR 565
QGV FA++ HQFAGG D+ET+ AFEE+R +
Sbjct: 589 QGVRFAFRVHQFAGGFDAETMKAFEELRDK 618
>AT1G07120.1 | Symbols: | FUNCTIONS IN: molecular_function unknown;
INVOLVED IN: biological_process unknown; LOCATED IN:
chloroplast envelope; EXPRESSED IN: inflorescence
meristem, petal, leaf whorl, flower; EXPRESSED DURING: 4
anthesis, petal differentiation and expansion stage;
BEST Arabidopsis thaliana protein match is:
Tetratricopeptide repeat (TPR)-like superfamily protein
(TAIR:AT4G18570.1); Has 288 Blast hits to 260 proteins
in 50 species: Archae - 0; Bacteria - 8; Metazoa - 27;
Fungi - 15; Plants - 163; Viruses - 0; Other Eukaryotes
- 75 (source: NCBI BLink). | chr1:2184874-2186580
REVERSE LENGTH=392
Length = 392
Score = 214 bits (546), Expect = 9e-56, Method: Compositional matrix adjust.
Identities = 110/271 (40%), Positives = 168/271 (61%), Gaps = 7/271 (2%)
Query: 303 IQKPPAIVELFHSLKNQDGKKDSKGMVNHQRPVASSAHSSIVGEIQNRSAHLLAIRADIE 362
+++ P +VE + +L ++ +K +N ++ + + +++GEI+NRS +L I++D +
Sbjct: 128 VRRAPEVVEFYRALTKRESHMGNK--INQNGVLSPAFNRNMIGEIENRSKYLSDIKSDTD 185
Query: 363 TKGEFINDLIKKVVDAAYVDIEEVLKFVDWLDGELSTLADERAVLKHF-KWPEKKADAMR 421
+ I+ LI KV A + DI EV FV W+D ELS+L DERAVLKHF KWPE+K D++R
Sbjct: 186 RHRDHIHILISKVEAATFTDISEVETFVKWIDEELSSLVDERAVLKHFPKWPERKVDSLR 245
Query: 422 EAAVEYRELKMLEQEISSFKDDLDIPCGASLRKMACLLDKSERSIQRLIKLRSSAIRSYQ 481
EAA Y+ K L EI SFKD+ +L+++ L D+ E S+ K+R S + Y+
Sbjct: 246 EAACNYKRPKNLGNEILSFKDNPKDSLTQALQRIQSLQDRLEESVNNTEKMRDSTGKRYK 305
Query: 482 VYNIPTAWMLDSGMMSKIKQASMTLVKMYMKRLTIELESIRNSDRESSQDSLLLQGVHFA 541
+ IP WMLD+G++ ++K +S+ L + YMKR+ ELE S+ + +L+LQGV FA
Sbjct: 306 DFQIPWEWMLDTGLIGQLKYSSLRLAQEYMKRIAKELE----SNGSGKEGNLMLQGVRFA 361
Query: 542 YKAHQFAGGLDSETLCAFEEIRQRVPGNLAG 572
Y HQFAGG D ETL F E+++ G G
Sbjct: 362 YTIHQFAGGFDGETLSIFHELKKITTGETRG 392
>AT4G04980.1 | Symbols: | unknown protein; BEST Arabidopsis
thaliana protein match is: unknown protein
(TAIR:AT1G11070.1); Has 23100 Blast hits to 15699
proteins in 1063 species: Archae - 116; Bacteria - 2262;
Metazoa - 8308; Fungi - 3268; Plants - 3181; Viruses -
958; Other Eukaryotes - 5007 (source: NCBI BLink). |
chr4:2544210-2547893 REVERSE LENGTH=880
Length = 880
Score = 67.8 bits (164), Expect = 2e-11, Method: Compositional matrix adjust.
Identities = 67/280 (23%), Positives = 130/280 (46%), Gaps = 32/280 (11%)
Query: 301 ANIQKPPAIVELFHSLKNQ--------DGKKDSKGM--VNHQRPV--ASSAHSSIVGEIQ 348
+ +++ I L+ +LK + KK SKG V + PV A S + + E+
Sbjct: 588 SKLRRSAQIANLYWALKGKLEGRGVEGKTKKASKGQNSVAEKSPVKVARSGMADALAEMT 647
Query: 349 NRSAHLLAIRADIETKGEFINDLIKKVVDAAYVDIEEVLKFVDWLDGELSTLADERAVLK 408
RS++ I D++ + I +L + D++E+L+F ++ L L DE VL
Sbjct: 648 KRSSYFQQIEEDVQKYAKSIEELKSSIHSFQTKDMKELLEFHSKVESILEKLTDETQVLA 707
Query: 409 HFK-WPEKKADAMREAAVEYRELKMLEQEISSFKDDLDIPCGASLRKMACLLDKSERSIQ 467
F+ +PEKK + +R A Y++L + E+ ++K ++ P L K+ +K + I+
Sbjct: 708 RFEGFPEKKLEVIRTAGALYKKLDGILVELKNWK--IEPPLNDLLDKIERYFNKFKGEIE 765
Query: 468 RLIKLRSSAIRSYQVYNIPTAWMLDSGMMSKIKQASMTLVKMYMKRLTIELESIRNSDRE 527
+ + + + ++ YNI +D ++ ++K+ T+V + + + L+ R ++ E
Sbjct: 766 TVERTKDEDAKMFKRYNIN----IDFEVLVQVKE---TMVDVSSNCMELALKERREANEE 818
Query: 528 SSQDS----------LLLQGVHFAYKAHQFAGGLDSETLC 557
+ L + FA+K + FAGG D C
Sbjct: 819 AKNGEESKMKEERAKRLWRAFQFAFKVYTFAGGHDERADC 858
>AT1G61080.1 | Symbols: | Hydroxyproline-rich glycoprotein family
protein | chr1:22493194-22497019 REVERSE LENGTH=907
Length = 907
Score = 52.8 bits (125), Expect = 7e-07, Method: Compositional matrix adjust.
Identities = 58/222 (26%), Positives = 94/222 (42%), Gaps = 56/222 (25%)
Query: 344 VGEIQNRSAHLLAIRADIETKGEFINDLIKKVVDAAYVDIEEVLKFVDWLDGELSTLADE 403
+ EI +SA+ L I+ADI IN+L ++ D+ E+L F ++ L L DE
Sbjct: 707 LAEITKKSAYFLQIQADIAKYMTSINELKIEITKFQTKDMTELLSFHRRVESVLENLTDE 766
Query: 404 RAVLKHFK-WPEKKADAMREAAVEYRELKMLEQEISSFKDDLDIPCGASLRKMACLLDKS 462
VL + +P+KK +AMR A Y +L + E+ + K ++ P LLDK
Sbjct: 767 SQVLARCEGFPQKKLEAMRMAVALYTKLHGMITELQNMK--IEPPLNQ-------LLDKV 817
Query: 463 ERSIQRLIKLRSSAIRSYQVYNIPTAWMLDSGMMSKIKQASMTLVKMYMKRLTIELESIR 522
ER +KIK+ T+V + + + L+ R
Sbjct: 818 ER------------------------------YFTKIKE---TMVDISSNCMELALKEKR 844
Query: 523 NSDRESSQDS------------LLLQGVHFAYKAHQFAGGLD 552
+ ++ S D+ +L + FA+K + FAGG D
Sbjct: 845 D-EKLVSPDAKPSLKKTVGSAKMLWRAFQFAFKVYTFAGGHD 885