FASTA searches a protein or DNA sequence data bank
 36.3.4 Apr, 2011
Please cite:
 W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

Query: pF1KB9650, 315 aa
  1>>>pF1KB9650 315 - 315 aa - 315 aa
Library: human.CCDS.faa
  18511270 residues in 32554 sequences

Statistics: Expectation_n fit: rho(ln(x))= 7.6548+/-0.000695; mu= 9.7906+/- 0.042
 mean_var=211.1230+/-42.814, 0's: 0 Z-trim(118.6): 57 B-trim: 418 in 2/52
 Lambda= 0.088269
 statistics sampled from 19543 (19603) to 19543 sequences
Algorithm: FASTA (3.7 Nov 2010) [optimized]
Parameters: BL50 matrix (15:-5), open/ext: -10/-2
 ktup: 2, E-join: 1 (0.854), E-opt: 0.2 (0.602), width: 16
 Scan time: 3.410

The best scores are:                                opt  bits E(32554)
CCDS12995.1 SOX12 gene_id:6666|Hs108|chr20  ( 315) 2222 294.3 8.4e-80
CCDS4547.1 SOX4 gene_id:6659|Hs108|chr6     ( 474)  647  93.9 2.6e-19
CCDS1654.1 SOX11 gene_id:6664|Hs108|chr2    ( 441)  586  86.1 5.5e-17
CCDS9473.1 SOX21 gene_id:11166|Hs108|chr13  ( 276)  439  67.2 1.7e-11
CCDS32549.1 SOX15 gene_id:6665|Hs108|chr17  ( 233)  424  65.2 5.8e-11

>>CCDS12995.1 SOX12 gene_id:6666|Hs108|chr20 (315 aa)
 initn: 2222 init1: 2222 opt: 2222 Z-score: 1546.2 bits: 294.3 E(32554): 8.4e-80
Smith-Waterman score: 2222; 100.0% identity (100.0% similar) in 315 aa overlap (1-315:1-315)

       10 20 30 40 50 60
pF1KB9 MVQQRGARAKRDGGPPPPGPGPAEEGAREPGWCKTPSGHIKRPMNAFMVWSQHERRKIMD
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS12 MVQQRGARAKRDGGPPPPGPGPAEEGAREPGWCKTPSGHIKRPMNAFMVWSQHERRKIMD
       10 20 30 40 50 60

       70 80 90 100 110 120
pF1KB9 QWPDMHNAEISKRLGRRWQLLQDSEKIPFVREAERLRLKHMADYPDYKYRPRKKSKGAPA
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS12 QWPDMHNAEISKRLGRRWQLLQDSEKIPFVREAERLRLKHMADYPDYKYRPRKKSKGAPA
       70 80 90 100 110 120

       130 140 150 160 170 180
pF1KB9 KARPRPPGGSGGGSRLKPGPQLPGRGGRRAAGGPLGGGAAAPEDDDEDDDEELLEVRLVE
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS12 KARPRPPGGSGGGSRLKPGPQLPGRGGRRAAGGPLGGGAAAPEDDDEDDDEELLEVRLVE
       130 140 150 160 170 180

       190 200 210 220 230 240
pF1KB9 TPGRELWRMVPAGRAARGQAERAQGPSGEGAAAAAAASPTPSEDEEPEEEEEEAAAAEEG
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS12 TPGRELWRMVPAGRAARGQAERAQGPSGEGAAAAAAASPTPSEDEEPEEEEEEAAAAEEG
       190 200 210 220 230 240

       250 260 270 280 290 300
pF1KB9 EEETVASGEESLGFLSRLPPGPAGLDCSALDRDPDLQPPSGTSHFEFPDYCTPEVTEMIA
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS12 EEETVASGEESLGFLSRLPPGPAGLDCSALDRDPDLQPPSGTSHFEFPDYCTPEVTEMIA
       250 260 270 280 290 300

       310
pF1KB9 GDWRPSSIADLVFTY
       :::::::::::::::
CCDS12 GDWRPSSIADLVFTY
       310

>>CCDS4547.1 SOX4 gene_id:6659|Hs108|chr6 (474 aa)
 initn: 926 init1: 580 opt: 647 Z-score: 460.0 bits: 93.9 E(32554): 2.6e-19
Smith-Waterman score: 652; 51.6% identity (67.6% similar) in 219 aa overlap (16-217:34-246)

       10 20 30 40
pF1KB9 MVQQRGARAKRDGGPPPPGPGPAEEG-AREPGWCKTPSGHIKRPM
       : :: . : : .:.:::::::::::::
CCDS45 QTNNAENTEALLAGESSDSGAGLELGIASSPTPGSTASTGGKADDPSWCKTPSGHIKRPM
       10 20 30 40 50 60

       50 60 70 80 90 100
pF1KB9 NAFMVWSQHERRKIMDQWPDMHNAEISKRLGRRWQLLQDSEKIPFVREAERLRLKHMADY
       :::::::: ::::::.: :::::::::::::.::.::.::.::::.::::::::::::::
CCDS45 NAFMVWSQIERRKIMEQSPDMHNAEISKRLGKRWKLLKDSDKIPFIREAERLRLKHMADY
       70 80 90 100 110 120

       110 120 130 140 150
pF1KB9 PDYKYRPRKK--------SKGAPAKARPRPPG----GSGGGSRLKPGPQLPGRGGRRAAG
       :::::::::: :..: :...: : ::::: : : :: ::
CCDS45 PDYKYRPRKKVKSGNANSSSSAAASSKPGEKGDKVGGSGGG-----GHGGGGGGGSSNAG
       130 140 150 160 170

       160 170 180 190 200
pF1KB9 GPLGGGAAAPEDDDEDDDEELLEVRLVETPG----RELWRMVPAGRAARGQAERAQGPSG
       : ::::.. ... ... ... : . ... :: .. :.: : . :
CCDS45 GG-GGGASGGGANSKPAQKKSCGSKVAGGAGGGVSKPHAKLILAGGGGGGKAAAAAAASF
       180 190 200 210 220 230

       210 220 230 240 250 260
pF1KB9 EGAAAAAAASPTPSEDEEPEEEEEEAAAAEEGEEETVASGEESLGFLSRLPPGPAGLDCS
       . :.:::
CCDS45 AAEQAGAAALLPLGAAADHHSLYKARTPSASASASSAASASAALAAPGKHLAEKKVKRVY
       240 250 260 270 280 290

>>CCDS1654.1 SOX11 gene_id:6664|Hs108|chr2 (441 aa)
 initn: 820 init1: 562 opt: 586 Z-score: 418.4 bits: 86.1 E(32554): 5.5e-17
Smith-Waterman score: 604; 38.4% identity (56.9% similar) in 318 aa overlap (22-283:31-338)

       10 20 30 40 50
pF1KB9 MVQQRGARAKRDGGPPPPGPGPAEEGAREPGWCKTPSGHIKRPMNAFMVWS
       :. .: :::: :::::::::::::::
CCDS16 MVQQAESLEAESNLPREALDTEEGEFMACSPVALDESDPDWCKTASGHIKRPMNAFMVWS
       10 20 30 40 50 60

       60 70 80 90 100 110
pF1KB9 QHERRKIMDQWPDMHNAEISKRLGRRWQLLQDSEKIPFVREAERLRLKHMADYPDYKYRP
       . ::::::.: :::::::::::::.::..:.:::::::.:::::::::::::::::::::
CCDS16 KIERRKIMEQSPDMHNAEISKRLGKRWKMLKDSEKIPFIREAERLRLKHMADYPDYKYRP
       70 80 90 100 110 120

       120 130 140
pF1KB9 RKKSKGAPAKARP---RPP------------GGSGGGSRLKPGP-------QLPGRGGRR
       ::: : :. :.: . : ::..::.. . : . :. .: .
CCDS16 RKKPKMDPS-AKPSASQSPEKSAAGGGGGSAGGGAGGAKTSKGSSKKCGKLKAPAAAGAK
       130 140 150 160 170

       150 160 170
pF1KB9 AAGGPL-----------------------GGGAAAP--------EDDDEDDDEELLEVRL
       :..: :::.:. ::::.:::.. :....
CCDS16 AGAGKAAQSGDYGGAGDDYVLGSLRVSGSGGGGAGKTVKCVFLDEDDDDDDDDDELQLQI
       180 190 200 210 220 230

       180 190 200 210 220 230
pF1KB9 VETPGRELWRMVPAGRAARGQAERAQGPSGEGAAAAAAASPTPSEDEEPEEEEEEAAAAE
       . : .: . : . . ... . . .: . :::: : . : : ..
CCDS16 KQEPDEED-EEPPHQQLLQPPGQQPSQLLRRYNVAKVPASPTLSSSAESPEGASLYDEVR
       240 250 260 270 280 290

       240 250 260 270 280 290
pF1KB9 EGEEETVASGEE---SLGFLSRLPPGPAGLDCSALDRDPDLQPPSGTSHFEFPDYCTPEV
       : ...: . :. ... : : . .: :.: :. :
CCDS16 AGATSGAGGGSRLYYSFKNITKQHPPPLA--------QPALSPASSRSVSTSSSSSSGSS
       300 310 320 330 340 350

       300 310
pF1KB9 TEMIAGDWRPSSIADLVFTY
CCDS16 SGSSGEDADDLMFDLSLNFSQSAHSASEQQLGGGAAAGNLSLSLVDKDLDSFSEGSLGSH
       360 370 380 390 400 410

>>CCDS9473.1 SOX21 gene_id:11166|Hs108|chr13 (276 aa)
 initn: 487 init1: 385 opt: 439 Z-score: 319.7 bits: 67.2 E(32554): 1.7e-11
Smith-Waterman score: 439; 34.4% identity (57.5% similar) in 273 aa overlap (36-295:4-265)

       10 20 30 40 50 60
pF1KB9 GARAKRDGGPPPPGPGPAEEGAREPGWCKTPSGHIKRPMNAFMVWSQHERRKIMDQWPDM
       : :.:::::::::::. .:::. .. : :
CCDS94 MSKPVDHVKRPMNAFMVWSRAQRRKMAQENPKM
       10 20 30

       70 80 90 100 110 120
pF1KB9 HNAEISKRLGRRWQLLQDSEKIPFVREAERLRLKHMADYPDYKYRPRKKSKGAPAK---A
       ::.::::::: .:.:: .::: ::. ::.::: :: ..::::::::.: : : :
CCDS94 HNSEISKRLGAEWKLLTESEKRPFIDEAKRLRAMHMKEHPDYKYRPRRKPKTLLKKDKFA
       40 50 60 70 80 90

       130 140 150 160 170 180
pF1KB9 RPRPPGGSGGGSRLKPGPQLPGRGGRRAAGGPLGGGAAAPEDDDEDDDEELLEVRLVETP
       : : : :: : : . .: .:.. ::. .::. . .. . . .
CCDS94 FPVPYG--LGGVADAEHPALKAGAGLHAGA----GGGLVPESLLANPEKAAAAA--AAAA
       100 110 120 130 140

       190 200 210 220 230
pF1KB9 GRELWRMVPAGRAARGQAERAQGPSGEGAAAAAAASPTPSEDEEPEEEE----EEAAAAE
       .: .. . :. :: . : : .: . .. : . : . : .:.:
CCDS94 ARVFFPQSAAAAAAAAAAAAAGSPYSLLDLGSKMAEISSSSSGLPYASSLGYPTAGAGAF
       150 160 170 180 190 200

       240 250 260 270 280 290
pF1KB9 EGEEETVASGEESLGFLSRLPPGPAG------LDCSALDRDPDLQPPSGTSHFEFPDYCT
       .: ..:.. . : .. :.:.. .::: .: :::: ... .: .
CCDS94 HGAAAAAAAAAAAAGGHTHSHPSPGNPGYMIPCNCSAWP-SPGLQPP--LAYILLPGMGK
       210 220 230 240 250 260

       300 310
pF1KB9 PEVTEMIAGDWRPSSIADLVFTY
       :..
CCDS94 PQLDPYPAAYAAAL
       270

>>CCDS32549.1 SOX15 gene_id:6665|Hs108|chr17 (233 aa)
 initn: 450 init1: 392 opt: 424 Z-score: 310.3 bits: 65.2 E(32554): 5.8e-11
Smith-Waterman score: 425; 50.7% identity (68.8% similar) in 144 aa overlap (21-161:28-153)

       10 20 30 40 50
pF1KB9 MVQQRGARAKRDGGPPPPGPGPAE-EGAREPGWCKT-PSGHIKRPMNAFMVWS
       :: : ::: :. : : ..:::::::::::
CCDS32 MALPGSSQDQAWSLEPPAATAAASSSSGPQEREGAGSPAAPGTLPLEKVKRPMNAFMVWS
       10 20 30 40 50 60

       60 70 80 90 100 110
pF1KB9 QHERRKIMDQWPDMHNAEISKRLGRRWQLLQDSEKIPFVREAERLRLKHMADYPDYKYRP
       . .::.. .: : :::.::::::: .:.::...:: :::.::.::: .:. :::::::::
CCDS32 SAQRRQMAQQNPKMHNSEISKRLGAQWKLLDEDEKRPFVEEAKRLRARHLRDYPDYKYRP
       70 80 90 100 110 120

       120 130 140 150 160 170
pF1KB9 RKKSKGAPAKARPRPPGGSGGGSRLKPGPQLPGRG-GRRAAGGPLGGGAAAPEDDDEDDD
       :.:.:.. : ::. :.: : :.:::: : . :
CCDS32 RRKAKSSGA------------------GPSRCGQGRGNLASGGPLWGPGYATTQPSRGFG
       130 140 150 160

       180 190 200 210 220 230
pF1KB9 EELLEVRLVETPGRELWRMVPAGRAARGQAERAQGPSGEGAAAAAAASPTPSEDEEPEEE
CCDS32 YRPPSYSTAYLPGSYGSSHCKLEAPSPCSLPQSDPRLQGELLPTYTHYLPPGSPTPYNPP
       170 180 190 200 210 220

315 residues in 1 query sequences
18511270 residues in 32554 library sequences
 Tcomplib [36.3.4 Apr, 2011] (8 proc)
 start: Sat Nov 5 22:54:03 2016 done: Sat Nov 5 22:54:03 2016
 Total Scan time: 3.410 Total Display time: -0.020

Function used was FASTA [36.3.4 Apr, 2011]