FASTA searches a protein or DNA sequence data bank
 36.3.4 Apr, 2011
Please cite:
 W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

Query: pF1KB9648, 391 aa
 1>>>pF1KB9648 391 - 391 aa - 391 aa
Library: human.CCDS.faa
  18511270 residues in 32554 sequences

Statistics: Expectation_n fit: rho(ln(x))= 8.9186+/-0.00101; mu= 4.2540+/- 0.061
 mean_var=298.1848+/-61.698, 0's: 0 Z-trim(114.8): 68 B-trim: 0 in 0/50
 Lambda= 0.074273
 statistics sampled from 15261 (15328) to 15261 sequences
Algorithm: FASTA (3.7 Nov 2010) [optimized]
Parameters: BL50 matrix (15:-5), open/ext: -10/-2
 ktup: 2, E-join: 1 (0.765), E-opt: 0.2 (0.471), width: 16
 Scan time: 3.140

The best scores are:                               opt   bits E(32554)
CCDS9523.1 SOX1 gene_id:6656|Hs108|chr13   ( 391) 2681  300.3    2e-81
CCDS14669.1 SOX3 gene_id:6658|Hs108|chrX   ( 446)  833  102.3  9.1e-22
CCDS3239.1 SOX2 gene_id:6657|Hs108|chr3    ( 317)  780   96.5  3.7e-20
CCDS9473.1 SOX21 gene_id:11166|Hs108|chr13 ( 276)  611   78.3  9.6e-15
CCDS3094.1 SOX14 gene_id:8403|Hs108|chr3   ( 240)  602   77.3  1.7e-14
CCDS32549.1 SOX15 gene_id:6665|Hs108|chr17 ( 233)  485   64.7  9.9e-11

>>CCDS9523.1 SOX1 gene_id:6656|Hs108|chr13 (391 aa)
 initn: 2681 init1: 2681 opt: 2681 Z-score: 1575.2 bits: 300.3 E(32554): 2e-81
Smith-Waterman score: 2681; 100.0% identity (100.0% similar) in 391 aa overlap (1-391:1-391)

               10        20        30        40        50        60
pF1KB9 MYSMMMETDLHSPGGAQAPTNLSGPAGAGGGGGGGGGGGGGGGAKANQDRVKRPMNAFMV
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS95 MYSMMMETDLHSPGGAQAPTNLSGPAGAGGGGGGGGGGGGGGGAKANQDRVKRPMNAFMV
               10        20        30        40        50        60

               70        80        90       100       110       120
pF1KB9 WSRGQRRKMAQENPKMHNSEISKRLGAEWKVMSEAEKRPFIDEAKRLRALHMKEHPDYKY
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS95 WSRGQRRKMAQENPKMHNSEISKRLGAEWKVMSEAEKRPFIDEAKRLRALHMKEHPDYKY
               70        80        90       100       110       120

              130       140       150       160       170       180
pF1KB9 RPRRKTKTLLKKDKYSLAGGLLAAGAGGGGAAVAMGVGVGVGAAAVGQRLESPGGAAGGG
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS95 RPRRKTKTLLKKDKYSLAGGLLAAGAGGGGAAVAMGVGVGVGAAAVGQRLESPGGAAGGG
              130       140       150       160       170       180

              190       200       210       220       230       240
pF1KB9 YAHVNGWANGAYPGSVAAAAAAAAMMQEAQLAYGQHPGAGGAHPHAHPAHPHPHHPHAHP
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS95 YAHVNGWANGAYPGSVAAAAAAAAMMQEAQLAYGQHPGAGGAHPHAHPAHPHPHHPHAHP
              190       200       210       220       230       240

              250       260       270       280       290       300
pF1KB9 HNPQPMHRYDMGALQYSPISNSQGYMSASPSGYGGLPYGAAAAAAAAAGGAHQNSAVAAA
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS95 HNPQPMHRYDMGALQYSPISNSQGYMSASPSGYGGLPYGAAAAAAAAAGGAHQNSAVAAA
              250       260       270       280       290       300

              310       320       330       340       350       360
pF1KB9 AAAAAASSGALGALGSLVKSEPSGSPPAPAHSRAPCPGDLREMISMYLPAGEGGDPAAAA
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS95 AAAAAASSGALGALGSLVKSEPSGSPPAPAHSRAPCPGDLREMISMYLPAGEGGDPAAAA
              310       320       330       340       350       360

              370       380       390
pF1KB9 AAAAQSRLHSLPQHYQGAGAGVNGTVPLTHI
       :::::::::::::::::::::::::::::::
CCDS95 AAAAQSRLHSLPQHYQGAGAGVNGTVPLTHI
              370       380       390

>>CCDS14669.1 SOX3 gene_id:6658|Hs108|chrX (446 aa)
 initn: 1125 init1: 724 opt: 833 Z-score: 504.3 bits: 102.3 E(32554): 9.1e-22
Smith-Waterman score: 1326; 58.9% identity (77.1% similar) in 389 aa overlap (12-391:102-446)
10 20 30 40 pF1KB9 MYSMMMETDLHSPGGA-QAPTNLSGPAGAGGGGGGGGGGGG .:::: .. .: .: :..:::..::..::: CCDS14 PAPAMYSLLETELKNPVGTPTQAAGTGGPAAPGGAGKSSANAAGGANSGGGSSGGASGGG 80 90 100 110 120 130 50 60 70 80 90 100 pF1KB9 GGGAKANQDRVKRPMNAFMVWSRGQRRKMAQENPKMHNSEISKRLGAEWKVMSEAEKRPF :: ..::::::::::::::::::::::: ::::::::::::::::.::....:::::: CCDS14 GG---TDQDRVKRPMNAFMVWSRGQRRKMALENPKMHNSEISKRLGADWKLLTDAEKRPF 140 150 160 170 180 110 120 130 140 150 160 pF1KB9 IDEAKRLRALHMKEHPDYKYRPRRKTKTLLKKDKYSLAGGLLAAGAGGGGAAVAMGVGVG :::::::::.::::.:::::::::::::::::::::: .::: ::....::.: ..... CCDS14 IDEAKRLRAVHMKEYPDYKYRPRRKTKTLLKKDKYSLPSGLLPPGAAAAAAAAAAAAAAA 190 200 210 220 230 240 170 180 190 200 210 220 pF1KB9 VGAAAVGQRLESPGGAAGGGYAHVNGWANGAYPGSVAAAAAAAAMMQEAQLAYGQHPGAG . ..:::::.. :.:::::::::: ...:: ::.:.: :. .
CCDS14 SSPVGVGQRLDT--------YTHVNGWANGAY-----------SLVQE-QLGYAQPPSMS 250 260 270 280 230 240 250 260 270 pF1KB9 GAHPHAHPAHPHPHHPHAHPHNPQPMHRYDMGALQYSPI--SNSQGYMS-----ASPSGY . : : : : : :::::::..:::::. ..:.::. :. ::: CCDS14 S---------PPP--PPALP----PMHRYDMAGLQYSPMMPPGAQSYMNVAAAAAAASGY 290 300 310 320 330 280 290 300 310 320 330 pF1KB9 GGLPYGAAAAAAAAAGGAHQNSAVAAAAAAAAASSGALGALGSLVKSEPSGSPPAPA-HS ::. .:.:::::: : :. :.:::::::::. .:: .::.::::::. ::: : :: CCDS14 GGMAPSATAAAAAAYG---QQPATAAAAAAAAAAM-SLGPMGSVVKSEPSSPPPAIASHS 340 350 360 370 380 340 350 360 370 380 390 pF1KB9 RAPCPGDLREMISMYLPAGEGGDPAAAAAAAAQSRLHSLPQHYQGAGAGVNGTVPLTHI . : ::::.::::::: ::: : ::. .:::.. :::::::..:::::::::: CCDS14 QRACLGDLRDMISMYLPP--GGDAADAASPLPGGRLHGVHQHYQGAGTAVNGTVPLTHI 390 400 410 420 430 440

>>CCDS3239.1 SOX2 gene_id:6657|Hs108|chr3 (317 aa)
 initn: 1037 init1: 728 opt: 780 Z-score: 475.4 bits: 96.5 E(32554): 3.7e-20
Smith-Waterman score: 1167; 52.9% identity (69.4% similar) in 399 aa overlap (1-391:1-317)
10 20 30 40 50 60 pF1KB9 MYSMMMETDLHSPGGAQAPTNLSGPAGAGGGGGGGGGGGGGGGAKANQDRVKRPMNAFMV ::.:: ::.:. :: :. .:::::.. ....::. : . :::::::::::: CCDS32 MYNMM-ETELKPPGPQQT---------SGGGGGNSTAAAAGGNQKNSPDRVKRPMNAFMV 10 20 30 40 50 70 80 90 100 110 120 pF1KB9 WSRGQRRKMAQENPKMHNSEISKRLGAEWKVMSEAEKRPFIDEAKRLRALHMKEHPDYKY ::::::::::::::::::::::::::::::..::.::::::::::::::::::::::::: CCDS32 WSRGQRRKMAQENPKMHNSEISKRLGAEWKLLSETEKRPFIDEAKRLRALHMKEHPDYKY 60 70 80 90 100 110 130 140 150 160 170 pF1KB9 RPRRKTKTLLKKDKYSLAGGLLAAGAGGGGAAVAMGVGVGVG-AAAVGQRLESPGGAAGG :::::::::.:::::.: ::::: : : ..: :::::.: .:.:.::..: CCDS32 RPRRKTKTLMKKDKYTLPGGLLAPG----GNSMASGVGVGAGLGAGVNQRMDS------- 120 130 140 150 180 190 200 210 220 230 pF1KB9 GYAHVNGWANGAYPGSVAAAAAAAAMMQEAQLAYGQHPGAGGAHPHAHPAHPHPHHPHAH :::.:::.::.: .:::. ::.: :::: .:: : CCDS32 -YAHMNGWSNGSY-----------SMMQD-QLGYPQHPGL-----NAHGAAQM------- 160 170 180 190 240 250 260 270 280 290 pF1KB9 PHNPQPMHRYDMGALQYSPISNSQGYMSASPSGYGGLPYGAAAAAAAAAGGAHQNSAVAA :::::::..::::.
...:: ::..::. :. . . .. : : CCDS32 ----QPMHRYDVSALQYNSMTSSQTYMNGSPT------YSMSYSQQGTPGMA-------- 200 210 220 230 300 310 320 330 340 350 pF1KB9 AAAAAAASSGALGALGSLVKSEPSGSPP---APAHSRAPCP-GDLREMISMYLPAGEGGD ::..::.:::: :.::: . .:::::: ::::.:::::::..: . CCDS32 -----------LGSMGSVVKSEASSSPPVVTSSSHSRAPCQAGDLRDMISMYLPGAEVPE 240 250 260 270 280 360 370 380 390 pF1KB9 PAAAAAAAAQSRLHSLPQHYQGA---GAGVNGTVPLTHI ::: :::: . ::::.. :...:::.::.:. CCDS32 PAAP------SRLH-MSQHYQSGPVPGTAINGTLPLSHM 290 300 310

>>CCDS9473.1 SOX21 gene_id:11166|Hs108|chr13 (276 aa)
 initn: 742 init1: 555 opt: 611 Z-score: 378.2 bits: 78.3 E(32554): 9.6e-15
Smith-Waterman score: 706; 46.4% identity (64.1% similar) in 323 aa overlap (49-364:6-275)
20 30 40 50 60 70 pF1KB9 PTNLSGPAGAGGGGGGGGGGGGGGGAKANQDRVKRPMNAFMVWSRGQRRKMAQENPKMHN :.:::::::::::::.:::::::::::::: CCDS94 MSKPVDHVKRPMNAFMVWSRAQRRKMAQENPKMHN 10 20 30 80 90 100 110 120 130 pF1KB9 SEISKRLGAEWKVMSEAEKRPFIDEAKRLRALHMKEHPDYKYRPRRKTKTLLKKDKYSLA ::::::::::::...:.::::::::::::::.::::::::::::::: ::::::::... CCDS94 SEISKRLGAEWKLLTESEKRPFIDEAKRLRAMHMKEHPDYKYRPRRKPKTLLKKDKFAFP 40 50 60 70 80 90 140 150 160 170 180 190 pF1KB9 GGLLAAGAGGGGAAVAMGVGVGVGAAAVGQRLESPGGAAGGGYAHVNGWANGAYPGSVAA . : :: . : .. .:.: : .:::: . . :: : ..:: CCDS94 ---VPYGLGGVADAEHPALKAGAGLHA----------GAGGGLVPESLLAN---PEKAAA 100 110 120 130 200 210 220 230 240 250 pF1KB9 AAAAAAMMQEAQLAYGQHPGAGGAHPHAHPAHPHPHHPHAHPHNPQPMHRYDMGALQYSP :::::: :.. . : .:..: : : .:. :.:. ... CCDS94 AAAAAA----ARVFFPQSAAAAAAAAAAAAAG-------------SPYSLLDLGS-KMAE 140 150 160 170 180 260 270 280 290 300 310 pF1KB9 ISNSQGYMSASPSGYGGLPYGAAAAAAAAAGGAHQNSAVAAAAAAAAASSGALGALGSLV ::.:.. ::::... . .:..:: ...:.::::::::: :. . CCDS94 ISSSSS----------GLPYASSLGYPTAGAGAFHGAAAAAAAAAAAA--------GGHT 190 200 210 220 320 330 340 350 360 370 pF1KB9 KSEPSGSPPA---PAHSRA-PCPGDLREMISMYLPAGEGG---DPAAAAAAAAQSRLHSL .:.:: . :. : . : : :: . .
:: : : :: :: ::: CCDS94 HSHPSPGNPGYMIPCNCSAWPSPGLQPPLAYILLP-GMGKPQLDPYPAAYAAAL 230 240 250 260 270 380 390 pF1KB9 PQHYQGAGAGVNGTVPLTHI

>>CCDS3094.1 SOX14 gene_id:8403|Hs108|chr3 (240 aa)
 initn: 590 init1: 563 opt: 602 Z-score: 373.7 bits: 77.3 E(32554): 1.7e-14
Smith-Waterman score: 602; 46.5% identity (64.7% similar) in 241 aa overlap (49-285:6-239)
20 30 40 50 60 70 pF1KB9 PTNLSGPAGAGGGGGGGGGGGGGGGAKANQDRVKRPMNAFMVWSRGQRRKMAQENPKMHN :..::::::::::::::::::::::::::: CCDS30 MSKPSDHIKRPMNAFMVWSRGQRRKMAQENPKMHN 10 20 30 80 90 100 110 120 130 pF1KB9 SEISKRLGAEWKVMSEAEKRPFIDEAKRLRALHMKEHPDYKYRPRRKTKTLLKKDKYSLA ::::::::::::..:::::::.::::::::: ::::::::::::::: :.:::::.: . CCDS30 SEISKRLGAEWKLLSEAEKRPYIDEAKRLRAQHMKEHPDYKYRPRRKPKNLLKKDRYVFP 40 50 60 70 80 90 140 150 160 170 180 190 pF1KB9 GGLLAAGAGGGGAAVAMGVGVGVGAAAVGQRLESPGGAAGGGYAHVNGWANGAYP--GSV :. .:.. .:.. :. .: : : ..: . ....: : : CCDS30 LPYLGDTDPLKAAGLPVGASDGLLSAPEKARAFLPPASAPYSLLDPAQFSSSAIQKMGEV 100 110 120 130 140 150 200 210 220 230 240 250 pF1KB9 AAAAAAAAMMQEAQLAYGQHPGAGGAHPHAHPAHPHPH-HPHAHPHNPQPMHRYDMGALQ . :..:. . :.: . :: :. . : : : : : :: . . : . CCDS30 PHTLATGALPYASTLGY--QNGAFGSL-----SCPSQHTHTHPSPTNPGYVVPCNCTAWS 160 170 180 190 200 260 270 280 290 300 310 pF1KB9 YSPISNSQGYMSASPSGYGGL-PYGAAAAAAAAAGGAHQNSAVAAAAAAAAASSGALGAL : .. .:. :. ::..: :.: CCDS30 ASTLQPPVAYILFPGMTKTGIDPYSSAHATAM 210 220 230 240 320 330 340 350 360 370 pF1KB9 GSLVKSEPSGSPPAPAHSRAPCPGDLREMISMYLPAGEGGDPAAAAAAAAQSRLHSLPQH

>>CCDS32549.1 SOX15 gene_id:6665|Hs108|chr17 (233 aa)
 initn: 476 init1: 446 opt: 485 Z-score: 306.1 bits: 64.7 E(32554): 9.9e-11
Smith-Waterman score: 495; 39.8% identity (59.8% similar) in 246 aa overlap (10-246:14-228)
10 20 30 40 50 pF1KB9 MYSMMMETDLHSPGGAQAPTNLSGPAGAGGGGGGGGGGGGGGGAKANQDRVKRPMN :. :... : .. ::: :.:. ..
: ..:::::: CCDS32 MALPGSSQDQAWSLEPPAATAAASSSSGPQEREGAGSPAAPG------TLPLEKVKRPMN 10 20 30 40 50 60 70 80 90 100 110 pF1KB9 AFMVWSRGQRRKMAQENPKMHNSEISKRLGAEWKVMSEAEKRPFIDEAKRLRALHMKEHP :::::: .:::.:::.:::::::::::::::.::...: :::::..::::::: :....: CCDS32 AFMVWSSAQRRQMAQQNPKMHNSEISKRLGAQWKLLDEDEKRPFVEEAKRLRARHLRDYP 60 70 80 90 100 110 120 130 140 150 160 170 pF1KB9 DYKYRPRRKTKTLLKKDKYSLAGGLLAAGAGGGGAAVAMGVGVGVGAAAVGQRLESPGGA :::::::::.:. .::: . : : : : : : .:: : CCDS32 DYKYRPRRKAKS---------------SGAGPSR------CGQGRGNLASGGPLWGPGYA 120 130 140 150 180 190 200 210 220 pF1KB9 A-----GGGYAHVNGWANGAYPGSVAAAAAAAAMMQEAQLAYGQHPGAGGAHP-HAH--- . : :: . ..... ::: ... . .: .. : : ..: CCDS32 TTQPSRGFGY-RPPSYSTAYLPGSYGSSHCKLEAPSPCSLPQSDPRLQGELLPTYTHYLP 160 170 180 190 200 210 230 240 250 260 270 280 pF1KB9 PAHPHPHHPHAHPHNPQPMHRYDMGALQYSPISNSQGYMSASPSGYGGLPYGAAAAAAAA :. : :..: : :: CCDS32 PGSPTPYNP---PLAGAPMPLTHL 220 230

391 residues in 1 query sequences
18511270 residues in 32554 library sequences
 Tcomplib [36.3.4 Apr, 2011] (8 proc)
 start: Tue Nov 8 02:06:50 2016 done: Tue Nov 8 02:06:51 2016
 Total Scan time: 3.140 Total Display time: 0.010

Function used was FASTA [36.3.4 Apr, 2011]
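The E(32554) values in the hit list appear to track the bit scores through a simple relation: for every hit, E is reproduced (to within the rounding of the printed bits) by E ~ N * m * n * 2^(-bits), where N = 32554 library sequences, m = 391 (query length) and n is the subject length printed in parentheses. This is an observation fitted to the numbers in this report, not a statement of how FASTA computes E(); the helper name expect_from_bits is hypothetical. A minimal Python sketch that recomputes E for each hit:

```python
# Sketch: recompute E() from the printed bit scores of this FASTA report.
# ASSUMPTION: E ~= N * m * n * 2**(-bits) is an empirical fit to these numbers,
# not FASTA's documented formula.

N_LIB = 32554      # "18511270 residues in 32554 sequences"
QUERY_LEN = 391    # pF1KB9648, 391 aa

# (gene, subject length in aa, bits, reported E) from "The best scores are:"
HITS = [
    ("SOX1", 391, 300.3, 2e-81),
    ("SOX3", 446, 102.3, 9.1e-22),
    ("SOX2", 317, 96.5, 3.7e-20),
    ("SOX21", 276, 78.3, 9.6e-15),
    ("SOX14", 240, 77.3, 1.7e-14),
    ("SOX15", 233, 64.7, 9.9e-11),
]


def expect_from_bits(bits: float, m: int, n: int, n_lib: int = N_LIB) -> float:
    """Hypothetical helper: expected chance hits for a bit score, under the assumed relation."""
    return n_lib * m * n * 2.0 ** (-bits)


def recompute_all():
    """Return (gene, reported E, recomputed E) for every hit in HITS."""
    return [
        (gene, e_rep, expect_from_bits(bits, QUERY_LEN, n))
        for gene, n, bits, e_rep in HITS
    ]


if __name__ == "__main__":
    for gene, e_rep, e_calc in recompute_all():
        print(f"{gene:6s} reported E = {e_rep:.2g}   recomputed E = {e_calc:.2g}")
```

Because the bit scores are printed to one decimal place, the recomputed values can differ from the reported ones by a few percent (2^0.05 is about 1.035), which is the size of the discrepancies seen here.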