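So far I've worked through the genome problems; now let's tackle the proteome problem. These problems are supposed to be solved with KOBIC's Biopipe, but I keep solving them in Taverna... To actually enter the competition, you have to build the workflow in Biopipe, haha.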
)'s search method is used to look up disease-related genes. Given a disease keyword (cancer, diabetes), the search method queries the OMIM database and returns a list of OMIM IDs. Each returned OMIM ID is prefixed with an asterisk (*), number sign (#), plus (+), or percent (%), and each prefix has its own meaning.
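It can help to split the prefix off the numeric ID before doing anything else with a hit. Below is a minimal Java sketch. It assumes each search hit is one line that starts with the OMIM number, optionally prefixed. The sample line `*123456 EXAMPLE GENE; GENE1` and the class name `OmimHit` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: split an OMIM search hit into its prefix symbol (*, #, + or %)
// and the numeric OMIM ID. The line format is an assumption for illustration.
class OmimHit {
    // Optional leading whitespace, optional prefix symbol, then the digits
    private static final Pattern ID = Pattern.compile("\\s*([*#+%]?)(\\d+)");

    final String prefix; // "" when the hit carries no prefix
    final String omimId;

    OmimHit(String prefix, String omimId) {
        this.prefix = prefix;
        this.omimId = omimId;
    }

    /** Returns null when the line does not start with an OMIM number. */
    static OmimHit parse(String line) {
        Matcher m = ID.matcher(line);
        if (!m.lookingAt()) { // match must begin at the start of the line
            return null;
        }
        return new OmimHit(m.group(1), m.group(2));
    }

    public static void main(String[] args) {
        OmimHit hit = parse("*123456 EXAMPLE GENE; GENE1"); // hypothetical hit
        System.out.println(hit.prefix + " " + hit.omimId);   // prints "* 123456"
    }
}
```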
At the very end of each OMIM search hit, after a ";", is the gene symbol. Extract only the gene symbol after that semicolon and pass it to convert_GeneSymbol_to_ProteinRefseq (http://sequenceome.kobic.re.kr/WS_Sequenceome/SearchDB?wsdl) to get the protein RefSeq for that gene symbol. The convert_GeneSymbol_to_ProteinRefseq method takes two arguments: the gene symbol and the NCBI TaxID. The TaxID for Homo sapiens is 9606, so convert_GeneSymbol_to_ProteinRefseq("BCAS1","9606") returns the protein RefSeq for BCAS1.
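This step amounts to one call per gene symbol, with the TaxID fixed at 9606. The sketch below shows only that loop. `RefseqConverter` is a hypothetical interface that stands in for whatever client stub you generate from the SearchDB WSDL; I have not checked the real generated signature. The fake converter in `main` just echoes its arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the web-service stub. Only the
// (gene symbol, NCBI TaxID) argument shape comes from the post.
interface RefseqConverter {
    String convert(String geneSymbol, String taxId);
}

class SymbolLookup {
    static final String HUMAN_TAX_ID = "9606"; // NCBI TaxID for Homo sapiens

    /** Calls the converter once per comma-separated gene symbol. */
    static List<String> lookupAll(String commaSeparatedSymbols, RefseqConverter converter) {
        List<String> results = new ArrayList<>();
        for (String symbol : commaSeparatedSymbols.split(",")) {
            String trimmed = symbol.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray or doubled commas
            }
            results.add(converter.convert(trimmed, HUMAN_TAX_ID));
        }
        return results;
    }

    public static void main(String[] args) {
        // Fake converter for demonstration; a real one would call the web service
        RefseqConverter fake = (symbol, taxId) -> symbol + "@" + taxId;
        System.out.println(lookupAll("BCAS1, GENE2", fake)); // prints "[BCAS1@9606, GENE2@9606]"
    }
}
```

Keeping the converter behind an interface lets you test the loop without network access and swap in the real stub later.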
To pull the gene symbols out of the OMIM result, write a script that takes the OMIM result as input and emits only the gene symbols, as a String.
import java.util.*;

// Ports used by this script:
//   input         : raw OMIM search result, one hit per line
//   output_string : gene symbols joined with "," (no trailing comma)
//   output_size   : number of hits that contained a ";"
List symbols = new ArrayList();
int output_size = 0;

StringTokenizer lines = new StringTokenizer(input, "\n");
while (lines.hasMoreTokens()) {
    String line = lines.nextToken();
    // The gene symbol follows the ";" at the end of a hit
    if (!line.contains(";")) {
        continue;
    }
    output_size++;

    // Only use "title; SYMBOL" lines, i.e. exactly two ";"-separated parts
    StringTokenizer tok = new StringTokenizer(line, ";");
    if (tok.countTokens() != 2) {
        continue;
    }
    tok.nextToken();                        // skip the title part
    String symbol = tok.nextToken().trim(); // keep the gene symbol

    // Skip empty symbols and ones containing "-".
    // (String contents must be checked with length()/equals(), not ==)
    if (symbol.length() == 0 || symbol.indexOf("-") >= 0) {
        continue;
    }
    symbols.add(symbol);
}

// Join with commas, without a trailing comma
StringBuffer joined = new StringBuffer();
for (int i = 0; i < symbols.size(); i++) {
    if (i > 0) {
        joined.append(",");
    }
    joined.append(symbols.get(i));
}
String output_string = joined.toString();
System.out.println(output_string);
" -style symbols are produced. Feeding them to the convert_GeneSymbol_to_ProteinRefseq method returns "
As shown above, the result comes back in the format gene symbol:protein RefSeq&protein RefSeq. So once again, write a script that extracts only the protein RefSeqs.
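Here is a minimal sketch of that extraction. It assumes one `gene symbol:RefSeq&RefSeq` result per line. The accessions in `main` are placeholders, not real lookups, and `RefseqExtractor` is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: pull protein RefSeq accessions out of converter results written as
// "GENE:ACC1&ACC2". Assumes one result per line; lines without ':' are skipped.
class RefseqExtractor {
    static List<String> extractAccessions(String converterOutput) {
        List<String> accessions = new ArrayList<>();
        for (String line : converterOutput.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "symbol:accessions" line
            }
            // Everything after the first ':' is an '&'-separated accession list
            for (String acc : line.substring(colon + 1).split("&")) {
                String trimmed = acc.trim();
                if (!trimmed.isEmpty()) {
                    accessions.add(trimmed);
                }
            }
        }
        return accessions;
    }

    public static void main(String[] args) {
        String sample = "GENE1:NP_000001&NP_000002\nGENE2:NP_000003"; // placeholder data
        System.out.println(extractAccessions(sample)); // prints "[NP_000001, NP_000002, NP_000003]"
    }
}
```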
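Now let's fetch the actual sequences for the RefSeqs we obtained. The Biopipe competition problem says to use the refseq2seq method. My guess is that KOBIC wrote this method, so I don't know its WSDL address. ^^;; However, you can use the search method of DDBJ's RefSeq module (
), or EBI's WSDbfetch, which I used for the genome problem. Both return the result as a flat file, and from that we extract only the sequence.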
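In a GenBank/GenPept-style flat file, the residues sit between the `ORIGIN` line and the closing `//`, interleaved with position numbers. The sketch below assumes that format. EMBL-style entries from dbfetch mark the sequence with an `SQ` line instead and would need a different start marker. The record in `main` is synthetic.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: collect the sequence of every GenBank/GenPept-style record in a flat
// file. Residues are read from the ORIGIN line up to the "//" record terminator.
class FlatFileSequence {
    static List<String> extractSequences(String flatFile) {
        List<String> sequences = new ArrayList<>();
        StringBuilder current = null; // non-null while inside an ORIGIN block
        for (String line : flatFile.split("\\r?\\n")) {
            if (line.startsWith("ORIGIN")) {
                current = new StringBuilder();
            } else if (line.startsWith("//")) {
                if (current != null) {
                    sequences.add(current.toString());
                    current = null;
                }
            } else if (current != null) {
                // Drop position numbers and spaces, keep residue letters only
                for (char c : line.toCharArray()) {
                    if (Character.isLetter(c)) {
                        current.append(Character.toUpperCase(c));
                    }
                }
            }
        }
        return sequences;
    }

    public static void main(String[] args) {
        String entry = "LOCUS       TEST1 10 aa\nORIGIN\n        1 mktaylilfa\n//\n"; // synthetic
        System.out.println(extractSequences(entry)); // prints "[MKTAYLILFA]"
    }
}
```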
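Next, the disease-related protein sequences (obtained with a ridiculous amount of effort) go through EBI InterProScan to pick out the membrane-associated proteins. I asked an acquaintance how to extract only the membrane-associated proteins from the InterProScan results, and was told
so once I find that part, the membrane-protein step is done. Then I just need to dig up the gi numbers somehow and get the pathway information...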
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMTigr TIGR01517 ATPase-IIB_Ca: calcium-translocating P-type 12 1087 0 T 28-Aug-2007 IPR006408 Calcium-translocating P-type ATPase, PMCA-type Molecular Function: calcium-transporting ATPase activity (GO:0005388), Molecular Function: calcium ion binding (GO:0005509), Molecular Function: ATP binding (GO:0005524), Biological Process: calcium ion transport (GO:0006816), Molecular Function: calcium ion transmembrane transporter activity (GO:0015085), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMTigr TIGR01494 ATPase_P-type: ATPase, P-type (transporting 156 278 4.6e-22 T 28-Aug-2007 IPR001757 ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter Molecular Function: ATP binding (GO:0005524), Biological Process: transport (GO:0006810), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMTigr TIGR01494 ATPase_P-type: ATPase, P-type (transporting 433 517 2.2e-26 T 28-Aug-2007 IPR001757 ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter Molecular Function: ATP binding (GO:0005524), Biological Process: transport (GO:0006810), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMTigr TIGR01494 ATPase_P-type: ATPase, P-type (transporting 698 748 3.9e-19 T 28-Aug-2007 IPR001757 ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter Molecular Function: ATP binding (GO:0005524), Biological Process: transport (GO:0006810), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMTigr TIGR01494 ATPase_P-type: ATPase, P-type (transporting 785 900 8.6e-35 T 28-Aug-2007 IPR001757 ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter Molecular Function: ATP binding (GO:0005524), Biological Process: transport (GO:0006810), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 FPrintScan PR00121 NAKATPASE 490 511 6e-10 T 28-Aug-2007 IPR006069 ATPase, P-type cation exchange, alpha subunit Biological Process: cation transport (GO:0006812), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 FPrintScan PR00121 NAKATPASE 622 640 6e-10 T 28-Aug-2007 IPR006069 ATPase, P-type cation exchange, alpha subunit Biological Process: cation transport (GO:0006812), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 FPrintScan PR00119 CATATPASE 237 251 8e-39 T 28-Aug-2007 IPR001757 ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter Molecular Function: ATP binding (GO:0005524), Biological Process: transport (GO:0006810), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 FPrintScan PR00119 CATATPASE 497 511 8e-39 T 28-Aug-2007 IPR001757 ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter Molecular Function: ATP binding (GO:0005524), Biological Process: transport (GO:0006810), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 FPrintScan PR00119 CATATPASE 701 712 8e-39 T 28-Aug-2007 IPR001757 ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter Molecular Function: ATP binding (GO:0005524), Biological Process: transport (GO:0006810), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 FPrintScan PR00119 CATATPASE 723 733 8e-39 T 28-Aug-2007 IPR001757 ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter Molecular Function: ATP binding (GO:0005524), Biological Process: transport (GO:0006810), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 FPrintScan PR00119 CATATPASE 818 837 8e-39 T 28-Aug-2007 IPR001757 ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter Molecular Function: ATP binding (GO:0005524), Biological Process: transport (GO:0006810), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 FPrintScan PR00119 CATATPASE 842 854 8e-39 T 28-Aug-2007 IPR001757 ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter Molecular Function: ATP binding (GO:0005524), Biological Process: transport (GO:0006810), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMSmart SM00734 no description 470 488 1.3e+02 T 28-Aug-2007 IPR006642 Zinc finger, Rad18-type putative Molecular Function: DNA binding (GO:0003677), Biological Process: DNA repair (GO:0006281)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMSmart SM00790 no description 762 910 3.6e+02 T 28-Aug-2007 IPR013983 Aldehyde ferredoxin oxidoreductase, N-terminal Biological Process: electron transport (GO:0006118), Molecular Function: oxidoreductase activity, acting on the aldehyde or oxo group of donors, iron-sulfur protein as acceptor (GO:0016625), Molecular Function: iron-sulfur cluster binding (GO:0051536)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMSmart SM00120 no description 957 1001 3e+02 T 28-Aug-2007 IPR000585 Hemopexin
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMSmart SM00558 no description 1005 1162 1.4e+02 T 28-Aug-2007 IPR003347 Transcription factor jumonji/aspartyl beta-hydroxylase
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 Gene3D G3DSA:2.70.150.10 no description 45 269 3e-29 T 28-Aug-2007 NULL NULL
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 Gene3D G3DSA:3.40.1110.10 no description 507 708 7.2e-44 T 28-Aug-2007 NULL NULL
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 Gene3D G3DSA:1.20.1110.10 no description 787 1091 1.2e-86 T 28-Aug-2007 NULL NULL
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 TMHMM tmhmm transmembrane_regions 106 124 NA ? 28-Aug-2007 NULL NULL
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 TMHMM tmhmm transmembrane_regions 150 170 NA ? 28-Aug-2007 NULL NULL
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 TMHMM tmhmm transmembrane_regions 400 418 NA ? 28-Aug-2007 NULL NULL
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 TMHMM tmhmm transmembrane_regions 437 471 NA ? 28-Aug-2007 NULL NULL
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 TMHMM tmhmm transmembrane_regions 878 900 NA ? 28-Aug-2007 NULL NULL
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 TMHMM tmhmm transmembrane_regions 955 973 NA ? 28-Aug-2007 NULL NULL
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 TMHMM tmhmm transmembrane_regions 1030 1050 NA ? 28-Aug-2007 NULL NULL
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 TMHMM tmhmm transmembrane_regions 1064 1084 NA ? 28-Aug-2007 NULL NULL
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMPanther PTHR11939:SF77 PLASMA MEMBRANE CALCIUM-TRANSPORTING ATPASE 2 (PMCA2) (PLASMA MEMBRANE CALCIUM PUMP ISOFORM 2) 3 123 0 T 28-Aug-2007 NULL NULL
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMPanther PTHR11939:SF77 PLASMA MEMBRANE CALCIUM-TRANSPORTING ATPASE 2 (PMCA2) (PLASMA MEMBRANE CALCIUM PUMP ISOFORM 2) 140 300 0 T 28-Aug-2007 NULL NULL
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMPanther PTHR11939:SF77 PLASMA MEMBRANE CALCIUM-TRANSPORTING ATPASE 2 (PMCA2) (PLASMA MEMBRANE CALCIUM PUMP ISOFORM 2) 385 945 0 T 28-Aug-2007 NULL NULL
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMPanther PTHR11939 CATION-TRANSPORTING ATPASE 3 123 0 T 28-Aug-2007 IPR001757 ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter Molecular Function: ATP binding (GO:0005524), Biological Process: transport (GO:0006810), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMPanther PTHR11939 CATION-TRANSPORTING ATPASE 140 300 0 T 28-Aug-2007 IPR001757 ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter Molecular Function: ATP binding (GO:0005524), Biological Process: transport (GO:0006810), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMPanther PTHR11939 CATION-TRANSPORTING ATPASE 385 945 0 T 28-Aug-2007 IPR001757 ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter Molecular Function: ATP binding (GO:0005524), Biological Process: transport (GO:0006810), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 ScanRegExp PS00154 ATPASE_E1_E2 499 505 NA ? 28-Aug-2007 IPR001757 ATPase, P-type, K/Mg/Cd/Cu/Zn/Na/Ca/Na/H-transporter Molecular Function: ATP binding (GO:0005524), Biological Process: transport (GO:0006810), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMPfam PF00690 Cation_ATPase_N 41 124 2.3e-19 T 28-Aug-2007 IPR004014 ATPase, P-type cation-transporter, N-terminal Biological Process: cation transport (GO:0006812), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMPfam PF00122 E1-E2_ATPase 156 489 1.9e-31 T 28-Aug-2007 IPR008250 E1-E2 ATPase-associated region Molecular Function: ATP binding (GO:0005524), Cellular Component: membrane (GO:0016020), Molecular Function: hydrolase activity, acting on acid anhydrides, catalyzing transmembrane movement of substances (GO:0016820)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMPfam PF00702 Hydrolase 493 841 1.3e-14 T 28-Aug-2007 IPR005834 Haloacid dehalogenase-like hydrolase Molecular Function: catalytic activity (GO:0003824), Biological Process: metabolic process (GO:0008152)
AT2B2_HUMAN 7F10221B7B9AC3A2 1243 HMMPfam PF00689 Cation_ATPase_C 937 1088 1.7e-33 T 28-Aug-2007 IPR006068 ATPase, P-type cation-transporter, C-terminal Biological Process: cation transport (GO:0006812), Molecular Function: ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism (GO:0015662), Cellular Component: membrane (GO:0016020)
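Here is one possible filter, read off the sample output above. It is not necessarily the rule my acquaintance suggested. A protein counts as membrane-associated if any of its rows carries the `Cellular Component: membrane (GO:0016020)` term or comes from TMHMM. The sketch assumes the raw layout shown above: protein ID in column 1 and method in column 4, with both columns free of spaces. The class name `MembraneFilter` is made up.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch: collect IDs of proteins that look membrane-associated in InterProScan
// raw output. The criteria (GO:0016020 annotation or a TMHMM row) are an assumption.
class MembraneFilter {
    static Set<String> membraneProteins(String rawOutput) {
        Set<String> ids = new LinkedHashSet<>(); // keeps first-seen order, no duplicates
        for (String line : rawOutput.split("\\r?\\n")) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4) {
                continue; // blank or malformed line, not a result row
            }
            boolean membraneGo = line.contains("(GO:0016020)");
            boolean tmhmmHit = fields[3].equals("TMHMM");
            if (membraneGo || tmhmmHit) {
                ids.add(fields[0]); // column 1 is the protein ID
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String sample =
            "AT2B2_HUMAN 7F10221B7B9AC3A2 1243 TMHMM tmhmm transmembrane_regions 106 124 NA ? 28-Aug-2007 NULL NULL\n"
          + "PROT2 0000000000000000 100 HMMPfam EXAMPLE Example 1 50 1e-5 T 28-Aug-2007 NULL NULL"; // made-up row
        System.out.println(membraneProteins(sample)); // prints "[AT2B2_HUMAN]"
    }
}
```

Every row in the sample output belongs to AT2B2_HUMAN, so that output would yield just that one ID.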
Then use the gi numbers of the membrane-associated proteins to get pathway information for the target disease. Pathway retrieval was already covered in the genome problem, so I'll skip it here.
That's all. I'll fill in the missing parts next time~!^^;;