Supplementary material 6: The pilot study
A pilot study was carried out to assess the possible influence of species-specific AAS-PS on local structure prediction. A classical Bayesian approach (de Brevern et al., 2000) was used to predict Protein Blocks from sequence (Joseph et al., 2010). The principle was to train and validate on one species-specific databank and then assess the prediction on the other databanks. This shows how species-specific preferences influence prediction efficiency. As in one of our earlier works (Tyagi et al., 2009), 100 independent simulations were carried out. Figure 6 summarizes the results of this analysis.
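To make the Bayesian prediction step more concrete, here is a minimal naive Bayes sketch in the spirit of the cited approach. It is not the published implementation. It assumes a 15-residue sequence window centred on the predicted position, the 16 Protein Blocks labelled a to p, and add-one (Laplace) pseudocounts. The function names `train` and `predict` and the constant `HALF_WINDOW` are hypothetical.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PROTEIN_BLOCKS = "abcdefghijklmnop"  # the 16 PBs, labelled a to p
HALF_WINDOW = 7  # assumption: window of 15 residues centred on the predicted position


def train(sequences, pb_strings, pseudocount=1.0):
    """Estimate log P(PB) and log P(aa at offset j | PB) from paired sequences and PB strings."""
    pb_counts = Counter()
    aa_counts = defaultdict(Counter)  # key: (pb, offset) -> amino acid counts
    for seq, pbs in zip(sequences, pb_strings):
        for i, pb in enumerate(pbs):
            if pb not in PROTEIN_BLOCKS:
                continue  # skip positions without an assigned PB
            pb_counts[pb] += 1
            for j in range(-HALF_WINDOW, HALF_WINDOW + 1):
                k = i + j
                if 0 <= k < len(seq) and seq[k] in AMINO_ACIDS:
                    aa_counts[(pb, j)][seq[k]] += 1
    total = sum(pb_counts.values())
    log_prior = {
        pb: math.log((pb_counts[pb] + pseudocount) / (total + pseudocount * len(PROTEIN_BLOCKS)))
        for pb in PROTEIN_BLOCKS
    }
    log_lik = {}
    for pb in PROTEIN_BLOCKS:
        for j in range(-HALF_WINDOW, HALF_WINDOW + 1):
            counts = aa_counts[(pb, j)]
            n = sum(counts.values())
            for aa in AMINO_ACIDS:
                log_lik[(pb, j, aa)] = math.log(
                    (counts[aa] + pseudocount) / (n + pseudocount * len(AMINO_ACIDS))
                )
    return log_prior, log_lik


def predict(seq, model):
    """Return, for each position of seq, the PB with the highest log posterior score."""
    log_prior, log_lik = model
    predicted = []
    for i in range(len(seq)):
        best_pb, best_score = None, -math.inf
        for pb in PROTEIN_BLOCKS:
            score = log_prior[pb]  # log P(PB)
            for j in range(-HALF_WINDOW, HALF_WINDOW + 1):
                k = i + j
                if 0 <= k < len(seq) and seq[k] in AMINO_ACIDS:
                    score += log_lik[(pb, j, seq[k])]  # add log P(aa | PB, offset)
            if score > best_score:
                best_pb, best_score = pb, score
        predicted.append(best_pb)
    return "".join(predicted)
```

Training one model per species-specific databank and calling `predict` on the other databanks reproduces the shape of the cross-species comparison described above. Working in log space avoids numerical underflow when the 15 per-position likelihoods are multiplied.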
For training based on the amino acid preferences observed in the NR databank, the average prediction rate was higher for the non-redundant test set (~34%) than for the species-specific validation sets. The lowest prediction rate was obtained for Pf (~30%), for which highly divergent sequence-structure relationships were observed, as explained above. Predictions for Sc and At (~32%) are slightly lower than the NR prediction rate.
When species-specific training sets were used, better prediction rates were obtained for test sets from the same species. This is most evident for At and Pf. For At, a prediction rate of about 38% was obtained, 6 percentage points higher than with NR training. Moreover, no over-training was observed, reflecting the stability of the dataset distribution. Interestingly, with training on At, the prediction rates for the NR, Sc and Pf test sets are only 29.1, 27.5 and 26.0%, respectively. This underlines that the sequence-structure relationships learned from At are clearly more informative and distinct from those of the other species. With training on the Pf dataset, the prediction rate for Pf increases to 34.3%, i.e., about 4 percentage points better than with NR training. Predictions for all the other databanks were very low (~26%), again signifying the importance of species-specific preferences. The efficiency of Sc predictions (trained on the species-specific dataset) is comparable to that of NR.
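The evaluation protocol (train on one databank, test on every databank, repeat over 100 random simulations, then average) can be sketched as follows. The prediction rate is taken here as the percentage of positions whose predicted PB matches the observed one. The 50/50 train/test split, the seed, and the names `prediction_rate` and `cross_species_rates` are assumptions for illustration; `train_fn` and `predict_fn` can be any predictor, such as a Bayesian one.

```python
import random
from statistics import mean


def prediction_rate(predicted, observed):
    """Percentage of positions whose predicted PB equals the observed PB."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty PB strings of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def cross_species_rates(databanks, train_fn, predict_fn, n_sim=100, train_fraction=0.5, seed=0):
    """databanks maps a name (e.g. 'NR', 'At') to a list of (sequence, pb_string) pairs.

    Returns a dict mapping (training databank, test databank) to the mean prediction rate
    over n_sim random splits.
    """
    rng = random.Random(seed)
    rates = {(a, b): [] for a in databanks for b in databanks}
    for _ in range(n_sim):
        # Draw a fresh random train/test split for every databank in this simulation.
        splits = {}
        for name, entries in databanks.items():
            shuffled = list(entries)
            rng.shuffle(shuffled)
            cut = int(len(shuffled) * train_fraction)
            splits[name] = (shuffled[:cut], shuffled[cut:])
        # Train on each databank, then evaluate on the held-out part of every databank.
        for train_name, (train_set, _) in splits.items():
            model = train_fn([s for s, _ in train_set], [p for _, p in train_set])
            for test_name, (_, test_set) in splits.items():
                pred = "".join(predict_fn(s, model) for s, _ in test_set)
                obs = "".join(p for _, p in test_set)
                rates[(train_name, test_name)].append(prediction_rate(pred, obs))
    return {pair: mean(values) for pair, values in rates.items()}
```

Applied to the rates reported above, the same arithmetic gives the gains quoted for same-species training: 38 - 32 = 6 percentage points for At and 34.3 - 30 = 4.3 percentage points for Pf.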
Similar analyses were carried out using non-redundant databanks generated with sequence identity cutoffs of 90, 50 and 25%. The results presented here were also obtained with the first two sets. For the datasets generated with a 25% sequence identity cutoff, most of the amino acid preferences remain, which supports the relevance of this study. However, for this last set, the prediction rates were highly sensitive to the random choice of protein sequences. This was expected, and it underlines the need for more protein structures of Plasmodium falciparum and Arabidopsis thaliana. Even though these preliminary results need to be assessed in depth, they support the idea that knowledge of species-specific sequence-structure relationships can be used to improve the efficiency of prediction algorithms.
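The section does not say how the 90, 50 and 25% non-redundant databanks were built. One common way to apply an identity cutoff is a greedy filter, sketched below under stated assumptions. The `identity` function is a hypothetical, deliberately simple stand-in (ungapped position-by-position comparison); a real pipeline would compute identity from a sequence alignment.

```python
def identity(a, b):
    """Hypothetical stand-in: percent identity over an ungapped comparison of the shorter length."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / n


def non_redundant(sequences, cutoff):
    """Greedy filter: keep a sequence only if its identity to every kept sequence is below cutoff (%)."""
    kept = []
    # Longest sequences first (one common convention); sorted() is stable for equal lengths.
    for seq in sorted(sequences, key=len, reverse=True):
        if all(identity(seq, k) < cutoff for k in kept):
            kept.append(seq)
    return kept
```

Lowering the cutoff, for example `for cutoff in (90, 50, 25): non_redundant(seqs, cutoff)`, can only shrink the kept set or leave it unchanged. This is consistent with the smaller 25% sets being more sensitive to which sequences are drawn at random.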
de Brevern, A. G., Etchebest, C., Hazout, S., 2000. Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks. Proteins 41, 271-87.
Joseph, A. P., Bornot, A., de Brevern, A. G., 2010. Local Structure Alphabets. In: Rangwala, H., Karypis, G. (Eds.), Protein Structure Prediction. Wiley, in press.
Tyagi, M., Bornot, A., Offmann, B., de Brevern, A. G., 2009. Protein short loop prediction in terms of a structural alphabet. Comput Biol Chem 33, 329-33.