Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins

Verma, Ruchi; Melcher, Ulrich

dc.contributor.author	Verma, Ruchi
dc.contributor.author	Melcher, Ulrich
dc.date.accessioned	2018-11-09T21:10:58Z
dc.date.available	2018-11-09T21:10:58Z
dc.date.issued	2012-09-11
dc.identifier	oksd_verma_asupportvectorm_2012
dc.identifier.citation	Verma, R., & Melcher, U. (2012). A Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins. BMC Bioinformatics, 13(Suppl 15), Article S9. https://doi.org/10.1186/1471-2105-13-S15-S9
dc.identifier.uri	https://hdl.handle.net/11244/302078
dc.description.abstract	Background: Members of the phylum Proteobacteria are most prominent among bacteria causing plant diseases that result in a diminution of the quantity and quality of food produced by agriculture. To ameliorate these losses, there is a need to identify infections in early stages. Recent developments in next generation nucleic acid sequencing and mass spectrometry open the door to screening plants by the sequences of their macromolecules. Such an approach requires the ability to recognize the organismal origin of unknown DNA or peptide fragments. There are many ways to approach this problem but none have emerged as the best protocol. Here we attempt a systematic way to determine organismal origins of peptides by using a machine learning algorithm. The algorithm that we implement is a Support Vector Machine (SVM).
dc.description.abstract	Results: The amino acid compositions of proteobacterial proteins were found to be different from those of plant proteins. We developed an SVM model based on amino acid and dipeptide compositions to distinguish between a proteobacterial protein and a plant protein. The amino acid composition (AAC) based SVM model had an accuracy of 92.44% with a Matthews correlation coefficient (MCC) of 0.85, while the dipeptide composition (DC) based SVM model had a maximum accuracy of 94.67% and an MCC of 0.89. We also developed SVM models based on a hybrid approach (AAC and DC), which gave a maximum accuracy of 94.86% and an MCC of 0.90. The models were tested on datasets not used in training to assess their validity.
dc.description.abstract	Conclusion: The results indicate that the SVM based on the AAC and DC hybrid approach can be used to distinguish proteobacterial from plant protein sequences.
dc.format	application/pdf
dc.language	en_US
dc.publisher	BioMed Central
dc.rights	This material has been previously published. In the Oklahoma State University Library's institutional repository this version is made available through the open access principles and the terms of agreement/consent between the author(s) and the publisher. The permission policy on the use, reproduction or distribution of the material falls under fair use for educational, scholarship, and research purposes. Contact Digital Resources and Discovery Services at lib-dls@okstate.edu or 405-744-9161 for further information.
dc.title	Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins
osu.filename	oksd_verma_asupportvectorm_2012.pdf
dc.description.peerreview	Peer reviewed
dc.identifier.doi	10.1186/1471-2105-13-S15-S9
dc.description.department	Biochemistry and Molecular Biology
dc.type.genre	Article
dc.type.material	Text
dc.subject.keywords	proteobacteria
dc.subject.keywords	plant proteins
dc.subject.keywords	svm
dc.subject.keywords	machine learning
dc.subject.keywords	amino acid composition
dc.subject.keywords	dipeptide composition
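
The abstract above names two sequence-derived feature sets, amino acid composition (AAC) and dipeptide composition (DC), their concatenation as a hybrid, and the Matthews correlation coefficient (MCC) as an evaluation measure. The following is a minimal sketch of how such features and the MCC are commonly computed. It is not the authors' code: the function names, the percentage scaling, and the handling of non-standard residues are assumptions made for illustration, and the SVM training step itself is left out.

```python
from collections import Counter
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq):
    """Amino acid composition as percentages (20 values).

    Assumption: residues outside the 20 standard letters are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [100.0 * counts[a] / len(residues) for a in AMINO_ACIDS]


def dc(seq):
    """Dipeptide composition as percentages (400 values).

    Assumption: overlapping adjacent pairs are counted, and pairs that
    contain a non-standard residue are skipped.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(valid)
    return [100.0 * counts[d] / len(valid) for d in DIPEPTIDES]


def hybrid(seq):
    """Hybrid feature vector: AAC followed by DC (420 values)."""
    return aac(seq) + dc(seq)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Vectors produced by `hybrid` could be passed, one per protein, to any SVM implementation; kernel choice and parameter tuning are not specified in this record.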

Files in this item

Name: oksd_verma_asupportvectorm_2012.pdf
Size: 858.8Kb
Format: PDF

This item appears in the following Collection(s)

OSU - Faculty and Staff Publications [1079]
