Genetic algorithms for feature selection and classification of complex chromatographic and spectroscopic data

Mirjankar, Nikhil Suresh

dc.contributor.advisor	Lavine, Barry K.
dc.contributor.author	Mirjankar, Nikhil Suresh
dc.date.accessioned	2013-11-26T08:21:28Z
dc.date.available	2013-11-26T08:21:28Z
dc.date.issued	2012-12
dc.identifier.uri	https://hdl.handle.net/11244/6463
dc.description.abstract	A basic methodology, based on feature selection, for analyzing large multivariate chemical data sets is proposed. Each chromatogram or spectrum is represented as a point in a high dimensional measurement space. A genetic algorithm for feature selection and classification is applied to the data to identify features that optimize the separation of the classes in a plot of the two or three largest principal components of the data. A good principal component plot can only be generated using features whose variance or information is primarily about differences between classes in the data. Hence, feature subsets that maximize the ratio of between-class to within-class variance are selected by the pattern recognition genetic algorithm. Furthermore, the structure of the data set can be explored; for example, new classes can be discovered simply by tuning various parameters of the fitness function of the pattern recognition genetic algorithm. The proposed method has been validated on a wide range of data.
dc.description.abstract	A two-step procedure for pattern recognition analysis of spectral data has been developed. First, wavelets are used to denoise and deconvolute spectral bands by decomposing each spectrum into wavelet coefficients, which represent the sample's constituent frequencies. Second, the pattern recognition genetic algorithm is used to identify wavelet coefficients characteristic of the class. This method was employed in several studies involving spectral library searching. In one study, a search pre-filter to detect the presence of carboxylic acids from vapor phase infrared spectra, a problem that has previously eluded prominent researchers, was successfully formulated and validated. In another study, the same approach was used to develop a pattern recognition assisted infrared library searching technique to determine the model, manufacturer, and year of the vehicle from which a clear coat paint smear originated. The pattern recognition genetic algorithm has also been used to develop a potential method to identify molds in indoor environments using volatile organic compounds. A distinct profile indicative of microbial volatile organic compounds, readily differentiated from the blank for both high mold count and moderate mold count exposure samples, was developed from air sampling data. The utility of the pattern recognition genetic algorithm for discovery of biomarker candidates from genomic and proteomic data sets has also been shown.
dc.format	application/pdf
dc.language	en_US
dc.rights	Copyright is held by the author who has granted the Oklahoma State University Library the non-exclusive right to share this material in its institutional repository. Contact Digital Library Services at lib-dls@okstate.edu or 405-744-9161 for the permission policy on the use, reproduction or distribution of this material.
dc.title	Genetic algorithms for feature selection and classification of complex chromatographic and spectroscopic data
dc.contributor.committeeMember	Materer, Nicholas F.
dc.contributor.committeeMember	El Rassi, Ziad
dc.contributor.committeeMember	Bunce, Richard A.
dc.contributor.committeeMember	Kalkan, A. K.
osu.filename	Mirjankar_okstate_0664D_12414.pdf
osu.accesstype	Open Access
dc.type.genre	Dissertation
dc.type.material	Text
dc.subject.keywords	biomarker candidates
dc.subject.keywords	chemometrics
dc.subject.keywords	genetic algorithm
dc.subject.keywords	library searching
dc.subject.keywords	microbial volatile organic compounds
dc.subject.keywords	p
thesis.degree.discipline	Chemistry
thesis.degree.grantor	Oklahoma State University
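
The abstract states that feature subsets maximizing the ratio of between-class to within-class variance are selected. As a rough illustration of that criterion only, the sketch below scores a feature subset with a simple sum-of-squares ratio. It is not the fitness function of the pattern recognition genetic algorithm described in the dissertation; the function name `variance_ratio` and the toy data are hypothetical.

```python
from statistics import mean


def variance_ratio(samples, labels, features):
    """Between-class / within-class sum of squares over the chosen feature columns.

    samples:  list of rows, each row a list of feature values
    labels:   class label for each row
    features: indices of the feature columns in the subset being scored
    """
    classes = sorted(set(labels))
    between = 0.0
    within = 0.0
    for j in features:
        column = [row[j] for row in samples]
        grand_mean = mean(column)
        for c in classes:
            values = [x for x, y in zip(column, labels) if y == c]
            class_mean = mean(values)
            # Spread of the class means around the overall mean.
            between += len(values) * (class_mean - grand_mean) ** 2
            # Spread of the samples around their own class mean.
            within += sum((v - class_mean) ** 2 for v in values)
    return between / within if within > 0 else float("inf")


# Toy data (made up): feature 0 separates the classes, feature 1 does not.
samples = [
    [1.0, 5.0],
    [1.2, 1.0],
    [3.0, 4.0],
    [3.2, 2.0],
]
labels = ["a", "a", "b", "b"]

score_informative = variance_ratio(samples, labels, [0])
score_uninformative = variance_ratio(samples, labels, [1])
```

In this toy case the subset containing feature 0 scores far higher than the subset containing feature 1, which is the kind of ranking a search procedure such as a genetic algorithm could use to prefer class-discriminating features.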

Files in this item

Name: Chemistry Department_24.pdf
Size: 3.877Mb
Format: PDF

This item appears in the following Collection(s)

OSU Dissertations [11222]
