dc.contributor.advisor: Sinha, Kaushik
dc.contributor.author: Ross, Jarret
dc.date.accessioned: 2016-11-14T21:53:52Z
dc.date.available: 2016-11-14T21:53:52Z
dc.date.issued: 2016-05
dc.identifier.other: t16028
dc.identifier.uri: http://hdl.handle.net/10057/12673
dc.description: Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Computer Science
dc.description.abstract: Extracting meaning from biological sequences such as DNA, RNA, and strings of amino acids is a task that traditionally requires a large amount of expert knowledge. Breakthroughs and advances in these areas are slow because of the computational intractability inherent in analyzing biological sequences. If the high level of expertise needed to solve important problems in biology could be lowered or removed, the pace of biological breakthroughs might increase. As a small step in this direction, this thesis focuses on the challenge of sub-cellular protein localization. The need for any biological understanding can be removed entirely by viewing sub-cellular protein localization as a natural language processing task. This method requires no hand-engineered features and operates at character-level granularity. Modifications are made to an existing deep convolutional network that was designed to perform a range of natural language processing tasks such as sentiment analysis and topic classification. While this model does not achieve state-of-the-art performance, it is competitive with the other models evaluated in this thesis. These findings are encouraging for two reasons. First, they show that a totally biologically naive method performs competitively with hand-engineered methods. Second, it is hoped that the current intense research focus on natural language processing in the field of deep learning will greatly increase the viability of the method presented in this thesis in the coming years.
dc.format.extent: viii, 37 p.
dc.language.iso: en_US
dc.publisher: Wichita State University
dc.rights: Copyright 2016 Jarret Ross
dc.subject.lcsh: Electronic dissertations
dc.title: Treating biological sequences as natural language, a case study on sub-cellular protein localization
dc.type: Thesis

