



CADViST: Visualization Tool for BLAST Alignment of Dengue Virus Sequences

Boonyarat Viriyasaksathian, Yodchanan Wongsawat
Department of Biomedical Engineering, Mahidol University
Nakornpathom, Thailand
g5137363@student.mahidol.ac.th and egyws@mahidol.ac.th

Prapat Suriyaphol
Bioinformatics and Data Management for Research Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University
Bangkok, Thailand
sipur@mucc.mahidol.ac.th

Abstract-Building a search engine that can also visualize the genomic sequences it returns is a challenging problem. In this paper, we propose software called CADViST. The UnitX graphical representation (previously proposed by the authors) is employed as an alternative way to visualize the results obtained from the Basic Local Alignment Search Tool (BLAST). The proposed software helps users and experts interpret the results easily, especially in Dengue virus sequence analysis, where different serotypes or subtypes need to be distinguished.

Keywords-BLAST, Dengue Virus, Visualization, Bioinformatics.

I. INTRODUCTION

In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is one of the most widely used tools for sequence similarity search because of its speed and reasonable search accuracy. However, the BLAST program still lacks a user-friendly graphical representation. Hence, in this paper, we aim to develop a visualization tool that can display the text output produced by BLAST.

Many existing tools are used for visualizing and analyzing genomic sequences. Each tool was developed for specific tasks, and the tools can be categorized into four approaches: Base vector, Sequential, Fourier Transform (FT), and Z-Curve.

(1) Base vector approach: Hamori, E. and Ruskin, J. (1983) represented DNA sequences as a three-dimensional curve (H-Curve) [1]. Gates, M. A. (1985) proposed that a graphical representation of a DNA sequence in two-dimensional space was better than the H-Curve. Gates' graphical representation shows the four nucleotide bases, i.e. adenine (A), thymine (T), cytosine (C), and guanine (G). The unit vectors representing these bases lie on the Cartesian coordinate system: base A is on the negative y-axis, base T is on the positive y-axis, base G is on the positive x-axis, and base C is on the negative x-axis [2]. About eleven years later, Nandy A. (1996) proposed a graphical representation to distinguish the features of intron and exon segments of eukaryotic sequences [3]. This graphical representation was similar to Gates' method. The A, G, C, and T nucleotides were plotted on an ACGT-axis system, and the slope of the plot indicated clusters of intron and exon sequences. However, both Nandy's and Gates' methods have high degeneracy, such that sequences such as AGTC, AGTCA, and AGTCAG lead to the same graphical representation [4]. Stephen S.-T. Yau et al. (2003) modified Gates' method. The four nucleic acids are classified into a pyrimidine/purine graph on two quadrants of the Cartesian coordinate system. The first quadrant represents the pyrimidines (T and C), and the fourth quadrant represents the purines (A and G) [4]. Recently, the authors proposed a graphical representation designed for Dengue virus sequence analysis, based on the cumulative amount of amino and keto bases, called UnitX [5].

(2) Sequential approach: Altschul et al. (1990) developed the Basic Local Alignment Search Tool (BLAST) program, one of the most popular tools for genomic sequence analysis. This tool performs a fast similarity search. The program compares any two sequences and displays the differences between them on a base-by-base basis [6].

(3) Fourier Transform (FT) approach: Anastassiou D. proposed color spectrograms of biomolecular sequences as a tool for visualizing biomolecular sequence analysis [7], [8]. The spectrogram, which represents the magnitude of the short-time Fourier transform (STFT), is implemented via the discrete Fourier transform (DFT). Analysis of a genomic sequence in the frequency domain via the Fourier transform (FT) uses the 3-periodicity property of DNA coding sequences. The color spectrogram is defined using the colors red, green, and blue. Even though this method yields an impressive graphical representation, its computational complexity is fairly high.

(4) Z-Curve approach: Zhang C. T. et al. (1994) suggested a practical visualization tool called the Z-Curve [8]-[12]. James J. et al. implemented this tool in a package called MBEToolbox [13]. Because the Z-Curve is built from the cumulative components of the genomic sequence, its features can be interpreted quickly, such as the distribution along the sequence of purine/pyrimidine bases, amino/keto bases, and strong H-bond/weak H-bond bases. Since the Z-Curve algorithm is simple, it can be applied to all genomic sequences regardless of their length. A similar approach called the 3D D-Curve was presented by Zhang Y. and Tan M. (2008). This approach can be viewed as a weighted version of the Z-Curve [14].

978-1-4244-4713-8/10/$25.00 ©2010 IEEE

The choice of graphical representation can vary with the characteristics of the genomic sequences of interest. Therefore, in this first version of the proposed software, Dengue virus sequences (nucleotide sequences) are employed to verify the merit of the proposed software. The software is called CADViST, which stands for Classification and Analysis of Dengue Virus Serotype by Visualization Tool. By employing UnitX as the visualization tool, the proposed software is suitable for interpreting Dengue virus sequences. However, positioning partial Dengue sequences on the Dengue genome with the UnitX representation requires a high computational load. BLAST is well known as an efficient search tool, but the visualization of its results needs improvement. Therefore, in this paper, we propose software that combines the merits of both BLAST and UnitX. The proposed software can efficiently search for an unknown portion of a Dengue virus sequence and can simultaneously display graphical representations of the resulting sequences.

This paper is organized as follows. Section II introduces the proposed visualization tool, called CADViST. The software architecture of CADViST is described in Section III. Section IV shows the simulation results of the proposed software. Finally, Section V concludes the paper.

II. CADVIST: THE PROPOSED VISUALIZATION TOOL

Classification and Analysis of Dengue Virus Serotype by Visualization Tool, or CADViST, is a visualization tool proposed especially for analyzing Dengue virus sequences. The components of CADViST are described in detail as follows:

A. Basic Local Alignment Search Tool (BLAST)

The BLAST program was developed by Stephen F. Altschul and his coworkers at the National Center for Biotechnology Information (NCBI). It is widely used for calculating sequence similarity. BLAST uses a heuristic algorithm to find the best possible results. To make the search fast, it finds homologous sequences by locating short matches between two sequences. The similarity measurement technique of BLAST uses statistical theory to assign a scoring matrix for all possible pairs of residues and to produce the Expect value (E-value) for each alignment pair. The stand-alone BLAST programs are provided as a compressed package. The package, available as archives whose names begin with BLAST for a variety of computer platforms, can be found on the BLAST ftp site: /blast/executables/release/. In this paper, we employed stand-alone BLAST version 2.2.22 to generate the BLAST output that serves as the input of the proposed software (CADViST).

B. UnitX Graphical Representation

The UnitX graphical representation can efficiently reveal the distribution of amino/keto bases along the sequence on two quadrants of the Cartesian coordinate system. The first quadrant represents the amount of amino bases (C and A), while the fourth quadrant represents the amount of keto bases (T and G). The unit vectors representing the four nucleotides, i.e. adenine (A), guanine (G), thymine (T), and cytosine (C), are shown in Fig. 1.

Figure 1. The UnitX vectors represent the four nucleotides A, G, C and T.

By counting the occurrences of bases A, C, G, and T in the sequence, the coordinate (x, y) of the projection onto the X and Y axes in the UnitX representation can be written as follows:

[Equation not recoverable from the source text.]

C. Idea of CADViST

In this paper, we employ BLAST in "stand-alone" mode to find the similarity score between the query sequence and the Dengue virus nucleotide database. The search results obtained from BLAST are displayed graphically via the UnitX representation.

D. Creating a nucleotide BLAST database

The main advantage of the stand-alone BLAST program is the ability to create your own database. To create a nucleotide BLAST database, we need a source file of sequences in FASTA format. This file is processed by the formatdb program contained in the stand-alone BLAST package to build the index files of the database. After executing the formatdb command, three files are produced from the source FASTA file. For nucleotide databases, the extensions are nhr, nin, and nsq [15]. The formatdb command is as follows:

    formatdb -p F -i DatabaseName.fasta

The source FASTA file has the form:

    >First sequence description
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    >Second sequence description
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    >Last sequence description
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

where the Xs are nucleotide codes (A, T, G or C). In this paper, the database of the proposed software was obtained from NCBI with the keyword "Dengue virus complete genome". The 2,184 nucleotide sequences comprise the four serotypes of Dengue virus (the serotypes contain 952, 737, 405 and 90 nucleotide sequences, respectively).

E. Stand-alone executable BLAST

The stand-alone executable BLAST and the NCBI web-based BLAST program let users perform a BLAST search via the command line or a website, respectively. Running the BLAST search program on your own machine has several advantages, e.g. the database can be edited easily. In this paper, we employ the stand-alone BLAST program to generate the BLAST output. A BLAST search can be executed via the blastall command as follows:

    blastall -p blastn -d Databasename.fasta -i QuerySequence.fasta -m 9 -F F > Result.txt

F. Graphical Representation via UnitX

Instead of displaying the search results as letters as BLAST does (Figs. 4(b) and 4(c)), CADViST extracts the information from the BLAST output and represents the results graphically via the UnitX representation described in Section II.B. Furthermore, users who only need to explore the nature of Dengue virus sequences can use the graphical feature (UnitX) of CADViST on its own.

III. SOFTWARE ARCHITECTURE OF CADVIST

To provide a user-friendly GUI, the proposed CADViST software is written in C#. The GUI of CADViST is shown in Fig. 3. The query sequence can be supplied either as (1) a text file in FASTA format or (2) text copied directly into the blank input area in Fig. 3. Once the input is inserted, the process inside CADViST can be summarized as follows (Fig. 2):

Step 1: Call the stand-alone BLAST program to generate the BLAST output,
Step 2: Extract the sequence accession number and the coordinates of each matched sequence from the BLAST output,
Step 3: Provide the matching regions between the query and the matched sequences identified by the BLAST program and send the results to the display
unit, i.e. the UnitX representation. The results are shown in Figs. 4(d)-(e).

In addition, CADViST offers options to copy, save, print, and show point values in the graph of UnitX vectors. An option is selected by right-clicking on the graph.

IV. SIMULATION RESULTS

As an example, we verify the merit of CADViST by finding the similarities between FN429899 Dengue virus serotype 3 (base positions 2040-7143) and our Dengue virus sequence database. Traditionally, the results obtained from the stand-alone BLAST program consist of two major parts, i.e. (1) the one-line descriptions of each database sequence found to match the query sequence (Fig. 4(a)), and (2) the alignments between the query sequence and the matched database sequences (Figs. 4(b)-(c)) [16]. Figs. 4(b) and (c) illustrate the matched sequences with the highest and second-highest scores, respectively. Using the information obtained from BLAST, Figs. 4(d)-(e) show the proposed graphical representation via CADViST.

The result of the proposed software consists of two main parts, each of which displays a graphical representation via UnitX. The first part shows the whole genome of the query sequence (Fig. 4(d)). The second part displays the matched regions between the query sequence and the matched database sequences identified by BLAST (Fig. 4(e)). In Fig. 4(e), for convenience, only the matched sequences with the highest (FN429899) and second-highest (AY858038) scores are shown. Both sequences are from the same serotype as our input sequence. As expected, the highest-scoring match is the sequence from which the query was copied.

Furthermore, Figs. 4(a)-(c) show that the output of BLAST still lacks a user-friendly graphical representation. CADViST therefore offers an alternative way to visualize the resulting sequences obtained from BLAST, as shown in Figs. 4(d)-(e). In Fig. 4(e), the region of mismatched base pairs can be clearly observed. The result of the proposed software can be displayed as a graph overlay together with the UnitX representation of the sequences (Fig. 4(d)).

Figure 2. Flowchart of the proposed software

Figure 3. Screenshots of the proposed software

V. CONCLUSIONS

In this paper, we have developed software called CADViST. The proposed software can be used to visually analyze the matched regions identified by
BLAST between the query sequences and the Dengue virus database. The graphical representation is implemented via UnitX, which is especially suitable for analyzing different serotypes of Dengue virus nucleotide sequences. Many options in CADViST can also benefit bioinformatics experts, e.g. save, print, and show the raw numeric values on the graph. With the framework of C#, CADViST can be easily modified to include more open source or in-house developed mathematical modeling, while maintaining the user-friendly GUI.

REFERENCES

[1] E. Hamori and J. Ruskin, "H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences", The Journal of Biological Chemistry, vol. 258(2), 1983, pp. 1318-1327.
[2] M. A. Gates, "Simple DNA sequence representations", Nature, vol. 316, 1985.
[3] A. Nandy, "Two-dimensional graphical representation of DNA sequences and intron-exon discrimination in intron-rich sequences", Bioinformatics, vol. 12(1), 1996, pp. 55-62.
[4] S.-T. Yau, J. Wang, A. Niknejad, C. Lu, N. Jin and Y-K. Ho, "DNA sequence representation without degeneracy", Nucleic Acids Research, vol. 31(12), 2003, pp. 3078-3080.
[5] B. Viriyasaksathian, Y. Wongsawat and P. Suriyaphol, "UnitX: Dengue virus sequence graphical representation for serotypes classification", ISBME 2009, Bangkok, Thailand.
[6] S. F. Altschul, W. Miller, E. Myers and D. J. Lipman, "Basic local alignment search tool", Journal of Molecular Biology, vol. 215(3), 1990, pp. 403-410.
[7] J. Ye, S. McGinnis and T. L. M
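Step 2 of the process in Section III (extracting the accession number and match coordinates from the BLAST output) can be illustrated with a minimal sketch. It is not the authors' C# implementation. It assumes the tab-separated output with comment lines that legacy blastall writes for -m 9, with twelve columns: query id, subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score. The names Hit, parse_tabular and SAMPLE are hypothetical, and the sample rows are made-up values for illustration, not results from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    """One alignment row of BLAST tabular output (hypothetical helper type)."""
    query_id: str
    subject_id: str   # identifier/accession of the matched database sequence
    identity: float   # percent identity
    q_start: int      # match coordinates on the query
    q_end: int
    s_start: int      # match coordinates on the database sequence
    s_end: int
    evalue: float
    bit_score: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST rows, skipping '#' comment lines."""
    hits: List[Hit] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # header/comment lines carry no alignment data
        fields = line.split("\t")
        if len(fields) != 12:
            continue  # ignore rows that do not match the assumed layout
        hits.append(Hit(
            query_id=fields[0],
            subject_id=fields[1],
            identity=float(fields[2]),
            q_start=int(fields[6]),
            q_end=int(fields[7]),
            s_start=int(fields[8]),
            s_end=int(fields[9]),
            evalue=float(fields[10]),
            bit_score=float(fields[11]),
        ))
    return hits


# Made-up sample in the assumed layout (illustrative values only).
SAMPLE = (
    "# BLASTN 2.2.22\n"
    "# Query: query_1\n"
    "# Fields: Query id, Subject id, % identity, alignment length, "
    "mismatches, gap openings, q. start, q. end, s. start, s. end, "
    "e-value, bit score\n"
    "query_1\tsubject_A\t100.00\t500\t0\t0\t1\t500\t2040\t2539\t0.0\t991\n"
    "query_1\tsubject_B\t98.40\t500\t8\t0\t1\t500\t2031\t2530\t0.0\t928\n"
)

if __name__ == "__main__":
    for hit in parse_tabular(SAMPLE.splitlines()):
        print(hit.subject_id, hit.q_start, hit.q_end, hit.s_start, hit.s_end)
```

In practice the lines would come from the result file written by the blastall command in Section II.E. The extracted subject coordinates are what a display step would need to locate each matched region on the plotted genome.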