CADViST: Visualization Tool for BLAST Alignment of Dengue Virus Sequences

Boonyarat Viriyasaksathian, Yodchanan Wongsawat
Department of Biomedical Engineering, Mahidol University
Nakornpathom, Thailand
g5137363@student.mahidol.ac.th and egyws@mahidol.ac.th

Prapat Suriyaphol
Bioinformatics and Data Management for Research Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University
Bangkok, Thailand
sipur@mucc.mahidol.ac.th

Abstract: Building a search engine that can simultaneously visualize genomic sequences is a challenging problem. In this paper, we propose software called CADViST. The UnitX graphical representation (previously proposed by the authors) is employed as an alternative way to visualize the results obtained from the Basic Local Alignment Search Tool (BLAST). The proposed software helps users and experts interpret the results easily, especially in Dengue virus sequence analysis, where different serotypes or subtypes need to be distinguished.

Keywords: BLAST, Dengue Virus, Visualization, Bioinformatics.

I. INTRODUCTION

In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is one of the most widely used tools for sequence similarity search because of its speed and reasonable search accuracy. However, the BLAST program still lacks a user-friendly graphical representation. Hence, in this paper, we aim to develop a visualization tool capable of displaying the text output produced by BLAST.

Many existing tools are used for visualizing and analyzing genomic sequences. Each tool is developed for specific tasks, and the tools can be categorized into four approaches: Base vector, Sequential, Fourier Transform (FT) and Z-Curve.

(1) Base vector approach: Hamori, E. and Ruskin, J. (1983) represented DNA sequences as a three-dimensional curve (H-Curve) [1]. Gates, M.A. (1985) proposed that a graphical representation of a DNA sequence in two-dimensional space was better than the H-Curve. Gates' graphical representation shows the four nucleotide bases, i.e. adenine (A), thymine (T), cytosine (C), and guanine (G), as unit vectors on the Cartesian coordinate system: base A is on the negative y-axis, base T is on the positive y-axis, base G is on the positive x-axis, and base C is on the negative x-axis [2]. About eleven years later, Nandy A. (1996) proposed a graphical representation to distinguish the features of intron and exon segments of eukaryotic sequences [3]. This graphical representation was similar to Gates' method. The A, G, C and T nucleotides were plotted on an ACGT-axis system, and the slope of the plot indicated clusters of intron and exon sequences. However, both Nandy's and Gates' methods have high degeneracy: sequences such as AGTC, AGTCA, and AGTCAG lead to the same graphical representation [4]. Stephen S.T. Yau et al. (2003) modified Gates' method. The four nucleic acids are classified into a pyrimidine/purine graph on two quadrants of the Cartesian coordinate system. The first quadrant represents the pyrimidines (T and C), and the fourth quadrant represents the purines (A and G) [4]. Recently, the authors proposed a graphical representation designed for Dengue virus sequence analysis, based on the cumulative amounts of amino and keto bases, called UnitX [5].

(2) Sequential approach: Altschul et al. (1990) developed the Basic Local Alignment Search Tool (BLAST) program. This program is one of the most popular tools for genomic sequence analysis and can perform a fast similarity search. It compares any two sequences and displays the differences between them on a base-by-base basis [6].

(3) Fourier Transform (FT) approach: Anastassiou D. proposed color spectrograms of biomolecular sequences as a tool for visualizing biomolecular sequence analysis [7], [8]. The spectrogram, which represents the magnitude of the short-time Fourier transform (STFT), is implemented via the discrete Fourier transform (DFT). Analysis of the genomic sequence in the frequency domain uses the 3-periodicity property of DNA coding sequences. The color spectrogram is defined using the colors red, green and blue. Even though this method yields an impressive graphical representation, its computational complexity is fairly high.

(4) Z-Curve approach: Zhang C.T. et al. (1994) suggested a practical visualization tool called Z-Curve [8]-[12]. James J. et al. implemented this tool in a package called MBEToolbox [13]. Because Z-Curve is based on the cumulative components of the genomic sequence, its features can be interpreted quickly, such as the distribution along the sequence of purine/pyrimidine bases, amino/keto bases, and strong H-bond/weak H-bond bases. Since the Z-Curve algorithm is simple, it can be applied to genomic sequences of any length. A similar approach called 3DD-Curve was presented by Zhang Y. and Tan M. (2008). This approach can be viewed as a weighted version of Z-Curve [14].

978-1-4244-4713-8/10/$25.00 ©2010 IEEE

The choice of graphical representation can vary with the characteristics of the genomic sequences of interest. Therefore, in this first version of the proposed software, Dengue virus sequences (nucleotide sequences) are used to verify its merit. The software is called CADViST, which stands for Classification and Analysis of Dengue Virus Serotype by Visualization Tool. By employing UnitX as the visualization tool, the proposed software is suitable for interpreting Dengue virus sequences. However, positioning partial Dengue sequences on the Dengue genome with the UnitX representation alone requires a high computational load. BLAST is well known as an efficient search tool, but the visualization of its results needs improvement. Therefore, in this paper, we propose software that combines the merits of both BLAST and UnitX. The proposed software can efficiently search for an unknown portion of a Dengue virus sequence and simultaneously display graphical representations of the resulting sequences.

This paper is organized as follows. Section II introduces the proposed visualization tool, called CADViST. The software architecture of CADViST is described in Section III. In Section IV, the simulation results of the proposed software are shown. Finally, Section V concludes the paper.

II. CADVIST: THE PROPOSED VISUALIZATION TOOL

Classification and Analysis of Dengue Virus Serotype by Visualization Tool, or CADViST, is a visualization tool proposed especially for analyzing Dengue virus sequences. Its components are described in detail as follows.

A. Basic Local Alignment Search Tool (BLAST)

The BLAST program was developed by Stephen F. Altschul and his coworkers at the National Center for Biotechnology Information (NCBI). It is widely used for calculating sequence similarity. BLAST uses a heuristic algorithm to find the best possible results: it finds homologous sequences by locating short matches between two sequences, which makes the search fast. Its similarity measure uses statistical theory to assign a scoring matrix for all possible pairs of residues and to produce an Expect value (E-value) for each alignment pair. The stand-alone BLAST programs are provided as a compressed package. The package, distributed as archives whose names begin with BLAST for a variety of computer platforms, is available on the BLAST ftp site: /blast/executables/release/. In this paper, we employed stand-alone BLAST version 2.2.22 to generate the BLAST output that serves as input to the proposed software (CADViST).

B. UnitX Graphical Representation

The UnitX graphical representation can efficiently reveal the distribution of amino/keto bases along the sequence on two quadrants of the Cartesian coordinate system. The first quadrant represents the amount of amino bases (C and A), while the fourth quadrant represents the amount of keto bases (T and G). The unit vectors representing the four nucleotides, i.e. adenine (A), guanine (G), thymine (T), and cytosine (C), are shown in Fig. 1.

Figure 1. The UnitX vectors represent four nucleotides A, G, C and T.

By counting the occurrences of bases A, C, G, and T in the sequence, the coordinates (x, y) of the projection onto the X and Y axes in the UnitX representation can be expressed as follows:

[Equations for x and y are not recoverable from the source text.]

C. Idea of CADViST

In this paper, we employ BLAST in a “stand-alone” mode to find the similarity scores between the query sequence and the Dengue virus nucleotide database. The search results obtained from BLAST are displayed graphically via the UnitX representation.

D. Creating a nucleotide BLAST database

The main advantage of the stand-alone BLAST program is the ability to create your own database. To create a nucleotide BLAST database, we need a source file of sequences in FASTA format. This file is processed by the formatdb program, contained in the stand-alone BLAST package, to build the index files of the database. After executing the formatdb command, three files are produced from the source FASTA file. For nucleotide databases, the extensions are nhr, nin, and nsq [15]. The formatdb command is as follows:

    formatdb -p F -i DatabaseName.fasta

The source FASTA file has the form:

    >First sequence description
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    >Second sequence description
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    >Last sequence description
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

where the Xs are nucleotide codes (A, T, G or C). In this paper, the database of the proposed software is obtained from NCBI with the keyword "Dengue virus complete genome". The 2,184 nucleotide sequences cover the four serotypes of Dengue virus (the serotypes contain 952, 737, 405 and 90 nucleotide sequences, respectively).

E. Stand-alone executable BLAST

The stand-alone executable BLAST and the NCBI web-based BLAST program let users perform a BLAST search via the command line or a website, respectively. Running the BLAST search program on your own machine has many advantages, e.g. the database can be easily edited. In this paper, we employ the stand-alone BLAST program to generate the BLAST output. A BLAST search can be executed via the blastall command as follows:

    blastall -p blastn -d Databasename.fasta -i QuerySequence.fasta -m 9 -F F > Result.txt

F. Graphical Representation via UnitX

Instead of displaying the search results as letters like BLAST does (Figs. 4(b) and 4(c)), CADViST extracts the information from the BLAST output and represents the results graphically via the UnitX representation described in Section II-B. Furthermore, users who only need to explore the nature of Dengue virus sequences can use the graphical feature (UnitX) of CADViST on its own.

III. SOFTWARE ARCHITECTURE OF CADVIST

To provide a user-friendly GUI, the proposed CADViST software is written in C#. The GUI of CADViST is shown in Fig. 3. The query sequence can be entered either as (1) a text file in FASTA format or (2) text copied directly into the blank space in Fig. 3. Once the input is inserted, the process inside CADViST can be summarized as follows (Fig. 2):

Step 1: Call the stand-alone BLAST program to generate the BLAST output.
Step 2: Extract the sequence accession number and the coordinates of each matched sequence from the BLAST output.
Step 3: Provide the matching regions between the query and the matched sequences identified by the BLAST program and send the results to the display unit, i.e. the UnitX representation. The results are shown in Figs. 4(d-e).

In addition, CADViST offers options to copy, save, print, and show point values in the graph of UnitX vectors. These options can be selected by right-clicking on the graph.

IV. SIMULATION RESULTS

As an example, we verify the merit of CADViST by finding the similarities between FN429899 Dengue virus serotype 3 (base positions 2040-7143) and our Dengue virus sequence database. Traditionally, the results obtained from the stand-alone BLAST program consist of two major parts: (1) the one-line descriptions of each database sequence found to match the query sequence (Fig. 4(a)), and (2) the alignments between the query sequence and the matched database sequences (Figs. 4(b)-(c)) [16]. Figs. 4(b) and (c) illustrate the first and second highest scoring matched sequences, respectively. Using the information obtained from BLAST, Figs. 4(d)-(e) show the proposed graphical representation via CADViST. The result of the proposed software consists of two main parts, each displayed graphically via UnitX. The first part shows the whole genome of the query sequence (Fig. 4(d)). The second part displays the matched regions between the query and the matched database sequences identified by BLAST (Fig. 4(e)). In Fig. 4(e), for convenience, only the first (FN429899) and second (AY858038) highest scoring matched sequences are shown. Both sequences are from the same serotype as our input sequence. As expected, the highest scoring matched sequence is the sequence from which we copied a portion as our query sequence.

Furthermore, according to Figs. 4(a-c), the output of BLAST still lacks a user-friendly graphical representation. CADViST is therefore an efficient alternative way to visualize the sequences returned by BLAST, as shown in Figs. 4(d-e). In Fig. 4(e), the regions of mismatched base pairs can be clearly observed. The result of the proposed software can be displayed in a graph-overlay format together with the UnitX representation of the sequences (Fig. 4(d)).

Figure 2. Flowchart of the proposed software

Figure 3. Screenshots of the proposed software

V. CONCLUSIONS

In this paper, we have developed software called CADViST. The proposed software can be used to visually analyze the matched regions identified by BLAST between the query sequences and the Dengue virus database. The graphical representation is implemented via UnitX, which is especially suitable for analyzing different serotypes of Dengue virus nucleotide sequences. Many options in CADViST can also benefit bioinformatics experts, e.g. save, print, and show the raw numeric values on the graph. Built on the C# framework, CADViST can be easily modified to include more open-source or in-house mathematical models while maintaining the user-friendly GUI.

REFERENCES

[1] E. Hamori and J. Ruskin, “H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences”, The Journal of Biological Chemistry, vol. 258(2), 1983, pp. 1318-1327.
[2] M.A. Gates, “Simple DNA sequence representations”, Nature, vol. 316, 1985.
[3] A. Nandy, “Two-dimensional graphical representation of DNA sequences and intron-exon discrimination in intron-rich sequences”, Bioinformatics, vol. 12(1), 1996, pp. 55-62.
[4] S.-T. Yau, J. Wang, A. Niknejad, C. Lu, N. Jin and Y-K. Ho, “DNA sequence representation without degeneracy”, Nucleic Acids Research, vol. 31(12), 2003, pp. 3078-3080.
[5] B. Viriyasaksathian, Y. Wongsawat and P. Suriyaphol, “UnitX: Dengue virus sequence graphical representation for serotypes classification”, ISBME 2009, Bangkok, Thailand.
[6] S.F. Altschul, W. Miller, E. Myers and D.J. Lipman, “Basic local alignment search tool”, Journal of Molecular Biology, vol. 215(3), 1990, pp. 403-410.
[7] J. Ye, S. McGinnis and T.L. M
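Step 2 of the CADViST process (extracting the accession identifier and the match coordinates from the BLAST output) can be illustrated with a short Python sketch. This is not the authors' C# implementation. It assumes the tab-separated layout that legacy blastall is commonly documented to write with -m 9: comment lines starting with "#", followed by twelve fields per hit (query id, subject id, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score). The names Hit, parse_tabular and SAMPLE, and the two sample rows, are hypothetical and made up for illustration; the field order should be checked against a real Result.txt before relying on it.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One alignment row of a tabular BLAST report (assumed 12-column layout)."""
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    e_value: float
    bit_score: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated hit rows, skipping blank lines and '#' comments."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            continue  # not a hit row in the assumed layout
        hits.append(Hit(
            fields[0], fields[1], float(fields[2]),
            int(fields[3]), int(fields[4]), int(fields[5]),
            int(fields[6]), int(fields[7]), int(fields[8]), int(fields[9]),
            float(fields[10]), float(fields[11]),
        ))
    return hits


# Made-up rows for illustration only; they are not results from the paper.
SAMPLE = "\n".join([
    "# Fields: Query id, Subject id, % identity, alignment length, "
    "mismatches, gap openings, q. start, q. end, s. start, s. end, "
    "e-value, bit score",
    "query1\thitA\t100.00\t120\t0\t0\t1\t120\t2040\t2159\t1e-60\t238",
    "query1\thitB\t97.50\t120\t3\t0\t1\t120\t2038\t2157\t2e-52\t210",
])

if __name__ == "__main__":
    for hit in parse_tabular(SAMPLE.splitlines()):
        # Subject id and subject coordinates are what a display unit would need.
        print(hit.subject_id, hit.s_start, hit.s_end, hit.mismatches)
```

The subject start and end positions locate each matched region on the database sequence, which is the information a UnitX-style display would need in order to draw only the matched segment.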

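The degeneracy of Gates' representation mentioned in the Introduction can be checked with a small Python sketch. It uses only the axis assignment stated in the text (A on the negative y-axis, T on the positive y-axis, G on the positive x-axis, C on the negative x-axis). The function names gates_path and drawn_segments are hypothetical, and comparing the set of drawn line segments is one assumed way of formalizing "the same graphical representation".

```python
# Unit steps of Gates' 2-D walk, as described in the Introduction.
GATES_STEPS = {
    "A": (0, -1),  # negative y-axis
    "T": (0, 1),   # positive y-axis
    "G": (1, 0),   # positive x-axis
    "C": (-1, 0),  # negative x-axis
}


def gates_path(sequence):
    """Return the list of points visited by the walk, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for base in sequence.upper():
        dx, dy = GATES_STEPS[base]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def drawn_segments(sequence):
    """Return the set of undirected unit segments that the walk draws."""
    path = gates_path(sequence)
    return {frozenset(pair) for pair in zip(path, path[1:])}


if __name__ == "__main__":
    for seq in ("AGTC", "AGTCA", "AGTCAG"):
        segments = sorted(tuple(sorted(s)) for s in drawn_segments(seq))
        print(seq, segments)
```

AGTC traces a closed unit square and returns to the origin, so the extra bases in AGTCA and AGTCAG only retrace segments that are already drawn. All three sequences therefore produce the same picture.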