Comprehensive Practice in Bioinformatics Software
Chapter 5: Multiple Sequence Alignment and Evolutionary Analysis

1. Multiple Sequence Alignment (MSA)

[Figure: multiple sequence alignment of insulin sequences (C-peptide and A-chain region) from chicken, Xenopus, human, monkey, dog, hamster, bovine and guinea pig]

Bring the greatest number of similar characters into the same column of the alignment.

Why do MSA?
- To describe the similarity relationships within a set of sequences, in order to understand the basic features of a gene and to find motifs, conserved regions, and so on.
- To predict the secondary and tertiary structure of a new sequence and, from that, infer its biological function.
[Figure: Human Hox genes]

Why do MSA?
[Figure: MSA of the Gal1 and Gal10 promoter region in different yeast species; Nature 423, 241-254]

Why do MSA?
- To describe how closely related homologous sequences are, for use in molecular evolutionary analysis. MSA is the basis for building molecular phylogenetic trees.
[Figure: species tree vs. gene tree for taxa A, B and C]
We often assume that gene trees give us species trees.
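Finding conserved regions can be made concrete with a small sketch. The Python below is only an illustration, not Clustal's algorithm: it marks a column with `*` when every sequence has the same residue there (Clustal's `*` means the same thing, but its `:` and `.` similarity marks are not computed here). The four A-chain fragments are copied from the insulin alignment in the slides (chicken, Xenopus, human, hamster); the function names are hypothetical.

```python
from collections import Counter
from typing import List


def conserved_columns(alignment: List[str]) -> List[int]:
    """Return indices of columns where every sequence has the same non-gap residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    conserved = []
    for i in range(length):
        residue, count = Counter(seq[i] for seq in alignment).most_common(1)[0]
        if residue != "-" and count == len(alignment):
            conserved.append(i)
    return conserved


def conservation_line(alignment: List[str]) -> str:
    """Clustal-like annotation line: '*' under fully conserved columns, blank elsewhere."""
    conserved = set(conserved_columns(alignment))
    return "".join("*" if i in conserved else " " for i in range(len(alignment[0])))


# A-chain fragments from the slide's alignment: chicken, Xenopus, human, hamster.
A_CHAIN = [
    "GIVEQCCHNTCS",
    "GIVEQCCHSTCS",
    "GIVEQCCTSICS",
    "GIVDQCCTSICS",
]

if __name__ == "__main__":
    for seq in A_CHAIN:
        print(seq)
    print(conservation_line(A_CHAIN))
```

Running it prints the four fragments with `*** ***   **` underneath; the cysteines at 0-based columns 5, 6 and 10 are among the fully conserved positions.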
Note the concepts: paralogy and orthology
- Orthologs: derived from a common ancestor through speciation; their function is usually conserved.
- Paralogs: produced by gene duplication within a genome; they diverge in function more readily.

Why do MSA?
- Genome-wide MSA across species can reveal genomic structural variation and synteny (collinearity). Nature 423, 241-254
- Contig assembly.

How is MSA done?
- Dynamic programming
- Improved (heuristic) MSA algorithms:
  1. Progressive methods: Clustal, T-Coffee, MUSCLE
  2. Iterative methods: PRRP, DIALIGN
  3. Other algorithms: Partial Order Algorithm, profile HMM, meta-methods (MAFFT)
/wiki/List_of_sequence_alignment_software
Current Opinion in Structural Biology 2006, 16:368-373

Dynamic programming for two and three sequences
[Figure: dynamic-programming paths for aligning two and three sequences]
Aligning five protein sequences of 200-250 aa each by dynamic programming takes more than 12 hours of computation, because the dynamic-programming table grows exponentially with the number of sequences. This is why heuristic methods are used.

Using Clustal
- Clustal: one of the most widely used MSA programs
- Can be run online
- Can be run on a local computer
- Sequence input and output formats:
  Input: FASTA, NBRF/PIR, EMBL/SWISSPROT, ALN, GCG/MSF, GCG9/RSF, GDE
  Output: ALN, NBRF/PIR, GCG/MSF, PHYLIP, NEXUS, GDE, FASTA
  Example input: sequence 1 ATTGCAGTTCGCA, sequence 2 ATAGCACATCGCA, sequence 3 ATGCCACTCCGCC

Algorithmic basis of ClustalW/X
1. Align all sequences pairwise.
2. Build a distance matrix.
3. Build a guide tree.
4. Align the two closest sequences with dynamic programming, then "progressively" add the remaining sequences (progressive alignment).

Running Clustal online (ClustalW)
EBI ClustalW analysis page: http://www.ebi.ac.uk/Tools/msa/clustalw2/
Help: http://www.ebi.ac.uk/Tools/msa/clustalw2/help/
Paste or upload the sequences, then adjust the parameters.

Running Clustal offline (ClustalX)
Download and install it; it comes with its own Help file. See also "Using ClustalX for multiple sequence alignment" by Jarno Tuimala.
Two working modes: Multiple Alignment and Profile Alignment.
Step 1: load the sequences (File → Load sequences). Note: the software cannot handle Chinese characters, so the sequence file must not be on the Windows XP desktop (on a Chinese system its path contains Chinese characters); put it under an all-English path on C: or D:.
Step 2: set the alignment parameters.
Step 3: run the alignment.
Step 4: when the alignment is finished, choose the format for saving the result file.
[Figure: ClustalX output showing conserved residues and the conservation profile]

The aligned sequences can be further formatted:
(1) Boxshade: highlights identical or similar positions (/software/BOX_form.html)
Copy the alignment from the EBI ClustalW result page and paste it into the Boxshade page; select "ALN" under "Input sequence format" and "RTF_new" under "Output format"; on the result page, click "here is your output number 1".
[Figure: formatted alignment]
(2) ESPript: many formatting options; highlights identical or similar positions. http://espript.ibcp.fr/ESPript/cgi-bin/ESPript.cgi
Download the "Alignment file" (ALN file) from the EBI ClustalW result page, upload it in the "Aligned Sequences" field of the ESPript page, and make selections under "Output layout" and "Output file or device".
[Figure: formatted alignment]
(3) GeneDoc (/gfx/genedoc)
File → Import, choose the input file format (e.g. ALN), then format the alignment.

2. Phylogenetic analysis
- Analyzes the evolutionary relationships of genes or proteins.
- Phylogenetic (evolutionary) tree: a tree showing the evolutionary relationships among various biological species or other entities that are believed to have a common ancestor.

Approaches to studying phylogeny
- Classical evolutionary biology: compares morphology, physiological structure and fossils.
- Molecular evolutionary biology: compares DNA and protein sequences.

An alignment is a hypothesis of positional homology between bases/amino acids. Residues that are lined up in different sequences are considered to share a common ancestry (i.e., they are derived from a common ancestral residue). This is easy when the sequences differ only by substitutions and difficult when they also differ by indels.

Phylogenetic tree terminology
- Terminal node (OTU, operational taxonomic unit): can be a species, a population, or a protein, DNA or RNA molecule, etc.
- Branch
- Internal node (HTU, hypothetical taxonomic unit): a divergence point, the probable ancestor of the lineages below it.
- Root: the ancestral node of the whole tree.
Example tree for taxa A to E in Newick format: (A,(B,C),(D,E));

A clade is a group of organisms that includes an ancestor and all descendants of that ancestor.

Tree types
- Cladogram: branch lengths have no meaning.
- Phylogram: branch lengths represent genetic change.
- Ultrametric tree: branch lengths represent time.
[Figure: taxa A to D drawn as a cladogram, an ultrametric tree and a phylogram]

Branch lengths
Scaled branches: the length of a branch is proportional to the number of changes. The distance between two species is the sum of the lengths of all branches connecting them.

Rooted tree vs. unrooted tree
[Figure: an unrooted and a rooted tree for taxa A to D]
Two major ways to root trees:
- By midpoint (distance): place the root halfway along the longest path between two taxa. For example, d(A,D) = 10 + 3 + 5 = 18, so the midpoint is at 18 / 2 = 9.
- By an outgroup.

Steps in building a phylogenetic tree
1. Sequence alignment (automatic alignment, manual correction)
2. Choose a tree-building method (and substitution model)
3. Build the tree:
   - Distance methods: UPGMA, neighbor-joining (NJ), minimum evolution (ME)
   - Maximum parsimony (MP)
   - Maximum likelihood (ML)
   - Bayesian inference
4. Evaluate the tree (statistical analysis): bootstrap, likelihood ratio test

Distance methods
Distance methods, also called distance-matrix methods, first compare every pair of sequences and, under certain assumptions (an evolutionary distance model), derive the evolutionary distances between taxa, building an evolutionary distance matrix. The tree is then built from the distance relationships in this matrix.
[Figure: distance matrix for cat, dog, rat and cow, and the tree built from it]
Step 1. Compute the distances between sequences and build the distance matrix: align the sequences and remove gaps (choose a substitution model).
- Uncorrected "p" distance (= observed percent sequence difference)
- Kimura 2-parameter distance (estimate of the true number of substitutions between taxa)
Step 2. Build the tree from the matrix. There are many methods for building a tree from evolutionary distances; common ones are:
1. Unweighted Pair Group Method with Arithmetic mean (UPGMA)
2. Neighbor-Joining method (NJ)
3. Minimum Evolution (ME)

Maximum parsimony (MP)
Maximum parsimony originated in the study of morphological characters and has since been extended to the evolutionary analysis of molecular sequences. It rests on the philosophical principle of Ockham's razor: every possible topology is evaluated, and the topology that requires the fewest substitutions is taken as the optimal tree. In short: find the tree that explains the observed sequences with a minimal number of substitutions.

MP workflow example (four sequences, three positions):
   Position     1  2  3
   Sequence 1   T  G  C
   Sequence 2   T  A  C
   Sequence 3   A  G  G
   Sequence 4   A  A  G
- Position 1: grouping (1,2) needs 1 change; (1,3) or (1,4) needs 2 changes.
- Position 2: grouping (1,3) needs 1 change; (1,2) or (1,4) needs 2 changes.
- Position 3: grouping (1,2) needs 1 change; (1,3) or (1,4) needs 2 changes.
If 1 and 2 are grouped, a total of four changes are needed; if 1 and 3 are grouped, five; if 1 and 4 are grouped, six. The tree grouping 1 with 2 is therefore the best (most parsimonious) tree.
[Figure: MP tree-building steps, comparing the three possible trees with 4, 5 and 6 changes]

Maximum likelihood (ML)
ML was first applied to the analysis of gene-frequency data. It chooses a specific substitution model for the given sequence data, finds the maximum likelihood of each topology, and then selects the topology with the highest likelihood as the optimal tree.
ML workflow: inferring the maximum likelihood tree
1. Pick an evolutionary model.
2. For each candidate tree and each position, consider all possible assignments of states to the internal nodes; based on the evolutionary model, calculate the likelihood of each assignment and sum them to get the column (site) likelihood.
3. Calculate the tree likelihood by multiplying the likelihoods of all positions.
4. Choose the tree with the greatest likelihood.

A newer tree-building method: Bayesian inference (Holder & Lewis (2003) Nature Reviews Genetics 4, 275-284)
- Bayesian inference asks: what is the probability that the model/theory (T) is correct given the observed data (D)? Pr(T|D)
- Maximum likelihood asks: what is the probability of seeing the observed data (D) given a model/theory (T)? Pr(D|T)
Advantages of Bayesian inference over ML: speed; no need for bootstrapping.

Comparison of methods
- Distance: uses only pairwise distances; minimizes the distance between nearest neighbors; very fast; easily trapped in local optima; good for generating a tentative tree, or for choosing among multiple trees.
- Maximum parsimony: uses only shared derived characters; minimizes total distance; slow; its assumptions fail when evolution is rapid; the best option when tractable (<30 taxa, homoplasy rare).
- Maximum likelihood: uses all data; maximizes the tree likelihood given specific parameter values; very slow; highly dependent on the assumed evolution model; good for very small data sets and for testing trees built using other methods.

Choosing a method for phylogenetic prediction
Molecular Biology and Evolution 2005 22(3):792-802
Bioinformatics: Sequence and Genome Analysis, 2nd edition, by David W. Mount. /cgi/content/full/2008/5/pdb.ip49p254

Assessing the reliability of a tree: the bootstrap method
The bootstrap is a statistical technique that uses intensive random resampling of data to estimate a statistic whose underlying distribution is unknown. Applied to phylogenies, it is a computational method to estimate the confidence level of a given tree:
1. Randomly draw columns, with replacement, from the multiple alignment to form a new alignment of the same length.
2. Repeat this to obtain many new alignments (more replicates, typically 100 to 1000).
3. Build a tree from each new alignment and check whether these trees differ from the original tree; this is used to evaluate the reliability of the inferred tree.

Bootstrap example:
   Original sample (columns 0-9):
   rat        GAGGCTTATC
   human      GTGGCTTATC
   turtle     GTGCCCTATG
   fruit fly  CTCGCCTTTG
   oak        ATCGCTCTTG
   duckweed   ATCCCTCCGG
   Pseudo-sample 1 (columns 0 0 1 1 2 2 2 3 4 5):
   rat GGAAGGGGCT, human GGTTGGGGCT, turtle GGTTGGGCCC, fruit fly CCTTCCCGCC, oak AATTCCCGCT, duckweed AATTCCCCCT
   Pseudo-sample 2 (columns 4 4 5 5 5 6 7 7 7 8):
   rat CCTTTTAAAT, human CCTTTTAAAT, turtle CCCCCTAAAT, fruit fly CCCCCTTTTT, oak CCTTTCTTTT, duckweed CCTTTCCCCG
   More replicates (100 to 1000) → inferred tree
Bootstrapping doesn't really assess the accuracy of a tree; it only indicates the consistency of the data.
For ML, bootstrapping is very time-consuming; the aLRT (approximate likelihood-ratio test) can be used instead to test the reliability of the tree (Anisimova & Gascuel (2006) Syst. Biol. 55(4):539-552).

Tree viewers
- TreeView: software for editing and printing trees (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html)
On the EBI ClustalW2-Phylogeny page, input the aligned sequences (or upload the ALN file), download the "Phylip tree file" (.ph file) and open it in TreeView, which can display the tree in different styles (1, 2, 3).

Molecular evolution software
- PHYLIP (/phylip.html): free, integrated phylogenetic analysis package
- PAUP (/): commercial, integrated phylogenetic analysis package
- MEGA (/): free, graphical, integrated phylogenetic analysis package; the latest version includes ML
- PHYML (http://atgc.lirmm.fr/phyml/): very fast ML tree building
- PAML (http://abacus.gene.ucl.ac.uk/software/paml.html): ML phylogenetic analysis
- Tree-puzzle (http://www.tree-puzzle.de/): fairly fast ML tree building
- MrBayes (/): tree building based on Bayesian inference
- More tools: /phylip/software.html

Tree-building methods in MEGA (/)
MEGA provides three tree-building approaches: maximum likelihood (ML), maximum parsimony (MP) and distance methods. The distance methods include neighbor-joining (NJ), minimum evolution (ME) and UPGMA.
Pros: graphical interface; integrates sequence retrieval, alignment and tree building; detailed help files; free.
Cons: ML is relatively slow (with many sequences, consider PHYML instead).
Latest version: MEGA6.
[Figure: example MEGA tree of FAD24/NOC3L proteins from pig (gi|218855168|gb|ACL12051.1|), cattle (gi|146186885|gb|AAI40653.1|), human (gi|18389433|dbj|BAB84194.1|), mouse (gi|18389431|dbj|BAB84193.1|), chicken (gi|118092837|ref|XP_421670.2|) and zebrafish (gi|50838808|ref|NP_001002863.1|); bootstrap values 92, 98 and 100; scale bar 0.02]

Example analysis
[Figure: tree of OsDR10 sequences from O. rufipogon, 9311, Nipponbare, Nackdong, O. punctata, O. latifolia, O. australiensis, L. tisserantii and L. JX; scale bar 0.005]
Phylogenetic analysis of the coding regions of OsDR10 and its homologs from different species. The tree was constructed by the neighbour-joining method. The numbers for interior branches indicate the bootstrap values (%) for 1,000 replications. The scale at the bottom is in units of number of nucleotide substitutions per site.
Xiao et al. PLoS ONE 4:e4603 (2009)

MSA is the key step in building a molecular phylogenetic tree. An MSA program will align any sequences it is given, so choosing which sequences to align is critical: the sequences used to build a phylogenetic tree must be homologous.
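The dynamic-programming approach for two sequences can be sketched as a Needleman-Wunsch global alignment. This is a minimal illustration under assumed scores (match +1, mismatch -1, gap -1, linear gap cost); ClustalW itself uses substitution matrices and separate gap-opening and gap-extension penalties. Extending the same table to k sequences of length L needs on the order of L^k cells, which is why exact DP becomes impractical beyond a few sequences.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment of two sequences by dynamic programming (linear gap cost)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # a[i-1] against a gap
                score[i][j - 1] + gap,             # b[j-1] against a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(2, 'ACGT', 'A-GT')`: three matches and one gap. When several alignments tie, the traceback order (diagonal, then up, then left) decides which one is reported.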
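The two distances named in Step 1 can be computed directly from a pair of aligned DNA sequences. A minimal sketch, assuming A/C/G/T data and pairwise deletion of gapped columns; the Kimura 2-parameter formula is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), with P the proportion of transitions and Q the proportion of transversions. Function names are illustrative.

```python
import math
from typing import List, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def _ungapped_pairs(s1: str, s2: str) -> List[Tuple[str, str]]:
    """Pair up aligned positions, dropping any column with a gap (pairwise deletion)."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a.upper(), b.upper()) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return pairs


def p_distance(s1: str, s2: str) -> float:
    """Uncorrected p distance: proportion of compared sites that differ."""
    pairs = _ungapped_pairs(s1, s2)
    return sum(a != b for a, b in pairs) / len(pairs)


def kimura_2p(s1: str, s2: str) -> float:
    """Kimura 2-parameter distance d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    pairs = _ungapped_pairs(s1, s2)
    n = len(pairs)
    # transitions: A<->G or C<->T; every other difference is a transversion
    transitions = sum(
        1 for a, b in pairs
        if a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    )
    transversions = sum(1 for a, b in pairs if a != b) - transitions
    p_ts, q_tv = transitions / n, transversions / n
    if 1 - 2 * p_ts - q_tv <= 0 or 1 - 2 * q_tv <= 0:
        raise ValueError("sequences too divergent: K2P distance is undefined")
    return -0.5 * math.log(1 - 2 * p_ts - q_tv) - 0.25 * math.log(1 - 2 * q_tv)
```

With the Clustal example sequences 1 and 2 (ATTGCAGTTCGCA, ATAGCACATCGCA), p = 3/13 ≈ 0.231; all three differences are transversions, and the K2P distance comes out at about 0.286, larger than p because it corrects for unobserved multiple substitutions.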
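UPGMA, the first matrix method listed in Step 2, repeatedly merges the closest pair of clusters and recomputes distances as size-weighted averages; it assumes a constant rate of evolution (a molecular clock), which neighbor-joining does not. The sketch below returns the tree as a Newick string. The four-taxon matrix in the usage note is hypothetical, chosen so the result is easy to check by hand.

```python
from typing import Dict, List, Tuple


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Build a UPGMA tree from a symmetric distance matrix; return it in Newick format."""
    # cluster id -> (Newick text, number of leaves, height of the cluster's node)
    clusters: Dict[int, Tuple[str, int, float]] = {
        i: (name, 1, 0.0) for i, name in enumerate(names)
    }
    # distances between clusters, keyed by (smaller id, larger id)
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)       # closest pair of clusters
        height = d.pop((i, j)) / 2.0   # new node sits halfway between them
        text_i, size_i, h_i = clusters.pop(i)
        text_j, size_j, h_j = clusters.pop(j)
        node = f"({text_i}:{height - h_i:g},{text_j}:{height - h_j:g})"
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            # size-weighted arithmetic mean, the "A" in UPGMA
            d[(k, next_id)] = (d_ik * size_i + d_jk * size_j) / (size_i + size_j)
        clusters[next_id] = (node, size_i + size_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

With names A, B, C, D and d(A,B) = 2, d(C,D) = 4 and all other distances 6, `upgma` returns `((A:1,B:1):2,(C:2,D:2):1);`. Because every merge puts the new node at half the cluster distance, all tips end up equally far from the root, which is the ultrametric property.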
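The four-sequence MP example in the slides (read as sequence 1 = TGC, 2 = TAC, 3 = AGG, 4 = AAG) can be checked with Fitch's small-parsimony algorithm, which counts the minimum number of changes a given topology needs at each site. A minimal sketch; each unrooted four-taxon tree is written as a rooted pair of pairs, which does not change the Fitch count. Names are illustrative.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf name, or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's pass for one site: (candidate state set, minimum number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch_site(tree[0], states)
    right_set, right_changes = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    # no shared state: one substitution is needed below this node
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree: Tree, seqs: Dict[str, str]) -> int:
    """Parsimony score: total changes summed over all sites."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(length)
    )


SEQS = {"1": "TGC", "2": "TAC", "3": "AGG", "4": "AAG"}

if __name__ == "__main__":
    for tree in [(("1", "2"), ("3", "4")), (("1", "3"), ("2", "4")), (("1", "4"), ("2", "3"))]:
        print(tree, tree_length(tree, SEQS))
```

The three topologies score 4, 5 and 6 changes, matching the slide, so grouping 1 with 2 is the most parsimonious tree. A full MP program repeats this scoring over every candidate topology (or a heuristic search of them).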
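The ML workflow in the slides (for each site, sum over the states of the internal nodes; then multiply over sites) is what Felsenstein's pruning algorithm computes, without enumerating every assignment explicitly. A minimal sketch, assuming the Jukes-Cantor (JC69) model, a rooted binary tree with fixed branch lengths, and uppercase A/C/G/T data. Real ML programs also optimize branch lengths and model parameters and search over topologies; that is omitted here, and all names are illustrative.

```python
import math
from typing import Dict, Tuple, Union

BASES = "ACGT"
# A tree is a leaf name, or (left_subtree, left_branch_length, right_subtree, right_branch_length).
Tree = Union[str, Tuple["Tree", float, "Tree", float]]


def jc69_prob(x: str, y: str, t: float) -> float:
    """Jukes-Cantor probability that base x becomes y along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def partial_likelihoods(tree: Tree, site: Dict[str, str]) -> Dict[str, float]:
    """For each base at this node: probability of the leaf data below it."""
    if isinstance(tree, str):
        return {b: 1.0 if site[tree] == b else 0.0 for b in BASES}
    left, t_left, right, t_right = tree
    lik_left = partial_likelihoods(left, site)
    lik_right = partial_likelihoods(right, site)
    # sum over the child's possible states, separately for each branch
    return {
        x: sum(jc69_prob(x, y, t_left) * lik_left[y] for y in BASES)
        * sum(jc69_prob(x, y, t_right) * lik_right[y] for y in BASES)
        for x in BASES
    }


def site_likelihood(tree: Tree, site: Dict[str, str]) -> float:
    """Column likelihood: sum over root states, weighted by JC69 base frequencies (1/4)."""
    root = partial_likelihoods(tree, site)
    return sum(0.25 * root[x] for x in BASES)


def log_likelihood(tree: Tree, seqs: Dict[str, str]) -> float:
    """Tree log-likelihood: sites are independent, so per-site log-likelihoods add up."""
    length = len(next(iter(seqs.values())))
    return sum(
        math.log(site_likelihood(tree, {name: s[i] for name, s in seqs.items()}))
        for i in range(length)
    )
```

To compare two candidate topologies with the same branch lengths, compute `log_likelihood` for each and keep the larger value; that is step 4 of the workflow, in the simplified setting of fixed branch lengths.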
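The resampling step of the bootstrap can be sketched in a few lines; building a tree from each replicate and counting how often each clade recurs are left out. The alignment and the two column lists come from the slides' rat/human/turtle/fruit fly/oak/duckweed example, so `pseudo_sample` reproduces pseudo-samples 1 and 2 exactly. Function names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence

# Original sample from the slides (10 columns, numbered 0-9).
ALIGNMENT = {
    "rat": "GAGGCTTATC",
    "human": "GTGGCTTATC",
    "turtle": "GTGCCCTATG",
    "fruitfly": "CTCGCCTTTG",
    "oak": "ATCGCTCTTG",
    "duckweed": "ATCCCTCCGG",
}


def pseudo_sample(alignment: Dict[str, str], columns: Sequence[int]) -> Dict[str, str]:
    """Build a pseudo-sample from the given alignment columns, in order."""
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_replicates(
    alignment: Dict[str, str], n_replicates: int, seed: Optional[int] = None
) -> List[Dict[str, str]]:
    """Draw columns with replacement to make pseudo-alignments of the original length."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    replicates = []
    for _ in range(n_replicates):
        # sorting only mirrors the slide's layout; methods that treat sites
        # independently give the same tree for any column order
        columns = sorted(rng.choices(range(length), k=length))
        replicates.append(pseudo_sample(alignment, columns))
    return replicates
```

`pseudo_sample(ALIGNMENT, [0, 0, 1, 1, 2, 2, 2, 3, 4, 5])["rat"]` gives `GGAAGGGGCT`, as in pseudo-sample 1. In practice, `bootstrap_replicates(ALIGNMENT, 1000)` would be followed by tree building on each replicate, and the percentage of replicate trees containing a clade is reported as its bootstrap value.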
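3. Hands-on exercises
1. Before recombinant human insulin came on the market, the insulin needed by diabetic patients came mainly from animal pancreases collected at slaughterhouses. Analyze which of pig, cattle and sheep insulin is most suitable for human use, and explain why. The accession numbers of the four proteins are AAA59172 (human), AAQ00954 (pig), AAA30722 (cattle) and P01318 (sheep).
2. Keratin is an intermediate-filament protein that comes in two types, type I and type II; the keratin genes are clustered on the chromosomes, and the proteins are essential for the normal structure of epithelial cells. Based on the BLASTP search results of human type II keratin 2p (CAD91891) against the NCBI Homo sapiens RefSeq protein database (/Blast.cgi?CMD=Get&RID=HH241 XTA014), download the human …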