5. Phylogenetic Tree

5.1 Genetic polymorphism and phylogenetic tree
5.2 Construction of phylogenetic tree

Companion slides for Bioinformatics (2nd edition), edited by Fan Longjiang, 2021.

History and people
- Darwin
- Fisher, Wright and Haldane, the "great trinity" of population genetics
- Malecot, Cockerham and others
- Motoo Kimura, Masatoshi Nei and others

5.1 Genetic polymorphism and phylogenetic tree
- Polymorphism in the genomes
- Introduction to trees

Polymorphism in the genomes

Types of polymorphism
- Single nucleotide polymorphism (SNP)
- Insertion/deletion (indel)
- Copy-number variation (CNV)
- Frame-shift
- Presence and absence variation (PAV)

Measure of polymorphism
- $\pi$ (Tajima 1983): the average pairwise nucleotide diversity
- $\theta$ (Watterson 1975): Watterson's estimator, based on the number of segregating sites per nucleotide site

Example alignment (figure labels: m, n):

    indica_1     ATG CGG GAT CCA TTC CTT AAT GAG TTT CCT AAA ACG GTG CAC GGT TTT
    indica_2     ATG TGG GAT CCA TTC CTT AAT GAG TTT CCC GAA ACG GTG CAC GGT TTT
    indica_3     ATG TGG GAT CCA TTC CTT AAT GAG TTC CCT GAA ACG GTG CAC GGT TTT
    japonica_1   ATG TGG  -  CCA TTC CTT AAT GAG TTC CCT GAA ACC GTG CAC GGT TTT
    japonica_2   ATG CGG  -  CCA TTG CTT AAT GAG TTC CCT GAA ACC GTG CAC GGT TTT
    japonica_3   ATG TGG GAT CCA TTG CTT AAT GAG TTC CCT GAA ACC GTG CAC GGT TTT
    rufipogon_1  ATG TGG GAT CCA TTC CTT AAT GAG TTC CCT GAA ACG GTG CAC GGT TTT
    rufipogon_2  ATG TGG GAT CCA TTG CTT AAT GAG TTC CCT GAA ACC GTG CAC GGT TTT
    rufipogon_3  ATG TGG GAT CCA TTG CTT AAT GAG TTC CCT GAA ACC GTG CAC GGT TTT
    O. nivara    ATG TGG GAT CCA TTC CTT AAC GAG TTC CCT GAA ACG GTG CAC GGT TTT

Why do a phylogenetic analysis?
- It is important for deciphering relationships in gene function and in protein structure and function across different organisms.
- It helps to use the genetic information of a model organism to analyze a second organism.
- It helps to sort out gene family relationships.
- It is a valuable tool for tracing the evolutionary history of genes.

Evaluating sequence relationships

    sequence A  ERKSIQDLFQSFTLFERRLLIEF
    sequence B  ERLSISELIGSLRLYERRLIIEY
    sequence C  DRKSISDLIGSLRLALLIEF
    sequence D  DRKDLISSLRKALLIEW

1. Account for all column variations: A,B and C,D form similar groups based on column 1; A,C,D group together based on column 3.
2. Count differences between sequences: A,B are 17/23 similar and 6/23 different; C,D are 21/23 similar and 2/23 different.

What is a tree?
A graphical representation of the sequence similarities among a group of nucleic acid or protein sequences. For example, the number of differences between 3 sequences

         A    B    C
    A    -    9    7
    B         -    12

may be represented by a tree joining A, B and C with branch lengths 2, 7 and 5 respectively (so that A to B is 2 + 7 = 9, A to C is 2 + 5 = 7, and B to C is 7 + 5 = 12).

Tree of life (figure)

Phylogenetic tree (dendrogram)
- Nodes: branching points
- Branches: lines
- Topology: branching pattern
- Branches can be rotated at a node without changing the relationships among the OTUs.
- Rooted: there is a unique path from the root.
- Unrooted: shows degree of kinship, with no evolutionary path.

Assumptions about rate of change in branches of tree
- We may assume a constant rate of change throughout the tree. Branch length is then proportional to the number of changes, and the tree can easily be rooted using a simple algorithm.
- The rate may instead be variable.
- Figure labels: no ancestor sequence (unrooted); ancestor sequence (rooted).

Tree reconstruction as optimization
- The number of possible trees grows exponentially or faster with the number of sequences S. Considering branch lengths makes the problem harder.
- We want the tree that maximizes some quality score. The score is based on either characters (e.g. sites in a sequence) directly, or distances between character sets (e.g. alignment scores).

Number of possible phylogenetic trees (OTU: operational taxonomic unit)
- 3 OTUs: 1 unrooted tree, 3 rooted trees
- 4 OTUs: 3 unrooted trees, 15 rooted trees

Types of trees
Newick (shorthand) format: a text-based representation of relationships.

5.2 Construction of phylogenetic tree
- Distance methods
- Probabilistic methods: maximum likelihood (ML), currently the most widely used method

The flowchart for phylogenetic prediction
- Start from a multiple sequence alignment (MSA).
- Is there strong sequence similarity? If yes, use maximum parsimony (MP).
- If no: is there clearly recognizable sequence similarity? If yes, use a distance method.
- If no, use maximum likelihood (ML).

Because the maximum parsimony method has to attempt to fit all possible trees to the data, it is not suitable for more than 12 sequences: there are too many trees to test. Maximum likelihood methods are particularly useful when the sequences are more variable, but they are computationally intensive. ML can test diverse evolutionary models to find a best tree.

5.2.1 Distance methods

Basic idea: a distance method uses the number of changes between each pair in a group of sequences to produce a phylogenetic tree of the group.

For phylogenetic analysis, the distance score between two sequences is used. This score is the number of mismatched positions in the alignment, or equivalently the number of sequence positions that must be changed to generate the other sequence. Gaps may be ignored in these calculations or treated like substitutions. When a scoring or substitution matrix is used, the calculation is slightly more complicated, but the principle is the same.

Theory of gene duplication

Homoplasy: sequence similarity NOT due
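to shared ancestry, such as convergent or parallel evolution, or horizontal transmission.

Figure: types of substitution between an ancestral sequence and two descendant sequences (descendant 1 and descendant 2): single substitution, multiple substitutions, coincidental substitutions, parallel substitutions, convergent substitution, back substitution.

Distance
- Converting sequence similarity to distance scores
- Correction of the distance between nucleic acid sequences for multiple changes and reversions

Genetic models and sequence distance
- Jukes-Cantor one-parameter model (1969)
- Kimura two-parameter model (Motoo Kimura, 1980). Transition ($\alpha$): a substitution between purines or between pyrimidines. Transversion ($\beta$): a substitution between a purine and a pyrimidine. In most DNA segments, transitions occur more frequently than transversions.

Generally, genetic distance is thought of as related to time, and it requires a genetic model specifying the processes, such as mutation and drift, that cause the populations to diverge.

Sequence distance

Jukes and Cantor (1969) proposed a formula for the DNA sequence distance K. In its standard form,

$$K = -\frac{3}{4}\ln\left[1 - \frac{4}{3}(1 - q)\right]$$

where q is the probability that homologous DNA sequences carry the same base after the ancestral sequence has diverged for t generations; the single parameter of the model is the base substitution rate.

Under his two-parameter model, Kimura showed that, because of divergence, the proportion of bases that differ by a transition (type P change) or by a transversion (type Q change) changes over time. If $k = \alpha + 2\beta$ is the total rate of base substitution per unit time, then, in the standard form of Kimura's result,

$$K = -\frac{1}{2}\ln\left[(1 - 2P - Q)\sqrt{1 - 2Q}\right]$$

Synonymous and nonsynonymous substitution

In protein-coding genes, a nucleotide substitution that leaves the codon synonymous is called a synonymous or silent substitution, whereas one that produces a nonsynonymous codon is called a nonsynonymous or amino acid replacement substitution. In addition, a mutation that creates a stop codon is called a nonsense mutation.

Sequence divergence

The DNA sequence distance K can also be called the divergence between DNA sequences, that is, an index of the dissimilarity between sequences. The divergence of protein-coding sequences is split into the divergence due to synonymous changes ($K_S$) and the divergence due to nonsynonymous changes ($K_A$). Using genetic models such as the Jukes-Cantor one-parameter model and the Kimura two-parameter model, each divergence between two sequences (also called the distance between protein sequences) can be calculated.

How to construct a tree: a simple example

Methods for tree construction
- UPGMA (unweighted pair group method using arithmetic mean)
- Neighbor-joining (NJ) distance method
- Fitch-Margoliash method
- Minimum evolution (ME)

UPGMA

Unweighted pair group method using arithmetic mean. It is a form of agglomerative clustering (successive pairing of the closest nodes).

Method:
- Let the initial organisms be the leaves of the tree.
- Iteratively add a parent to the closest pair of existing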
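nodes.
- Define the distance from a new internal node to any other node as the mean of the distances from the sequences it contains to that node.

Example: mitochondrial DNA sequences

Number of differing nucleotides (below the diagonal) and Jukes-Cantor distances (above the diagonal) among 5 mitochondrial sequences:

                      hu      ch      go      or      gi
    human (hu)        -       0.015   0.045   0.143   0.198
    chimpanzee (ch)   1       -       0.030   0.126   0.179
    gorilla (go)      3       2       -       0.092   0.179
    orangutan (or)    9       8       6       -       0.179
    gibbon (gi)       12      11      11      11      -

The smallest distance is between human and chimpanzee, so they are merged into one cluster. The distance between any other sequence and this new cluster is the mean of the distances from that sequence to each member of the cluster, for example

$$d_{(hu\text{-}ch),go} = \frac{d_{hu,go} + d_{ch,go}}{2}$$

which gives:

               go      or      gi
    (hu-ch)    0.037   0.135   0.189
    go                 0.092   0.179
    or                         0.179

The distance between human-chimpanzee (hu-ch) and gorilla (go) is now the smallest, so they are merged into one cluster. The new distances, for example

$$d_{(hu\text{-}ch\text{-}go),or} = \frac{d_{hu,or} + d_{ch,or} + d_{go,or}}{3}$$

are:

                  or      gi
    (hu-ch-go)    0.121   0.185
    or                    0.179

Figure: UPGMA tree for human, chimpanzee, gorilla, orangutan and gibbon, with node heights 0.007, 0.019, 0.060 and 0.092.

Neighbor-joining method

Another agglomerative clustering method (Masatoshi Nei, born 1931).

Method:
- Let A be the active set, initially all leaves (organisms).
- Let r(i) be the average distance from i to all other leaves (in the standard formulation the sum of distances is divided by |A| - 2).
- Define $D(i,j) = d(i,j) - (r(i) + r(j))$.
- Pick i, j with the smallest D(i,j). Create a parent k of i and j.
- Set $d(i,k) = \frac{1}{2}\left(d(i,j) + r(i) - r(j)\right)$ and $d(j,k) = d(i,j) - d(i,k)$.
- For all other $m \in A$, set $d(k,m) = \frac{1}{2}\left(d(i,m) + d(j,m) - d(i,j)\right)$.
- Replace i and j in A with k. Repeat until |A| = 1.

General steps of the neighbor-joining method
- Compute the net divergence $r_i$ of each terminal node i (that is, taxon i): $r_i = \sum_{k=1}^{N} d_{ik}$, where N is the number of terminal nodes and $d_{ik}$ is the distance between nodes i and k, with $d_{ik} = d_{ki}$.
- Compute the rate-corrected distances $M_{ij} = d_{ij} - \frac{r_i + r_j}{N - 2}$ and find the smallest one.
- Define a new node u formed by joining nodes i and j. The distances from u to i and j are $d_{iu} = \frac{d_{ij}}{2} + \frac{r_i - r_j}{2(N - 2)}$ and $d_{ju} = d_{ij} - d_{iu}$.
- The distance from u to every other node k of the tree is $d_{ku} = \frac{d_{ik} + d_{jk} - d_{ij}}{2}$.
- Delete the distances for nodes i and j from the distance matrix and decrease N (the total number of nodes) by 1.
- If more than two terminal nodes remain,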
return to the earlier steps and continue the calculation until the tree is completely built.

Taking the mitochondrial DNA again as an example, the neighbor-joining calculation uses the distances $d_{ij}$ (above the diagonal), the rate-corrected distances $M_{ij}$ (below the diagonal) and the net divergences $r_i$:

              hu       ch       go       or       gi       r_i
              j=1      j=2      j=3      j=4      j=5
    hu  i=1   0.000    0.015    0.045    0.143    0.198    0.401
    ch  i=2  -0.235    0.000    0.030    0.126    0.179    0.350
    go  i=3  -0.204   -0.202    0.000    0.092    0.179    0.346
    or  i=4  -0.171   -0.171   -0.203    0.000    0.179    0.540
    gi  i=5  -0.181   -0.183   -0.181   -0.246    0.000    0.735

In the first step, the $M_{ij}$ value between or and gi is the smallest, so the two are replaced by node 1 and the calculation moves on to the second step. The distances from the new node (node 1) to these two nodes are

$$d_{or,1} = \frac{d_{or,gi}}{2} + \frac{r_{or} - r_{gi}}{2(N - 2)} = \frac{0.179}{2} + \frac{0.540 - 0.735}{6} \approx 0.057, \qquad d_{gi,1} = d_{or,gi} - d_{or,1} \approx 0.122$$

Second step:

                  hu       ch       go       node 1   r_i
                  j=1      j=2      j=3      j=4
    hu      i=1   0.000    0.015    0.045    0.081    0.141
    ch      i=2  -0.110    0.000    0.030    0.063    0.108
    go      i=3  -0.086   -0.084    0.000    0.046    0.121
    node 1  i=4  -0.085   -0.086   -0.110    0.000    0.190

Third step:

                  go       node 1   node 2   r_i
                  j=1      j=2      j=3
    go      i=1   0.000    0.046    0.030    0.076
    node 1  i=2  -0.141    0.000    0.065    0.111
    node 2  i=3  -0.141   -0.141    0.000    0.095

Last step:

                  go       node 3
                  j=1      j=2
    go      i=1   0.000    0.005
    node 3  i=2            0.000

Figure: tree for human, chimpanzee, gorilla, orangutan and gibbon (labels 0.092, 0.060, 0.019, 0.007).

5.2.2 Maximum parsimony methods

Parsimony: a general scientific criterion for choosing among competing hypotheses, which states that we should accept the hypothesis that explains the data most simply and efficiently.

Maximum parsimony method of phylogeny reconstruction: the optimum reconstruction of ancestral character states is the one that requires the fewest mutations in the phylogenetic tree to account for contemporary character states.

First step: identify all of the informative sites.

Invariant: all OTUs possess the same character
state at the site. Any invariant site is uninformative.

For DNA sequences, an informative site is one at which at least 2 different bases occur and each of those bases appears at least twice. A site at which a base occurs in only one sequence is not informative, because such a unique base results from a single base change on the branch leading directly to the sequence that carries it. A change of this kind is compatible with any topology.

Two types of variable sites:
- Informative: favors a subset of trees over other possible trees.
- Uninformative: a character that contains no grouping information relevant to a cladistic problem (i.e. autapomorphies).

Figure: an uninformative site requires 3 steps on each tree.

Second step: calculate the minimum number of substitutions at each informative site.

Figure: an informative site favors tree 1 over the other 2 trees (tree 1: 1 step; the other two trees: 2 steps each).

Final step: Sum the number of chan
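
The tree counts quoted in section 5.1 (3 OTUs: 1 unrooted and 3 rooted trees; 4 OTUs: 3 unrooted and 15 rooted trees) agree with the standard double-factorial formulas for strictly bifurcating trees, (2n - 5)!! unrooted and (2n - 3)!! rooted. The following is a minimal sketch under that assumption; the function names are hypothetical and do not come from the slides.

```python
def odd_double_factorial(k: int) -> int:
    """Return k * (k - 2) * ... * 1 for odd k (1 when k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_otus: int) -> int:
    """Number of unrooted bifurcating trees for n_otus >= 3 labeled OTUs."""
    if n_otus < 3:
        raise ValueError("need at least 3 OTUs for an unrooted tree")
    return odd_double_factorial(2 * n_otus - 5)


def count_rooted_trees(n_otus: int) -> int:
    """Number of rooted bifurcating trees for n_otus >= 2 labeled OTUs."""
    if n_otus < 2:
        raise ValueError("need at least 2 OTUs for a rooted tree")
    return odd_double_factorial(2 * n_otus - 3)


for n in (3, 4, 12):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

For 12 sequences the unrooted count already exceeds 650 million, which illustrates why the text advises against exhaustive maximum parsimony searches beyond about 12 sequences.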
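
The two polymorphism measures named in section 5.1, the average pairwise nucleotide diversity (Tajima 1983) and Watterson's estimator (Watterson 1975), can be illustrated on a small alignment. This is only a sketch: the function name `polymorphism` is hypothetical, the toy sequences are not from the slides, and skipping every column that contains a gap is an assumption made here for simplicity.

```python
from itertools import combinations


def polymorphism(seqs):
    """Return (pi, theta_w), both per site, for equal-length aligned sequences.

    Assumption: any alignment column containing a gap ('-') is skipped.
    """
    n = len(seqs)
    columns = [col for col in zip(*seqs) if "-" not in col]
    length = len(columns)

    # pi: mean number of differences over all sequence pairs, per site.
    n_pairs = n * (n - 1) // 2
    total_diff = sum(
        sum(1 for col in columns if col[i] != col[j])
        for i, j in combinations(range(n), 2)
    )
    pi = total_diff / n_pairs / length

    # theta_w: segregating sites S divided by a_n = 1 + 1/2 + ... + 1/(n-1), per site.
    seg_sites = sum(1 for col in columns if len(set(col)) > 1)
    a_n = sum(1 / i for i in range(1, n))
    theta_w = seg_sites / a_n / length
    return pi, theta_w


toy = ["ATGC", "ATGA", "ATTA"]  # hypothetical toy alignment
print(polymorphism(toy))
```

In the toy alignment there are 4 pairwise differences over 3 pairs and 2 segregating sites over 4 columns, so both estimates come out at one third per site.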
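
The distance corrections discussed in section 5.2.1 can be sketched with the standard Jukes-Cantor and Kimura two-parameter formulas. The function names are hypothetical. The sequence length of 69 sites is an assumption: the slides do not state it, but it is the length for which the difference counts in the mitochondrial table (1, 3, 9, 12) reproduce the listed Jukes-Cantor distances (0.015, 0.045, 0.143, 0.198).

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance from the proportion p of differing sites."""
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(p_transitions: float, q_transversions: float) -> float:
    """Kimura two-parameter distance from the proportions of
    transitional (P) and transversional (Q) differences."""
    P, Q = p_transitions, q_transversions
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


N_SITES = 69  # assumption: inferred from the table, not stated in the slides

for diffs in (1, 3, 9, 12):
    print(diffs, round(jukes_cantor(diffs / N_SITES), 3))
```

When transversions are exactly twice as frequent as transitions (Q = 2P), the two-parameter distance reduces to the Jukes-Cantor value, which is a convenient sanity check on both functions.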
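
The UPGMA walk-through on the five mitochondrial sequences can be reproduced with a short sketch. The function `upgma` and its data layout (a dictionary keyed by `frozenset` pairs) are illustrative choices, not the slides' implementation. Because the code starts from the rounded distances in the table, intermediate values can differ from the slides in the third decimal place.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster leaves by UPGMA.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    Returns a list of (cluster_label, node_height) in merge order.
    """
    clusters = {label: 1 for label in labels}  # cluster name -> number of leaves
    d = dict(dist)
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2
        new = f"({a},{b})"
        na, nb = clusters[a], clusters[b]
        # Distance to the new cluster: size-weighted mean, i.e. the mean
        # over all member leaves of the merged cluster.
        for c in clusters:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
                ) / (na + nb)
        del clusters[a], clusters[b]
        clusters[new] = na + nb
        merges.append((new, height))
    return merges


labels = ["hu", "ch", "go", "or", "gi"]
jc = {
    ("hu", "ch"): 0.015, ("hu", "go"): 0.045, ("hu", "or"): 0.143, ("hu", "gi"): 0.198,
    ("ch", "go"): 0.030, ("ch", "or"): 0.126, ("ch", "gi"): 0.179,
    ("go", "or"): 0.092, ("go", "gi"): 0.179, ("or", "gi"): 0.179,
}
dist = {frozenset(pair): value for pair, value in jc.items()}

for label, height in upgma(labels, dist):
    print(label, round(height, 4))
```

The merge order is human with chimpanzee, then gorilla, then orangutan, then gibbon, matching the order described in the example.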
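
The first iteration of the neighbor-joining example can be checked numerically. The sketch below assumes the formulas implied by the table values: net divergence as the sum of a node's distances, and the rate-corrected distance as the pairwise distance minus the two net divergences divided by N - 2. The function name `nj_first_step` is hypothetical.

```python
from itertools import combinations


def nj_first_step(labels, d):
    """Run one neighbor-joining iteration.

    d maps frozenset({a, b}) to the distance between nodes a and b.
    Returns (r, m, pair, branch_lengths, distances_to_new_node).
    """
    n = len(labels)
    # Net divergence of each node: sum of its distances to all other nodes.
    r = {i: sum(d[frozenset((i, k))] for k in labels if k != i) for i in labels}
    # Rate-corrected distances M_ij.
    m = {
        (i, j): d[frozenset((i, j))] - (r[i] + r[j]) / (n - 2)
        for i, j in combinations(labels, 2)
    }
    i, j = min(m, key=m.get)
    d_ij = d[frozenset((i, j))]
    # Branch lengths from the new node u to i and to j.
    d_iu = d_ij / 2 + (r[i] - r[j]) / (2 * (n - 2))
    d_ju = d_ij - d_iu
    # Distances from u to every remaining node.
    to_new = {
        k: (d[frozenset((i, k))] + d[frozenset((j, k))] - d_ij) / 2
        for k in labels
        if k not in (i, j)
    }
    return r, m, (i, j), (d_iu, d_ju), to_new


labels = ["hu", "ch", "go", "or", "gi"]
jc = {
    ("hu", "ch"): 0.015, ("hu", "go"): 0.045, ("hu", "or"): 0.143, ("hu", "gi"): 0.198,
    ("ch", "go"): 0.030, ("ch", "or"): 0.126, ("ch", "gi"): 0.179,
    ("go", "or"): 0.092, ("go", "gi"): 0.179, ("or", "gi"): 0.179,
}
dist = {frozenset(pair): value for pair, value in jc.items()}

r, m, pair, branches, to_new = nj_first_step(labels, dist)
print(pair, [round(b, 3) for b in branches])
print({k: round(v, 3) for k, v in to_new.items()})
```

The selected pair is orangutan and gibbon, and the distances from the new node to human, chimpanzee and gorilla agree with the "node 1" column of the second-step table (0.081, 0.063, 0.046).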
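
The first step of the maximum parsimony method in section 5.2.2, identifying informative sites, follows directly from the definition in the text: at least two different bases, each present in at least two sequences. The sketch below uses hypothetical function names and a toy alignment that is not from the slides; ignoring gap characters when counting bases is an assumption.

```python
from collections import Counter


def is_informative(column) -> bool:
    """True if at least two different bases each occur at least twice.

    Assumption: gap characters ('-') are ignored when counting.
    """
    counts = Counter(base for base in column if base != "-")
    return sum(1 for c in counts.values() if c >= 2) >= 2


def informative_sites(seqs):
    """Return the 0-based column indices of parsimony-informative sites."""
    return [i for i, col in enumerate(zip(*seqs)) if is_informative(col)]


toy = ["AGAT", "AGCA", "ACCA", "ACTT"]  # hypothetical toy alignment
print(informative_sites(toy))
```

In the toy alignment, column 0 is invariant, column 2 is variable but has only one base that occurs twice, and columns 1 and 3 each split the four sequences two against two, so only those two columns are informative.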