Bioinformatics Studies of Gene Expression (Li Tingting, Peking University)

Figure: DNA -> (transcription) -> RNA -> (translation) -> amino acid sequence -> (folding, post-translational modification) -> protein

Outline
- Gene expression and microarrays
- Microarray-based gene expression analysis
- Transcriptional regulation analysis based on ChIP-chip and ChIP-seq
- Gene expression analysis based on RNA-seq

Gene Expression
- Gene -> primary transcript -> nuclear mRNA -> cytosolic mRNA -> protein -> protein activity
- Basic cellular processes are realized by tightly regulated gene expression programs
- Different cell types in a multicellular organism express different sets of genes at different times and in different quantities

Northern Blotting
- Detects RNA fragments
- RNA fragments are treated with formaldehyde to ensure a linear conformation
- The amount of a specific RNA in a sample can be estimated from a Northern blot
(Figure from Wikipedia)

Going high-throughput: the basic idea of microarrays
(Figure: expression levels measured via cDNA)

Main types of microarrays
- Printed cDNA arrays (Brown/Botstein)
- Short oligonucleotide arrays (Affymetrix)
- Long oligonucleotide arrays (Agilent Inkjet)
- Fibre optic arrays (Illumina)

Two popular types of microarrays: printed cDNA arrays
- A 1-kb portion of the coding region of each gene analyzed is individually amplified by PCR, generating a probe of the gene
- A robot is used to apply each amplified DNA probe to closely spaced spots on the surface of a glass microscope slide
- Typical arrays are 2 x 2 cm and contain 6000 spots of DNA

Printed cDNA microarray
(Figure: cDNA clones (probes) -> PCR product amplification -> purification -> printing at 0.1 nl/spot -> microarray; hybridise the mRNA target to the microarray -> excitation with laser 1 and laser 2 -> emission -> scanning -> analysis: overlay images and normalise)

The Full Yeast Genome on a Chip
(Figure)

"Low-level" processing of microarray data
- Image processing/analysis
  - Identify spot area
  - Extract intensities for each spot
  - Improve image quality
- Normalization
  - Dye effects
  - Slide effects
- Assessing gene expression levels
  - Assess expression levels from image intensities

Two popular types of microarrays: oligonucleotide arrays
- Parallel synthesis of oligonucleotide probes on a slide using photolithographic methods
- Multiple probes per gene

Oligonucleotide microarray
(Figures: spots, hybridization, scan)

Light-directed oligonucleotide synthesis
A solid support is derivatized with a covalent linker molecule terminated with a photolabile protecting group. Light is directed through a mask to deprotect and activate selected sites, and protected nucleotides couple to the activated sites. The process is repeated, activating different sets of sites and coupling different bases, allowing arbitrary DNA probes to be constructed at each site.

Raw data annotation software: dChip, RMA
- Data processing
- Probe-level analysis
  - Normalize arrays
  - View data
  - Model-based expression values

Data sheet example
(Figure. Source: Zhang Xuegong et al., Communications of the China Computer Federation, vol. 2, no. 3, May 2006)

Microarray databases
- GEO (/geo/)
- ArrayExpress (http://www.ebi.ac.uk/microarray-as/ae/)
- SMD (/)

The final data from microarray

Gene     Case 1   Case 2   Case 3   Case 4
Gene a    123.4    234.5    135.7    0.123
Gene b   1234.5   5678.9    321.0     78.9
Gene c    765.4     43.2      9.2   3456.1
Gene d    211.0    985.0    618.9     12.3

- Cases: different individuals; different subtypes; different situations; different phases; ...
- Genes: all genes in the genome; all suspected genes; selected genes; ...
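The normalization step listed under low-level processing can be illustrated on a gene-by-case matrix like the one in the slides. The sketch below is a minimal example, not the method used by dChip or RMA: it log2-transforms the intensities and subtracts each array's median so that the arrays become comparable. The function name `log2_median_center` is hypothetical, and the numeric values are read from the flattened table on the assumption that each value has one decimal place (except 0.123).

```python
import math
from statistics import median

# Intensities as read from the slide table (the one-decimal reading of the
# run-together numbers is an assumption, not a confirmed transcription).
expression = {
    "Gene a": [123.4, 234.5, 135.7, 0.123],
    "Gene b": [1234.5, 5678.9, 321.0, 78.9],
    "Gene c": [765.4, 43.2, 9.2, 3456.1],
    "Gene d": [211.0, 985.0, 618.9, 12.3],
}


def log2_median_center(table):
    """Log2-transform intensities, then subtract each array's (column's) median."""
    genes = list(table)
    logged = [[math.log2(v) for v in table[g]] for g in genes]
    n_arrays = len(logged[0])
    # One median per array, computed across all genes on the log2 scale.
    medians = [median(row[j] for row in logged) for j in range(n_arrays)]
    return {
        g: [row[j] - medians[j] for j in range(n_arrays)]
        for g, row in zip(genes, logged)
    }


normalized = log2_median_center(expression)
for gene, values in normalized.items():
    print(gene, [round(v, 2) for v in values])
```

Median centering only removes a global shift per array. Dye effects and slide effects that depend on intensity would need a more elaborate scheme, which is outside this sketch.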

Selecting candidate genes of interest
- Cluster analysis of the data: hierarchical clustering
- Mouse liver development data, Affymetrix Mouse430 2.0 array
- 14 time points: embryonic days 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5 and E18.5; postnatal days 0, 3, 7, 14 and 21; and mature liver
- What are their functions?

Data format
- Association file
    Gene1 GO:0000009
    Gene2 GO:0000016
    Gene3 GO:0000009
- Description file
    GO:0000009 alpha-1,6-mannosyltransferase activity F
    GO:0000016 lactase activity F

Example
- Population set: 10000 genes in total
- Genes with function A: 150 in total
- Genes with function B: 1000 in total
- Selected study set: 500 genes in total, of which 100 genes have function A and 100 genes have function B
- Both functions have 100 genes in the study set
- Function A: 100/150; Function B: 100/1000
- Function A is more strongly over-represented in the selected study set than function B

Hypergeometric distribution and GO enrichment analysis
Let N be the number of genes in the population set, M the number of population genes annotated with the function, n the size of the study set, and m the number of study-set genes annotated with the function. The probability of observing at least m annotated genes by chance is

$$p = \sum_{i=m}^{\min(n,\,M)} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$

Available methods: DAVID, GeneMerge, ...
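The over-representation example above (population of 10000 genes, study set of 500, functions A and B with 150 and 1000 annotated genes) can be worked through numerically. The following is a minimal sketch of an upper-tail hypergeometric test using exact integer arithmetic. The function name and the symbols N, M, n, m are illustrative choices, and this is not the implementation used by DAVID or GeneMerge.

```python
from math import comb


def hypergeom_enrichment_p(N, M, n, m):
    """P(X >= m) when drawing n genes from N, of which M carry the annotation.

    N: population size, M: annotated genes in the population,
    n: study-set size, m: annotated genes observed in the study set.
    """
    upper = min(n, M)
    # Exact integer sum of the favourable configurations, divided once at the end.
    favourable = sum(comb(M, i) * comb(N - M, n - i) for i in range(m, upper + 1))
    return favourable / comb(N, n)


# Numbers from the slide example.
p_a = hypergeom_enrichment_p(N=10000, M=150, n=500, m=100)
p_b = hypergeom_enrichment_p(N=10000, M=1000, n=500, m=100)
print(f"function A: p = {p_a:.3g}")
print(f"function B: p = {p_b:.3g}")
```

Both functions contribute 100 genes to the study set, but function A yields the smaller p-value because only 150 genes in the whole population carry it. In practice the p-values would also need correction for testing many GO terms, which this sketch omits.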

Which transcription factors regulate them?

Workflow for identifying transcription factor binding sites
(Figure)

Motif discovery
Upstream regions from yeast Saccharomyces cerevisiae genes:

HIS7  5'- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT
ARO4  5'- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG
ILV6  5'- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT
THR4  5'- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC
ARO1  5'- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA
HOM2  5'- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA
PRO3  5'- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA

Motif search example
Known motif: AAAAGAGTCA, searched in the upstream regions of HIS7, ARO4, ILV6, THR4 and ARO1 listed above.
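The motif search step can be sketched as a scan of each upstream region for the known motif AAAAGAGTCA on both strands. This is a deliberately simple illustration that uses exact string matching. Real motif search tools typically score position weight matrices and tolerate mismatches, so the helper names below are hypothetical and not taken from any tool named in the slides. The four sequences are copied from the slide text.

```python
# Upstream regions copied from the slides (a subset of the seven genes).
UPSTREAM = {
    "HIS7": "TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT",
    "ARO4": "ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG",
    "THR4": "TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC",
    "PRO3": "GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA",
}
KNOWN_MOTIF = "AAAAGAGTCA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return (0-based start, strand) for every exact occurrence of the motif."""
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping matches
    return hits


hits_by_gene = {gene: find_motif(seq, KNOWN_MOTIF) for gene, seq in UPSTREAM.items()}
for gene, hits in hits_by_gene.items():
    print(gene, hits if hits else "no exact match")
```

Exact matching finds the motif only where the site is perfectly conserved. Sequences carrying just the core TGACTCA element are missed, which is one reason matrix-based scoring is preferred in practice.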

Foreground set: the set under study
Background set: randomly sampled from the genome
(Figure: known motifs, motif 1, motif 2)
Motifclass is used to find transcription factor binding sites enriched in the promoter regions of genes that are highly or lowly expressed at each stage (Smith et al.), yielding the motifs associated with specific gene sets.

Workflow for identifying transcription factor binding sites
(Figure)

How to detect transcription factor-DNA interactions?
- DNase I footprinting analysis
- Electrophoretic mobility shift assay (gel shift)
- Yeast one-hybrid
- Chromatin immunoprecipitation (ChIP)
(Figure from /haberlab/jehsite/chIP.html)

ChIP-chip
- Chromatin immunoprecipitation and microarray analysis (chip) are combined to study protein-DNA interactions in vivo.
(/wiki/ChIP-on-chip)

ChIP-seq
- Chromatin immunoprecipitation and deep sequencing (seq) are combined to study protein-DNA interactions in vivo.

"Last generation" sequencing
- ABI 3730
- 96 capillaries, automation
- Sequencing length: 1000 bp

How to speed up sequencing?
- Parallel sequencing: parallelize the sequencing process, producing millions of sequences at once!
- Next-generation sequencing technologies
  - Roche: Genome Sequencing System (GS FLX Standard - Titanium)
  - Illumina: Genome Analyzer (1G - GA II - GA IIx - HiSeq2000)
  - ABI (Applied Biosystems): SOLiD 4 System

Illumina Genome Analyzer (Solexa)
- Short reads: up to 150 nt
- High throughput: 300M reads per run

Illumina GA: sequencing by synthesis
1. Polymerization of 1 base
2. Detection
3. Deblock; fluor removal

Solexa Sequencing Technology
(Figure: DNA (0.1-1.0 ug) -> sample preparation -> single molecule array -> cluster growth -> sequencing -> image acquisition -> base calling)
Sequencing from Illumina: Video #1 technology; Video #2 work flow

New format in Illumina GAII:

    @PRESLEY_1_FC30G2TAAXX:1:1:13:1622
    CGACCGCAGCACCGTCTGCCGNGTGAGGNTTNTCTG
    +PRESLEY_1_FC30G2TAAXX:1:1:13:1622
    aaaaaaaaaaaW_aZREEKKKUEUEZaXG

ASCII character code = (quality value + 64) by default

Solexa sequencing raw data
- Fastq format:

    @BRITNEYSPEARS_1_FC30AB3AAXX:5:1:1601:346
    GAAGCCGAGGGTGGGCTTTATCGGCAGGGGTGTTTN
    +BRITNEYSPEARS_1_FC30AB3AAXX:5:1:1601:346
    40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 -5

  The second line is the sequence and the fourth line is the quality score.

Reads mapping
(Figure: short reads aligned to the genome)
Output format (e.g. BED file):

    chr11 6548 6580 6_1_496_300 1 +
    chr5 11480 11512 6_1_1641_453 0 +

Look at your data in a genome browser.

RNA-Seq Experiment
(Figures. Sources: Zhang Xuegong et al., Communications of the China Computer Federation, vol. 2, no. 3, May 2006; Wang et al. 2009)

RNA-Seq (Wang et al. 2009)
Advantages
- Digital signal
- High sensitivity and specificity
- Whole genome analysis for any speci
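The quality encoding stated for the Illumina FASTQ records (ASCII character code = quality value + 64) can be decoded with a few lines of code. The sketch below assumes that offset of 64, as given in the slides. Other FASTQ variants use an offset of 33, so the offset is a parameter. The function name and the short example string are illustrative and not taken from the slide data.

```python
def decode_quality(quality_string, offset=64):
    """Convert a FASTQ quality string to a list of integer quality values.

    offset=64 follows the encoding stated in the slides; pass offset=33
    for Sanger-style FASTQ files.
    """
    return [ord(ch) - offset for ch in quality_string]


def mean_quality(quality_string, offset=64):
    """Average quality of a read, a common first filter before mapping."""
    scores = decode_quality(quality_string, offset)
    return sum(scores) / len(scores)


# Hypothetical short quality string using characters seen in the slide record.
example = "aaaW_"
print(decode_quality(example))
print(round(mean_quality(example), 1))
```

Reads whose mean quality falls below a chosen threshold are often trimmed or discarded before the mapping step, although the appropriate threshold depends on the platform and the analysis.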
