




版權(quán)說(shuō)明:本文檔由用戶(hù)提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡(jiǎn)介
1、 Hortonworks Inc. 2014Page 1Apache Tez : Next Generation Execution Engine upon HadoopJeff Zhang Hortonworks Inc. 2014OutlineTez IntroductionTez APITez InternalTez Project StatusQ & A Hortonworks Inc. 2014Tez IntroductionPage 3 Distributed execution framework targeted towards data-processing appl
2、ications. Based on expressing a computation as a dataflow graph. Highly customizable to meet a broad spectrum of use cases. Built on top of YARN the resource management framework for Hadoop. Open source Apache project and Apache licensed.Hadoop 1 - Hadoop 2HADOOP 1.0HDFS(redundant, reliable storage)
3、MapReduce(cluster resource management & data processing)Pig(data flow)Hive(sql)Others(cascading)HDFS2(redundant, reliable storage)YARN(cluster resource management)Tez(execution engine)HADOOP 2.0Data FlowPigSQLHiveOthers(Cascading)BatchMapReduceReal Time Stream ProcessingStormOnline Data Processi
4、ngHBase,AccumuloMonolithicResource ManagementExecution EngineUser APILayeredResource Management YARNExecution Engine TezUser API Hive, Pig, Cascading, Your App! Hortonworks Inc. 2014Pig/Hive-MR versus Pig/Hive-TezPage 5Pig/Hive - MRPig/Hive - TezI/O Synchronization BarrierI/O Synchronization Barrier
5、Job 1Job 2Job 3Single Job Hortonworks Inc. 2014OutlineTez IntroductionTez APITez InternalTez Project InfoQ & A Hortonworks Inc. 2014Tez Expressing the computationPage 7Tez provides the following APIs to define the processing DAG API ( Vertex, Edge ) Defines the structure of the data processing a
6、nd the relationship between producers and consumers Enable definition of complex data flow pipelines using simple graph connection APIs. Tez expands the logical DAG at runtime This is how all the tasks in the job get specified Runtime API ( Task ) Defines the interfaces using which the framework and
7、 app code interact with each other App code transforms data and moves it between tasks This is how we specify what actually executes in each task on the cluster nodes Hortonworks Inc. 2014Tez DAG API / Define DAG DAG dag = new DAG(); / Define Vertex Vertex map1 = new Vertex(MapProcessor.class); / De
8、fine Edge Edge edge1 = Edge(map1, reduce1, SCATTER_GATHER, PERSISTED, SEQUENTIAL, Output.class, Input.class); / Connect them dag.addVertex(map1) .addEdge(edge) Page 8Simple DAG definition API Hortonworks Inc. 2014Tez DAG APIPage 9 Data movement Defines routing of data between tasksOne-To-One : Data
9、from the ith producer task routes to the ith consumer task.Broadcast : Data from a producer task routes to all consumer tasks.Scatter-Gather : Producer tasks scatter data into shards and consumer tasks gather the data. The ith shard from all producer tasks routes to the ith consumer task. Scheduling
10、 Defines when a consumer task is scheduledSequential : Consumer task may be scheduled after a producer task completes.Concurrent : Consumer task must be co-scheduled with a producer task. Data source Defines the lifetime/reliability of a task outputPersisted : Output will be available after the task
11、 exits. Output may be lost later on.Persisted-Reliable : Output is reliably stored and will always be availableEphemeral : Output is available only while the producer task is runningEdge properties define the connection between producer and consumer tasks in the DAG Hortonworks Inc. 2014Tez Runtime
12、API (IPO)Flexible Inputs-Processor-Outputs Model Thin API layer to wrap around arbitrary application code Compose inputs, processor and outputs to execute arbitrary processing Event routing based control plane architecture Applications decide logical data format and data transfer technology Customiz
13、e for performance Built-in implementations for Hadoop 2.0 data services HDFS and YARN ShuffleService. Built on the same API. Your impls are as first class as ours!Page 10InputProcessorOutput Hortonworks Inc. 2014Tez Runtime API ( VertexManager) VertexManagerControl on the flow execution engine in ve
14、rtex level Hortonworks Inc. 2014Tez Logical DAG expansion at RuntimePage 12Reduce1Map2Reduce2JoinMap1 Hortonworks Inc. 2014Tez Performance Benefits of expressing the data processing as a DAG Reducing overheads and queuing effects Gives system the global picture for better planning Efficient use of r
15、esources Re-use resources to maximize utilization Pre-launch, pre-warm and cache Locality & resource aware scheduling Support for application defined DAG modifications at runtime for optimized execution Change task concurrency Change task scheduling Change DAG edges Change DAG verticesPage 13 Ho
16、rtonworks Inc. 2014Tez Benefits of DAG executionFaster Execution and Higher Predictability Eliminate replicated write barrier between successive computations. Eliminate job launch overhead of workflow jobs. Eliminate extra stage of map reads in every workflow job. Eliminate queue and resource conten
17、tion suffered by workflow jobs that are started after a predecessor job completes. Better locality because the engine has the global picturePage 14Pig/Hive - MRPig/Hive - Tez Hortonworks Inc. 2014OutlineTez IntroductionTez APITez InternalTez Project InfoQ & A Hortonworks Inc. 2014Tez System Diag
18、ram Hortonworks Inc. 2014Tez Container Re-Use Reuse YARN containers/JVMs to launch new tasks Reduce scheduling and launching delays Shared in-memory data across tasks JVM JIT friendly executionPage 17YARN Container / JVMTezChildTezTask1TezTask2Shared ObjectsYARN ContainerTez Application MasterStart
19、TaskTask DoneStart Task Hortonworks Inc. 2014Tez Customizable Core EnginePage 18Vertex-2Vertex-1StartvertexVertex ManagerStarttasksDAGSchedulerGet PriorityGet PriorityStartvertexTaskSchedulerGet containerGet containerVertex ManagerDetermines task parallelismDetermines when tasks in a vertex can star
20、t.DAG SchedulerDetermines priority of taskTask SchedulerAllocates containers from YARN and assigns them to tasks Hortonworks Inc. 2014Tez SessionsPage 19Application MasterClientStart SessionSubmit DAGTask SchedulerContainer PoolShared Object RegistryPre WarmedJVMSessions Standard concepts of pre-lau
21、nch and pre-warm applied Key for interactive queries Represents a connection between the user and the cluster Multiple DAGs executed in the same session Containers re-used across queries Takes care of data locality and releasing resources when idle Hortonworks Inc. 2014Use case of Tez Split Group by
22、 + Join Orderby Automatic Reduce Parallelism Reduce Slow Start/Pre-launch Hortonworks Inc. 2014Split Group by + Joinf = load foo as (x,y,z)f1 = Group f By x;f2 = Group f By y;j = Join g1 by group, g2 by group Hortonworks Inc. 2014Orderbyf = Load foo as (x, y);o = Order f by x; Hortonworks Inc. 2014T
23、ez Automatic Reduce ParallelismPage 23 Map VertexReduce VertexApp MasterVertex ManagerData Size StatisticsVertex StateMachineSet ParallelismCancel TaskRe-RouteEvent ModelMap tasks send data statistics events to the Reduce Vertex Manager.Vertex ManagerPluggable application logic that understands the
24、data statistics and can formulate the correct parallelism. Advises vertex controller on parallelism Hortonworks Inc. 2014Tez Reduce Slow Start/Pre-launchPage 24 Map VertexReduce VertexApp MasterVertex ManagerTask CompletedVertex StateMachineStart TasksStartEvent ModelMap completion events sent to th
25、e Reduce Vertex Manager.Vertex ManagerPluggable application logic that understands the data size. Advises the vertex controller to launch the reducers before all maps have completed so that shuffle can start. Hortonworks Inc. 2014OutlineTez IntroductionTez APITez InternalTez Project StatusQ & A
26、Hortonworks Inc. 2014Tez Current status Apache Top Level ProjectRapid development. Over 1500 jiras opened. Over 1100 resolvedGrowing community of contributors and usersLatest release is 0.5 Focus on stabilityTesting and quality are highest priorityCode ready and deployed on multi-node environments a
27、t scale Support for a vast topology of DAGs Already functionally equivalent to Map Reduce. Existing Map Reduce jobs can be executed on Tez with few or no changesApache Hive 0.13 release supports Tez as an execution engine (HIVE-4660)Apache Pig port to Tez is also done(PIG-3446)Cascading 3.0 support
28、TezPage 26 Hortonworks Inc. 2014Tez Adoption Apache Hive Hadoop standard for declarative access via SQL-like interface Apache Pig Hadoop standard for procedural scripting and pipeline processing Cascading Developer friendly Java API and SDK Scalding (Scala API on Cascading) Commercial Vendors ETL : Use Tez instead of MR or custom pipelines Analytics Vendors : Use Tez as a target platform for scaling parallel analytical tools to large
溫馨提示
- 1. 本站所有資源如無(wú)特殊說(shuō)明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶(hù)所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶(hù)上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶(hù)上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶(hù)因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。
最新文檔
- 新人教版高中語(yǔ)文必修2近代科學(xué)進(jìn)入中國(guó)的回顧與前瞻 同步練習(xí)1
- 高中語(yǔ)文第二冊(cè)赤壁賦 同步練習(xí)1
- 高二上冊(cè)語(yǔ)文(人教版)夢(mèng)游天姥吟留別閱讀高速路 同步閱讀
- 修整祠堂合同范例
- 個(gè)人砂石料采購(gòu)合同范本
- nk細(xì)胞研發(fā)合同范例
- 個(gè)人對(duì)公材料合同范例
- 人力公司服務(wù)合同范例
- 分公司協(xié)議合同范例
- 代理報(bào)稅公司合同范例
- 2024年小學(xué)語(yǔ)文新部編版一年級(jí)上冊(cè)全冊(cè)教案
- 密碼學(xué)課件 古典密碼學(xué)
- GB/T 44438-2024家具床墊功能特性測(cè)試方法
- 2022版義務(wù)教育(歷史)課程標(biāo)準(zhǔn)(附課標(biāo)解讀)
- 2025軌道交通工程周邊環(huán)境調(diào)查與評(píng)價(jià)規(guī)程
- DL∕T 1928-2018 火力發(fā)電廠氫氣系統(tǒng)安全運(yùn)行技術(shù)導(dǎo)則
- CJT 526-2018 軟土固化劑 標(biāo)準(zhǔn)
- 品質(zhì)提升計(jì)劃改善報(bào)告課件
- 中國(guó)嗜酸性粒細(xì)胞增多癥診斷和治療指南(2024版)解讀
- 上海高考化學(xué)考綱知識(shí)點(diǎn)版
- 《基于mRNA-LNP技術(shù)的(細(xì)胞)免疫治療產(chǎn)品開(kāi)發(fā)指南》征求意見(jiàn)稿
評(píng)論
0/150
提交評(píng)論