2024 AI Technology Tutorial

Lectures (name: notes):

1. Course introduction: overview and system/AI basics.
2. Overview of AI systems: a system perspective on System for AI; System for AI, a historic view; fundamentals of neural networks; fundamentals of System for AI.
3. Computation frameworks for DNN: backprop and AD, tensors, DAGs, execution graphs (a minimal autodiff sketch follows this table). Papers and systems: PyTorch, TensorFlow.
4. Computer architecture for matrix computation: matrix computation, CPU/SIMD, GPGPU, ASIC/TPU. Papers and systems: BLAS, TPU.
5. Distributed training algorithms: data parallelism, model parallelism, distributed SGD.
6. Distributed training systems: MPI, parameter servers, all-reduce, RDMA. Papers and systems: Horovod.
7. Scheduling and resource management systems for heterogeneous clusters: running DNN jobs on a cluster; containers, resource allocation, scheduling. Papers and systems: KubeFlow, OpenPAI, Gandiva, HiveD.
8. Inference systems: efficiency, latency, throughput, and deployment.
9. Computation graph compilation and optimization: IR, sub-graph pattern matching, matrix multiplication and memory optimization. Papers and systems: XLA, MLIR, TVM, NNFusion.
10. Efficiency via compression and sparsity: model compression, sparsity, pruning.
11. AutoML systems: hyperparameter tuning, NAS. Papers and systems: Hyperband, SMAC, ENAS, AutoKeras, NNI.
12. Reinforcement learning systems: theory of RL, systems for RL. Papers and systems: A3C, RLlib, AlphaZero.
13. Model security and privacy: federated learning, security, privacy. Papers and systems: DeepFake.
14. AI for systems: AI for traditional systems problems and for system algorithms. Papers and systems: Learned Indexes, learned query paths.
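Lecture 3 covers backprop and automatic differentiation over a DAG of tensor operations. As a minimal sketch of the idea (illustrative only, not course material; all names are invented), reverse-mode AD over a tiny scalar expression graph:

    # Minimal reverse-mode automatic differentiation over a DAG
    # (illustrative sketch only; not from the course materials).
    class Node:
        def __init__(self, value, parents=()):
            self.value = value         # forward value
            self.parents = parents     # (parent_node, local_gradient) pairs
            self.grad = 0.0            # accumulated d(output)/d(node)

    def add(a, b):
        return Node(a.value + b.value, parents=((a, 1.0), (b, 1.0)))

    def mul(a, b):
        return Node(a.value * b.value, parents=((a, b.value), (b, a.value)))

    def backward(output):
        # Reverse topological walk, accumulating gradients by the chain rule.
        output.grad = 1.0
        order, seen = [], set()
        def topo(n):
            if id(n) not in seen:
                seen.add(id(n))
                for p, _ in n.parents:
                    topo(p)
                order.append(n)
        topo(output)
        for node in reversed(order):
            for parent, local_grad in node.parents:
                parent.grad += node.grad * local_grad

    # b = x*y + z; the execution graph is the DAG built by add/mul.
    x, y, z = Node(2.0), Node(3.0), Node(4.0)
    b = add(mul(x, y), z)
    backward(b)
    print(b.value, x.grad, y.grad, z.grad)  # 10.0 3.0 2.0 1.0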

Labs (name: notes):

Lab 1 (for weeks 1-2), Getting started with frameworks and tools: a simple, complete end-to-end AI example from a system perspective; understand the systems from debugger info and system logs.
Lab 2 (for week 3), Customize a new tensor operator: design and implement a customized operator (both forward and backward) in Python. A sketch follows this table.
Lab 3 (for week 4), CUDA implementation and optimization: add a CUDA implementation for the customized operator.
Lab 4 (for weeks 5-6), AllReduce implementation and optimization: improve one of the AllReduce operators' implementations in Horovod. A toy ring all-reduce sketch also follows this table.
Lab 5 (for weeks 7-8), Configure containers for cloud training or inference: configure containers for customized training and inference.
Lab 6, Learn to use a scheduling and resource management system: get familiar with OpenPAI or KubeFlow.
Lab 7, Distributed training exercise: try different kinds of all-reduce implementations.
Lab 8, AutoML exercise: search for a new neural network (NN) structure for image/NLP tasks.
Lab 9, RL systems exercise: configure and get familiar with one of the following RL systems: RLlib, …
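Lab 2 asks for a customized operator with hand-written forward and backward passes in Python. A minimal sketch using PyTorch's autograd.Function (the operator chosen here, x * sigmoid(x), is illustrative, not the lab's reference solution):

    import torch

    # Illustrative custom operator: y = x * sigmoid(x), with a hand-written
    # backward pass instead of relying on autograd.
    class SiLU(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            s = torch.sigmoid(x)
            ctx.save_for_backward(x, s)
            return x * s

        @staticmethod
        def backward(ctx, grad_output):
            x, s = ctx.saved_tensors
            # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
            return grad_output * (s * (1 + x * (1 - s)))

    x = torch.randn(4, requires_grad=True)
    y = SiLU.apply(x).sum()
    y.backward()
    print(x.grad)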
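Labs 4 and 7 work with AllReduce. To make the algorithm concrete, here is a toy single-process simulation of ring all-reduce over Python lists; real implementations such as Horovod's run across processes (e.g., over MPI or NCCL), but the two ring phases are the same:

    # Toy ring all-reduce (sum) among n "workers", each holding n chunks.
    # Phase 1 (reduce-scatter): after n-1 steps each worker owns the full
    # sum of one chunk. Phase 2 (all-gather): the owned chunks circulate
    # until every worker has the complete result.
    def ring_allreduce(data):
        n = len(data)
        for step in range(n - 1):                 # phase 1: reduce-scatter
            for r in range(n):
                c = (r - step) % n                # chunk r sends this step
                data[(r + 1) % n][c] += data[r][c]
        for step in range(n - 1):                 # phase 2: all-gather
            for r in range(n):
                c = (r + 1 - step) % n            # owned chunk moves on
                data[(r + 1) % n][c] = data[r][c]

    workers = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    ring_allreduce(workers)
    print(workers)  # every worker ends with [12.0, 15.0, 18.0]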

Deep Learning

Deep learning is changing the world: self-driving, personal assistants, surveillance detection, translation, medical diagnostics, games, art.

Application areas: image recognition, speech recognition, natural language, generative models, reinforcement learning.

[Figure: an image-classification demo labeling pictures of a cat, dog, raccoon, and honey badger; and a backpropagation diagram with weights w1-w5, the error gradients ∂error/∂w5 through ∂error/∂w1, and the loss.]
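The gradient quantities in the backpropagation figure, reconstructed under the usual chain-rule reading (the activations a_i and learning rate η are our notation, not recovered from the slide):

    \frac{\partial\,\mathrm{error}}{\partial w_i}
      = \frac{\partial\,\mathrm{error}}{\partial a_i}
        \cdot \frac{\partial a_i}{\partial w_i},
    \qquad
    w_i \leftarrow w_i - \eta\,\frac{\partial\,\mathrm{error}}{\partial w_i},
    \qquad i = 1, \dots, 5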

What enabled deep learning: compute power (including fast interconnects such as RDMA), massive labeled data (14M images), advances in deep learning algorithms, and languages and frameworks. The progress of deep learning + systems spans programming languages, optimization, computer architecture, parallel computing, and distributed systems.

Dataset scale has grown alongside:
- MNIST: 60K samples, 10 categories.
- ImageNet: 16M samples, 1000 categories.
- Web images: billions of images, open-ended categories.

E.g., the image classification problem:

[Figure: test error rate (%) falling over successive models; the recoverable values are 12, 5, 7.7, 3.3, 1.4, 4.7, 1.7, and 0.23.]

Milestones:
- LeNet (1998): convolution, max-pooling, softmax.
- AlexNet (2012): 16.4% error; ReLU, dropout.
- Inception (2015): 6.7% error; batch normalization.
- ResNet (2015): 3.57% error; residual connections.
- EfficientNet (2019): 3.1% error; NAS.

Similar progress holds across image recognition, speech recognition, natural language, and reinforcement learning.

[Figure: performance (op/sec), 1960-2019. CPUs rode Moore's law for roughly a 10^8x gain, from ENIAC at 5 Kops to a Xeon E5 at ~500 Gops; dedicated hardware adds roughly another 10^5x, e.g. the V100 GPU at 125 Tops, TPUv1 at 90 Tops, TPUv3 at 360 Tops.]

Deep learning frameworks sit in a layered ecosystem: MxNet, TensorFlow, CNTK, and PyTorch as frameworks; Swift for TensorFlow as a language frontend; TVM and TensorFlow XLA as compiler backends. Earlier custom-purpose machine learning systems (Theano, DisBelief, Caffe) were built directly on algebra and linear libraries over CPU and GPU. Today's AI frameworks target dense-matmul engines on GPUs, FPGAs, and special AI accelerators (TPU, GraphCore, other ASICs). Deep learning frameworks provide easier ways to leverage these various libraries.

A full-featured programming language for ML is expressive and flexible, with control flow, recursion, and sparsity. A powerful compiler infrastructure provides code optimization, sparsity optimization, and hardware targeting. Together these form a machine learning language and compiler, with SIMD/MIMD execution, sparsity support, control flow and dynamicity, and associated memory.

The AI system stack (single node and cloud), covered in classes 3-8:
- Experience: end-to-end AI user experiences; model, algorithm, pipeline, experiment, tool, and life-cycle management.
- Frameworks: programming interfaces; computation graph and (auto) gradient calculation; IR and compiler infrastructure.
- Runtime: deep learning runtime; optimizer, planner, executor.
- Architecture: resource management/scheduler; scalable network stack (RDMA, IB, NVLink); hardware APIs (GPU, CPU, FPGA, ASIC).

The broader AI systems ecosystem, covered in classes 10-13:
- New machine-learning paradigms: reinforcement learning (RL); automated machine learning (AutoML); security and privacy; model inference, compression, and optimization.
- Deep learning algorithms and frameworks: efficient, general-purpose AI algorithms for broad use; support for and evolution of multiple deep learning frameworks; deep neural network compiler architecture and optimization.
- Core system software and hardware: runtime and optimization environments for deep learning jobs; general-purpose resource management and scheduling systems; new hardware and the associated high-performance network and compute stacks.

(1) Define the network structure; (2) start training. Common structures (a small training sketch follows this list):
- Fully connected: typically the last few layers of a classification model.
- Convolutional neural network: typically for data with strong locality, such as images and speech.
- Recurrent neural network: typically for sequential and structured data, such as text and knowledge graphs.
- Transformer network: typically for sequence data, such as text.
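As a minimal sketch of the two steps above in PyTorch (the architecture, sizes, and data here are made up for illustration):

    import torch
    import torch.nn as nn

    # (1) Define the network structure: a small fully connected classifier.
    model = nn.Sequential(
        nn.Linear(28 * 28, 128),
        nn.ReLU(),
        nn.Linear(128, 10),
    )

    # (2) Start training: a standard loop with cross-entropy loss and SGD.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    inputs = torch.randn(64, 28 * 28)          # stand-in for a real mini-batch
    labels = torch.randint(0, 10, (64,))

    for step in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()                        # autodiff over the graph
        optimizer.step()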

Only the comments survive from the slide's recursive-model code listing:

    # A recursive TreeBank model in a dozen lines of JPL code
    # Walk the tree, accumulating embedding vecs
    # Word embedding model is used at the leaf node to map word
    # index into high-dimensional semantic word representation.
    # Map tree embedding to sentiment
    # Get semantic representations for left and right children.
    # A composition function is used to learn semantic
    # representation for phrase at the internal node.

Such models bring more diverse structures, stronger modeling power, more complex dependency relations, and finer-grained computation patterns.
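The model those comments describe might look like this in Python (a hypothetical reconstruction; the slide's actual code and language are not recoverable, and every name and size below is invented):

    import torch
    import torch.nn as nn

    # Hypothetical reconstruction of the recursive TreeBank sentiment model.
    embed = nn.Embedding(10000, 128)    # word index -> semantic vector (leaf)
    compose = nn.Linear(2 * 128, 128)   # composition function (internal node)
    to_sentiment = nn.Linear(128, 5)    # tree embedding -> sentiment classes

    def encode(tree):
        # Walk the tree, accumulating embedding vecs.
        if isinstance(tree, int):                 # leaf: a word index
            return embed(torch.tensor([tree]))
        left, right = tree                        # internal node
        l, r = encode(left), encode(right)        # children representations
        return torch.tanh(compose(torch.cat([l, r], dim=-1)))

    # ((the, movie), (was, great)) as word indices:
    logits = to_sentiment(encode(((0, 1), (2, 3))))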

Graph definition (IR): inputs x and w feed a * node; its output and b feed a + node that produces y. Front-end language bindings: Python, Lua, R, C++. Optimizations: batching, cache, overlap. Execution runtime: CPU, GPU, RDMA devices.

TensorFlow uses a data-flow graph (DFG) as its intermediate representation. For example: a = x * y, b = a + z, c = Σ(b). Gradient backpropagation is added to the same data-flow graph: gradient nodes (∂c, ∂b, ∂a, ∂x, ∂y, ∂z in the slide's notation) extend the forward graph, and the combined graph is then lowered, operator by operator, to CPU code and GPU code. (A runnable version of this example closes these notes.)

The stack (Operators, Experience, Frameworks, Architecture) mirrors a classic compiler:
- IDE: programming with VSCode, Jupyter Notebook.
- Language: integrated with mainstream programming languages; PyTorch and TensorFlow live inside Python.
- Compiler, intermediate representation: basic data structure, Tensor; basic computation, DAG; advanced features, control flow. The compiler analogues: lexical analysis (tokens), parsing (AST), semantic analysis (symbolic AD), general IRs (MLIR).
- Compiler, compilation: user-controlled mini-batching; data parallelism and model parallelism; loop-nest analysis for pipeline parallelism and control flow.
- Compiler, optimization: code optimization via data-flow analysis (CSP, arithmetic, fusion); code generation with hardware-dependent optimizations (matrix computation, layout); resource allocation and scheduling (memory, recomputation).
- Runtimes: single node, cuDNN; multi-node, parameter servers and all-reduce; cluster resource management and job schedulers.
- Hardware: hardware accelerators (CPU/GPU/ASIC/FPGA); network accelerators (RDMA/IB/NVLink).

(The accompanying figure repeats the framework landscape: MxNet, TensorFlow, CNTK, PyTorch; Swift for TensorFlow as a language frontend; TVM and TensorFlow XLA as compiler backends; dense-matmul engines on GPU, FPGA, and special AI accelerators such as TPU, GraphCore, and other ASICs.)

TensorFlow's graph format is defined with protocol buffers; the slide shows (import paths truncated in the source):

    import "tensorflow/core/framework/to";
    import "tensorflow/core/framework/op_to";
    import "tensorflow/core/framework/tensor_to

A full-featured programming language for ML is expressive and flexible (control flow, recursion, sparsity) and rests on a powerful compiler infrastructure (code optimization, sparsity optimization, hardware targeting), with SIMD/MIMD execution, sparsity support, control flow and dynamicity, and associated memory. MLIR is one such machine learning language and compiler infrastructure:

    // Syntactically similar to LLVM:
    func @testFunction(%arg0: i32) {
      %x = call @thingToCall(%arg0) : (i32) -> i32
      br ^bb1
    ^bb1:
      %y = addi %x, %x : i32
      return %y : i32
    }

Deep learning depends heavily on data scale and model scale. Raising training speed shortens model development; deploying models at scale demands faster and more efficient inference (inference performance, serving latency). Model growth so far:
- Image: AlexNet (2012), 8 layers, 1.4 GFLOP, 16% error; ResNet (2015), 152 layers, 22.6 GFLOP, 3.5% error.
- Speech: Deep Speech 1 (2014), 80 GFLOP, 7,000 hrs of data, 8% error; Deep Speech 2 (2015), 465 GFLOP, 12,000 hrs of data, 5% error.

Inference systems face different architectures (CNN, RNN, Transformer, …), high computation-resource requirements (model size, …), and different goals (latency, throughput, accuracy, …). They should apply transparently over heterogeneous hardware environments (scale-out, local efficiency, memory effectiveness) and stay transparent to various user requirements. Systems, algorithms, and hardware must be designed together.
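Returning to the data-flow-graph example above (a = x * y, b = a + z, c = Σ(b)), a minimal runnable version; this uses today's TensorFlow 2 eager API rather than the TF1 graph API the slide depicts, but the gradients it computes are the same:

    import tensorflow as tf

    # Forward data-flow graph from the slide: a = x*y, b = a+z, c = sum(b).
    x = tf.Variable([1.0, 2.0])
    y = tf.Variable([3.0, 4.0])
    z = tf.Variable([5.0, 6.0])

    with tf.GradientTape() as tape:
        a = x * y
        b = a + z
        c = tf.reduce_sum(b)

    # The tape adds the gradient "nodes" (dc/dx, dc/dy, dc/dz) to the graph.
    dx, dy, dz = tape.gradient(c, [x, y, z])
    print(dx.numpy(), dy.numpy(), dz.numpy())  # [3. 4.] [1. 2.] [1. 1.]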
