2024 Artificial Intelligence (AI) Technology Tutorial

| Lecture | Title | Notes |
|---|---|---|
| 1 | Course introduction | Overview and system/AI basics |
| 2 | Overview of AI systems (System perspective of System for AI) | System for AI: a historic view; Fundamentals of neural networks; Fundamentals of System for AI |
| 3 | Fundamentals of DNN computation frameworks (Computation frameworks for DNN) | Backprop and AD, Tensor, DAG, Execution graph. Papers and systems: PyTorch, TensorFlow |
| 4 | Matrix computation and computer architecture (Computer architecture for Matrix computation) | Matrix computation, CPU/SIMD, GPGPU, ASIC/TPU. Papers and systems: Blas, TPU |
| 5 | Distributed training algorithms | Data parallelism, model parallelism, distributed SGD. Papers and systems: |
| 6 | Distributed training systems | MPI, parameter servers, all-reduce, RDMA. Papers and systems: Horovod |
| 7 | Scheduling and resource management systems for heterogeneous compute clusters (Scheduling and resource management system) | Running DNN jobs on a cluster: container, resource allocation, scheduling. Papers and systems: KubeFlow, OpenPAI, Gandiva, HiveD |
| 8 | Deep learning inference systems (Inference systems) | Efficiency, latency, throughput, and deployment |
| 9 | Computation graph compilation and optimization | IR, sub-graph pattern match, matrix multiplication and memory optimization. Papers and systems: XLA, MLIR, TVM, NNFusion |
| 10 | Model compression and sparsification (Efficiency via compression and sparsity) | Model compression, Sparsity, Pruning |
| 11 | AutoML systems | Hyperparameter tuning, NAS. Papers and systems: Hyperband, SMAC, ENAS, AutoKeras, NNI |
| 12 | Reinforcement learning systems | Theory of RL, systems for RL. Papers and systems: A3C, RLlib, AlphaZero |
| 13 | Model security and privacy protection (Security and Privacy) | Federated learning, security, privacy. Papers and systems: DeepFake |
| 14 | Optimizing computer systems with AI (AI for systems) | AI for traditional systems problems, for system algorithms. Papers and systems: Learned Indexes, Learned query path |

| Lab | Title | Notes |
|---|---|---|
| Lab 1 (for week 1, 2) | Getting-started example for frameworks and tools (A simple end-to-end AI example, from a system perspective) | Understand the systems from debugger info and system logs |
| Lab 2 (for week 3) | Customizing a new tensor operation (Customize operators) | Design and implement a customized operator (both forward and backward): in Python |
| Lab 3 (for week 4) | CUDA implementation and optimization (CUDA implementation) | Add a CUDA implementation for the customized operator |
| Lab 4 (for week 5, 6) | AllReduce implementation and optimization (AllReduce) | Improve the implementation of one of the AllReduce operators on Horovod |
| Lab 5 (for week 7, 8) | Configuring containers to prepare for training or inference in the cloud (Configure containers for customized training and inference) | Configure containers |
| Lab 6 | Learning to use a scheduling and management system (Scheduling and resource management system) | Get familiar with OpenPAI or KubeFlow |
| Lab 7 | Distributed training practice (Distributed training) | Try different kinds of all-reduce implementations |
| Lab 8 | AutoML system practice (AutoML) | Search for a new neural network (NN) structure for image/NLP tasks |
| Lab 9 | Reinforcement learning system practice (RL Systems) | Configure and get familiar with one of the following RL systems: RLlib, … |

Deep
Learning

Deep learning is changing the world.
Applications: self-driving, personal assistants, surveillance detection, translation, medical diagnostics, games, art.
Fields: image recognition, speech recognition, natural language, generative models, reinforcement learning.

[Figure: image classification example with the labels Cat, Dog, Raccoon and dog, cat, honey badger; error terms are propagated backward from the loss through stages 5 to 1.]

- Computing power (RDMA)
- Massive (labeled) data: 14M images
- Advances in deep learning algorithms
- Languages and frameworks

Advances in deep learning + systems: programming languages, optimization, computer architecture, parallel computing, and distributed systems.

| | MNIST | ImageNet | Web Images |
|---|---|---|---|
| Samples | 60K samples | 16M samples | Billions of images |
| Categories | 10 categories | 1000 categories | Open categories |

E.g., the image classification problem, test error rate (%):

| Model | Year | Test error rate | Techniques |
|---|---|---|---|
| LeNet | 1998 | | convolution, max-pooling, softmax |
| AlexNet | 2012 | 16.4% | ReLU, Dropout |
| Inception | 2015 | 6.7% | Batch normalization |
| ResNet | 2015 | 3.57% | Residual way |
| EfficientNet | 2019 | 3.1% | NAS |

[Figure: Performance (Op/Sec), 1960 to 2019. CPU under Moore's law: 10^8x, from ENIAC (5 Kops) to Xeon E5 (~500 Gops). Dedicated hardware (GPU, TPU): 10^5x, with TPUv3 at 360 Tops, V100 at 125 Tops, and TPUv1 at 90 Tops.]

Deep
learning frameworks: MXNet, TensorFlow, CNTK, PyTorch
Language frontend: Swift for TensorFlow
Compiler backend: TVM, TensorFlow XLA
Custom-purpose machine learning algorithms: Theano, DistBelief, Caffe
Algebra and linear libraries: CPU, GPU
Dense matmul engine: GPU, FPGA
Special AI accelerators: TPU, GraphCore, other ASICs

Deep learning frameworks provide easier ways to leverage various libraries.

Machine learning language and compiler
- A full-featured programming language for ML: expressive and flexible; control flow, recursion, sparsity
- Powerful compiler infrastructure: code optimization, sparsity optimization, hardware targeting
- Hardware trends: SIMD, MIMD; sparsity support; control flow and dynamicity; associated memory

System stack
- Experience: end-to-end AI user experiences (model, algorithm, pipeline, experiment, tool, life cycle management)
- Frameworks: programming interfaces (computation graph, (auto) gradient calculation; IR, compiler infrastructure)
- Runtime: deep learning runtime (optimizer, planner, executor)
- Architecture (single node and cloud): resource management/scheduler; hardware APIs (GPU, CPU, FPGA, ASIC); scalable network stack (RDMA, IB, NVLink)
Related lectures: class 3, class 4, class 5, class 6, class 7, class 8.

The broader AI systems ecosystem
- New machine learning paradigms (RL)
- Automated machine learning (AutoML)
- Security and privacy
- Model inference, compression, and optimization
Related lectures: class 12, class 11, class 13, class 10.

Deep learning algorithms and frameworks
- Efficient, new, general-purpose AI algorithms for broad use
- Support for and evolution of multiple deep learning frameworks
- Deep neural network compilation architecture and optimization

Core system software and hardware
- Runtime and optimization environment for deep learning jobs
- General-purpose resource management and scheduling systems
- New hardware and the related high-performance network and compute stacks

(1) Define the network structure
- Fully connected: typically used for the last few layers in classification problems
- Convolutional neural network: typically used for data with strong locality, such as images and speech
- Recurrent neural network: typically used for sequential and structured data, such as text and knowledge graphs
- Transformer neural network: typically used for sequence data, such as text
(2) Start training

    # A recursive TreeBank model in a dozen lines of JPL code
    # Walk the tree, accumulating embedding vecs
    # Word embedding model is used at the leaf node to map word
    # index into high-dimensional semantic word representation.
    # Map tree embedding to sentiment
    # Get semantic representations for left and right children.
    # A composition function is used to learn semantic
    # representation for phrase at the internal node.

- More diverse structures
- More powerful modeling capability
- More complex dependencies
- Finer-grained computation patterns

Graph
definition (IR)

[Figure: computation graph with inputs x, w, b, the operators * and +, and output y]

Front-end: language bindings for Python, Lua, R, C++
Optimization: batching, cache, overlap
Execution runtime: CPU, GPU, RDMA devices

TensorFlow: Data-Flow Graph (DFG) as Intermediate Representation
[Figure: data-flow graph with inputs x, y, z, the operators *, +, and Σ, intermediate values a and b, and output c]

TensorFlow: add gradient backpropagation to the Data-Flow Graph (DFG)
[Figure: the same graph extended with a gradient node for each of x, y, z, a, b, and c]

[Figure: the graph with gradients, with its operators mapped to CPU code and GPU code]

Experience, Frameworks, Architecture

IDE: programming with VSCode, Jupyter Notebook
Language: integrated with mainstream PL: PyTorch and TensorFlow inside Python
Compiler:

| Intermediate representation | Compilation | Optimization |
|---|---|---|
| Basic data structure: Tensor | Lexical analysis: Token | User controlled: mini-batch |
| Basic computation: DAG | Parsing: AST | Data parallelism and model parallelism |
| Advanced features: control flow | Semantic analysis: Symbolic AD | Loop nest analysis: pipeline parallelism, control flow |
| General IRs: MLIR | Code optimization | Data flow analysis: CSP, Arithmetic, Fusion |
| | Code generation | Hardware-dependent optimizations: matrix computation, layout |
| | | Resource allocation and scheduler: memory, recomputation |

Runtimes: single node: cuDNN; multi-node: parameter servers, all-reduce; computation cluster resource management and job scheduler
Hardware: hardware accelerators: CPU/GPU/ASIC/FPGA; network accelerators: RDMA/IB/NVLink

    import "tensorflow/core/framework/to";
    import "tensorflow/core/framework/op_to";
    import "tensorflow/core/framework/tensor_to
    // Syntactically similar to LLVM:
    func @testFunction(%arg0: i32) {
      %x = call @thingToCall(%arg0) : (i32) -> i32
      br ^bb1
    ^bb1:
      %y = addi %x, %x : i32
      return %y : i32
    }

- Deep learning depends heavily on data scale and model scale.
- Faster training speeds up the development of deep learning models.
- Deploying deep learning models at scale requires faster and more efficient inference.

Inference performance, serving latency

| Domain | Year | Model | Layers | Compute | Data | Error |
|---|---|---|---|---|---|---|
| Image | 2012 | AlexNet | 8 | 1.4 GFLOP | | 16% |
| Image | 2015 | ResNet | 152 | 22.6 GFLOP | | 3.5% |
| Speech | 2014 | Deep Speech 1 | | 80 GFLOP | 7,000 hrs | 8% |
| Speech | 2015 | Deep Speech 2 | | 465 GFLOP | 12,000 hrs | 5% |

- Different architectures: CNN, RNN, Transformer, …
- High computation resource requirements: model size, …
- Different goals: latency, throughput, accuracy, …

- Apply transparently over heterogeneous hardware environments
- Scale-out, local efficiency, memory effectiveness
- Be transparent to various user requirements

Systems, algorithms, and hardware must be combined.
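
The notes above describe adding gradient backpropagation to a data-flow graph (DFG). A minimal sketch of that idea in plain Python follows. It is not TensorFlow's implementation: the `Node` class, the helper names `mul`, `add`, `total`, `topo_order`, and `backward`, and the concrete graph (a = x * y, b = a + z, c = Σ b over two samples) are all assumptions made for illustration, loosely following the x, y, z, a, b, c labels of the figure.

```python
class Node:
    """One vertex of the data-flow graph: a value plus links to its inputs."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value              # result of the forward computation
        self.parents = parents          # input nodes of this operator
        self.local_grads = local_grads  # d(self)/d(parent), one per parent
        self.grad = 0.0                 # d(output)/d(self), set by backward()


def mul(a, b):
    # d(a*b)/da = b and d(a*b)/db = a
    return Node(a.value * b.value, (a, b), (b.value, a.value))


def add(a, b):
    # d(a+b)/da = d(a+b)/db = 1
    return Node(a.value + b.value, (a, b), (1.0, 1.0))


def total(nodes):
    # Sum operator: each input contributes with local gradient 1
    nodes = tuple(nodes)
    return Node(sum(n.value for n in nodes), nodes, tuple(1.0 for _ in nodes))


def topo_order(output):
    """Return the nodes so that every node comes after all of its inputs."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    return order


def backward(output):
    """Walk the graph in reverse topological order, applying the chain rule."""
    output.grad = 1.0
    for node in reversed(topo_order(output)):
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += node.grad * local


# Assumed example: two samples x1, x2 share the inputs y and z.
x1, x2, y, z = Node(1.0), Node(2.0), Node(3.0), Node(4.0)
b1 = add(mul(x1, y), z)   # a1 = x1 * y, b1 = a1 + z
b2 = add(mul(x2, y), z)   # a2 = x2 * y, b2 = a2 + z
c = total([b1, b2])       # c = b1 + b2
backward(c)
print(c.value, x1.grad, x2.grad, y.grad, z.grad)
```

Because y and z feed both samples, their gradients accumulate across the two paths (dc/dy = x1 + x2, dc/dz = 2). This is why the backward pass visits nodes in reverse topological order: a node's gradient is complete before it is pushed on to that node's inputs.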