




An In-Depth Look at DeepSeek: Principles and Impact

Xiong Deyi, Tianjin University
dyxiong@ · https://dyxiong.github.io · https://tjunlp-lab.github.io
The Natural Language Processing Laboratory at Tianjin University (TJUNLP) · 仁文伏羲 · 伏羲傳語(yǔ) · OpenEval

Agenda:
1. The LLM development roadmap
2. DeepSeek V2–V3/R1: technical principles
3. The DeepSeek effect
4. Outlook

Generative AI: 2014–2024

Timeline: ENIAC 1945 · Turing Test 1950 · Dartmouth Conference 1956 · ELIZA 1966 · AI Winter I 1974–1980 · expert systems 1980–1987 · AI Winter II 1987–1990s · statistical methods 1990–2000s · NN revival 2006– · generative AI 2014–2024 · AGI … ASI

Generative AI: using generative models to produce data of all kinds (language, speech, images, video, etc.)
- Attention: modeling dependencies in data
- Transformer: a unified architecture for data generation
- Scaling Laws | GPT-3 (2020): scaling laws for learning from and generating data
- RLHF | ChatGPT (2022): generating data aligned with human values
- o1/R1 (2024): generative problem solving — generating the solution process and the answer (reasoning)

[Figure: attention-based image captioning pipeline from "Show, Attend and Tell" — "Our model learns a word/image alignment. The visualized attentional maps are explained in Sections 3.1 & 5.4."]
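The scaling-laws milestone on this slide refers to smooth power-law fits of test loss against parameters, data, and compute. A minimal sketch of the parameter-count law in the style of Kaplan et al. (2020); the constants are that paper's reported fits and should be treated as illustrative, not definitive:

```python
# Illustrative power-law scaling curve, L(N) = (N_c / N) ** alpha_N,
# in the style of Kaplan et al. (2020). The constants below are the
# commonly cited fits for non-embedding parameter count; treat them as
# illustrative assumptions, not exact values for any modern model.

def loss_from_params(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Predicted test loss as a power law in non-embedding parameter count."""
    return (n_c / n_params) ** alpha

# Loss keeps falling smoothly as parameters grow by orders of magnitude.
for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {loss_from_params(n):.3f}")
```

The monotone, slowly bending curve is exactly what makes "just scale it" a workable strategy, which is the thread this slide follows from GPT-3 to o1/R1.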
Generative AI: 2014–2024 (continued) — o1/R1: generative problem solving (reasoning). An illustration, the "aha moment" of DeepSeek-R1-Zero:

Question: If a > 1, then the sum of the real solutions of √(a − √(a + x)) = x is equal to

Response: <think>
To solve the equation √(a − √(a + x)) = x, let's start by squaring both sides:
(√(a − √(a + x)))² = x² ⟹ a − √(a + x) = x²
Rearrange to isolate the inner square-root term:
(a − x²)² = a + x ⟹ a² − 2ax² + (x²)² = a + x ⟹ x? − 2ax² − x + (a² − a) = 0
…
Wait, wait. Wait. That's an aha moment I can flag here.
Let's reevaluate this step by step to identify if the correct sum can be …
We started with the equation √(a − √(a + x)) = x. First, let's square both sides:
a − √(a + x) = x² ⟹ √(a + x) = a − x²
Next, I could square both sides again, treating the equation: …

Natural language processing and language models
- Natural language processing (NLP) for artificial intelligence: the intelligent processing and analysis of human language, giving computers the human abilities to listen, speak, read, write, and translate.
- Language model: statistical modeling of natural language — put simply, predicting the next word in a sentence.
- Trained to predict the next word in a sentence: "The cat is chasing the ___" → mouse 70%, squirrel 20%, dog 5%, boy 5%, house 0%.
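The next-word prediction shown on the slide can be sketched in a few lines: a language model produces scores (logits) over candidate continuations and a softmax turns them into probabilities. The logit values below are made up for illustration; a real model scores its entire vocabulary:

```python
# A language model reduces to next-token prediction: given "The cat is
# chasing the", assign a probability to each candidate continuation.
# The logits below are invented for illustration only.
import math

def softmax(logits):
    m = max(logits.values())                       # subtract max for stability
    exps = {w: math.exp(v - m) for w, v in logits.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

logits = {"mouse": 4.0, "squirrel": 2.7, "dog": 1.4, "boy": 1.4, "house": -3.0}
probs = softmax(logits)
best = max(probs, key=probs.get)
print(best, round(probs[best], 2))  # "mouse" gets the highest probability
```

Generation then amounts to sampling (or taking the argmax of) this distribution and repeating, one token at a time.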
LLMs: 2018–2024
[Figure: evolutionary timeline of publicly available large language models, 2019–2024 — including T5, GPT-3, Codex, LaMDA, Gopher, Chinchilla, InstructGPT, OPT, BLOOM, ChatGPT, GPT-4, LLaMA 1/2/3, PaLM/PaLM 2, Falcon, Mistral/Mixtral, Qwen/Qwen2, GLM/ChatGLM, Gemma, MiniCPM, and DeepSeek/DeepSeek-V2, among many others.]
Zhao et al. A Survey of Large Language Models. arXiv:2303.18223.
LLMs: the technology stack
- Alignment training data: prompts and responses.
- Software and resource allocation: docker, kubernetes, slurm; task scheduling.
- Model training: pretraining; alignment training (SFT, RLHF, DPO, Best-of-N sampling); parallelism and kernels: data parallel, tensor parallel, expert parallel, pipeline parallel, sequence parallel, ZeRO, FlashAttention.
- Data processing and management: deduplication, quality filtering, domain classification, language detection, content filtering, version control; pretraining data: web pages, books, code, papers, encyclopedias.
- Compute management: load balancing, performance monitoring, elastic scaling, fault tolerance; hardware: NVIDIA H100/A100, AMD MI350/MI300, Ascend 910A/910B.
- Model evaluation: OpenEval, UltraEval, OpenCompass, Chatbot Arena, FlagEval, Open LLM Leaderboard; knowledge and capability, value alignment, safety and trustworthiness, specialized domains.
- From general-purpose model to specialized (industry) model: industry alignment data, domain fine-tuning, industry-model deployment and evaluation.
- Model deployment: dynamic batching, quantization, pruning, operator optimization, distillation.
- Application layer: autonomous planning, multimodal creation, intelligent customer service, information retrieval, tool use, code generation.

LLMs: lifecycle and paradigms
- Data processing → pretraining → post-training → deployment.
- Data governance → base model → aligned model → red-team testing.
- Data as a production factor → self-supervised learning → fine-tuning & reinforcement → commercial deployment.
- The source of knowledge → emergence of capabilities → safety and trustworthiness → model compression.
- Key elements: training paradigms, model architecture, training algorithms, scaling laws.
- Training paradigms: pretraining → base model; post-training → aligned model; reasoning training → reasoning model.
- The killer advantage: the performance/cost curve (cost-effectiveness).
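The data-processing layer named in the stack above (deduplication, quality filtering) can be sketched minimally. The normalization, hashing scheme, and word-count threshold here are hypothetical stand-ins for what production pipelines actually use:

```python
# Minimal sketch of the data-processing stage from the technology-stack
# slide: exact deduplication after normalization, plus a crude quality
# filter. The heuristics and threshold are hypothetical stand-ins.
import hashlib

def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def dedup_and_filter(docs, min_words=5):
    seen, kept = set(), []
    for doc in docs:
        h = hashlib.sha256(normalize(doc).encode()).hexdigest()
        if h in seen:
            continue                       # exact duplicate after normalization
        seen.add(h)
        if len(doc.split()) >= min_words:  # toy quality heuristic
            kept.append(doc)
    return kept

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "the quick  brown fox jumps over the lazy dog.",  # near-exact duplicate
    "Too short.",
]
print(dedup_and_filter(corpus))  # keeps only the first document
```

Real pipelines add fuzzy dedup (e.g. MinHash), classifier-based quality scoring, and domain classification on top of this skeleton.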
Scaling laws: The Bitter Lesson

"The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning." [Sutton, 2019]

Sasha Rush and Daniel Ritter. Speculations on Test-Time Scaling. 2024.
LLMs: post-training paradigms — two eras of alignment pipelines

Era 1 (~InstructGPT):
- Human instructions (~10k) → SFT: Base Model → SFT Model.
- Human preferences (~100k) → Reward Model → PPO optimization → Aligned Model.
- Relatively cheap; most labs can afford it.

Era 2 (~Llama 3.1 / Nemotron):
- Human + synthetic instructions (~1M+?) → initial SFT training.
- Reward Model / LLM judge; human preferences (~1M*?); re-use preference prompts; new synthetic completions × N rounds.
- DPO, PPO, rejection sampling, or multiple optimizations: Aligned Model N → Aligned Model N+1 → Final Model.
- Expensive (tens of millions of dollars); only a few companies and labs can afford it.
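Among the techniques listed in these pipelines, Best-of-N sampling is the simplest to sketch: draw N candidate completions, score each with a reward model, keep the argmax. `generate` and `reward` below are toy stand-ins for a real LLM and reward model:

```python
# Best-of-N sampling as named on the slide: sample N completions,
# score each with a reward model, return the best. The generator and
# reward function are toy stand-ins, not real models.
import random

def generate(prompt: str, rng: random.Random) -> str:
    return prompt + " " + " ".join(rng.choice(["good", "bad", "ok"]) for _ in range(3))

def reward(completion: str) -> float:
    return completion.count("good") - completion.count("bad")  # toy reward

def best_of_n(prompt: str, n: int = 8, seed: int = 0) -> str:
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=reward)

print(best_of_n("Answer:"))
```

The same generate-then-score loop underlies rejection sampling in Era-2 pipelines: instead of keeping only the single best completion, all completions above a reward threshold become new training data.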
OpenAI Q*: a reasoning language model?
- Process reward model (PRM): scores each intermediate step of a solution. [Figure: a PRM scoring the steps of a grade-school word problem ("The denominator of a fraction is 7 less than 3 times the numerator…"); remainder garbled beyond recovery.]
- Reminder: AlphaZero; MCTS.
Sasha Rush and Daniel Ritter. Speculations on Test-Time Scaling. 2024.

Agenda: DeepSeek V2–V3/R1 — technical principles

DeepSeek: 2023–2024
- Timeline: 2023.11 DeepSeek V1 → 2024.5 DeepSeek V2 → 2024.11 DeepSeek R1-Lite → 2024.12 DeepSeek V3 → 2025.01 DeepSeek R1.
- Two clouds on the horizon (the state of the field at home and abroad):
  - Model architecture: most companies adopt already-validated architectures, because trial and error is prohibitively costly — they dare not.
  - Reasoning models: most labs were still groping in the dark, guessing at Q*/o1 (kept secret by OpenAI) — they do not know.
DeepSeek V2: key innovations
- DeepSeekMoE
- MLA (Multi-head Latent Attention)

DeepSeekMoE:
- Sparse activation: compute does not grow linearly with model scale.
- Versus conventional MoE: fine-grained experts, split into shared and routed experts.
- Routing and communication redesign: device-limited routing; auxiliary loss for load balance; token-dropping strategy.

[Figure: the V2 Transformer block (×L) — RMSNorm + attention, then a DeepSeekMoE FFN whose output hidden state combines always-active shared experts with top-K routed experts selected per token by the router; and Multi-head Latent Attention (MLA), which concatenates compressed latent keys/values with decoupled RoPE keys and caches only the compressed latents during inference.]

V2 scale: 236B total parameters, 21B activated parameters, 128K context window.
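The shared-plus-routed expert structure described above can be sketched for a single token. Dimensions, expert counts, and top-K below are toy values, not the V2 configuration:

```python
# Sketch of DeepSeekMoE-style routing for one token: the token always
# passes through the shared expert(s), plus its top-K routed experts
# chosen by a gating score. All sizes are toy values, not V2's.
import numpy as np

rng = np.random.default_rng(0)
d, n_routed, n_shared, top_k = 8, 16, 2, 4

W_gate = rng.normal(size=(d, n_routed))                       # router weights
experts = rng.normal(size=(n_routed + n_shared, d, d)) * 0.1  # toy expert FFNs

def moe_forward(x):
    scores = x @ W_gate                         # affinity to each routed expert
    idx = np.argsort(scores)[-top_k:]           # indices of the top-K routed experts
    gate = np.exp(scores[idx]); gate /= gate.sum()
    out = sum(experts[n_routed + j] @ x for j in range(n_shared))  # shared experts, always on
    out += sum(g * (experts[i] @ x) for g, i in zip(gate, idx))    # sparse routed part
    return out

y = moe_forward(rng.normal(size=d))
print(y.shape)  # (8,)
```

Only top_k of the n_routed experts execute per token, which is why activated parameters (21B) stay far below total parameters (236B).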
DeepSeek: architectural innovation | V2
MLA: low-rank compression of keys and values, shrinking the KV cache. Keys and values are compressed into a latent vector, and only the compressed latent KV (plus the decoupled RoPE keys) is cached during inference. [Figure: comparison of Multi-Head Attention (MHA), Grouped-Query Attention (GQA), Multi-Query Attention (MQA), and Multi-head Latent Attention (MLA).]

The killer advantage — the performance/cost curve:
- Performance: [Figure: MMLU vs. activated parameters (billions) — DeepSeek-V2 matches or beats the LLaMA 1/2/3, Mixtral, Command R, Qwen1.5, Grok-1, and DBRX families with only ~21B activated parameters.]
- Storage: KV cache for generation (KB/token) reduced by 93.3% vs. DeepSeek 67B.
- Speed: maximum generation throughput (tokens/sec) up to 576% of DeepSeek 67B's.
- Training: saving 42.5% of training costs (K GPU-hours/T tokens) vs. DeepSeek 67B.
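The KV-cache saving claimed on the slide follows from simple arithmetic: standard MHA caches full keys and values for every head, while MLA caches one low-rank latent vector plus a small decoupled RoPE key. The widths below are assumed round numbers for illustration, not the exact V2 configuration:

```python
# Back-of-the-envelope KV-cache comparison behind the MLA slide.
# Head count and widths are illustrative assumptions, not V2's exact
# hyperparameters.

n_heads, d_head = 32, 128
d_latent, d_rope = 512, 64           # assumed latent / decoupled-RoPE widths

mha_per_token = 2 * n_heads * d_head # K and V cached for every head
mla_per_token = d_latent + d_rope    # one compressed latent + RoPE key

print(f"MHA caches {mha_per_token} values/token, MLA {mla_per_token}")
print(f"reduction: {1 - mla_per_token / mha_per_token:.1%}")
```

With these assumed widths the reduction comes out near 93%, the same order as the 93.3% figure reported for V2 against DeepSeek 67B.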
DeepSeek V3: key innovations
- Infrastructures
- Multi-Token Prediction (MTP)

Infrastructures:
- Fewer pipeline bubbles.
- Efficient inter-node all-to-all communication.
- FP8 training.
- Low-precision storage and communication.

MTP: predicting multiple tokens at once. [Figure: the main model (Transformer block ×L with RMSNorm and attention) followed by sequential MTP modules, each with its own output head, predicting additional future tokens.]

V3 scale: 671B total parameters, 37B activated parameters, trained on 14.8T tokens.
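The MTP objective above can be sketched by how its training targets are laid out: at each position the model is supervised not only on the next token but on the next D tokens, one extra offset per MTP module. Real MTP modules share embeddings and chain causally; this only shows the target layout:

```python
# Toy sketch of the multi-token-prediction (MTP) training targets: at
# each position, besides the next token, additional depths predict
# tokens further ahead. This illustrates the target layout only, not
# the module architecture.

def mtp_targets(tokens, depth=2):
    """Return (position, offset, target) triples for an MTP-style loss."""
    triples = []
    for i in range(len(tokens) - depth):
        for k in range(1, depth + 1):
            triples.append((i, k, tokens[i + k]))
    return triples

toks = ["the", "cat", "is", "chasing", "the", "mouse"]
for pos, k, tgt in mtp_targets(toks, depth=2)[:4]:
    print(f"from position {pos}, predict token +{k}: {tgt!r}")
```

Densifying the training signal this way gives each position multiple supervised predictions per forward pass, and the extra heads can also support speculative decoding at inference time.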
DeepSeek: architectural innovation | V3 — cost. The killer advantage: the performance/cost curve.

[Figure: MMLU-Redux ZeroEval score vs. input API price ($/1M tokens).]

Table 1 | Training costs of DeepSeek-V3, assuming the rental price of H800 is $2 per GPU hour.

    Training Costs       Pre-Training   Context Extension   Post-Training   Total
    in H800 GPU Hours    2664K          119K                5K              2788K
    in USD               $5.328M        $0.238M             $0.01M          $5.576M

"During the pre-training stage, training DeepSeek-V3 on each trillion tokens requires only 180K H800 GPU hours, i.e., 3.7 days on our own cluster with 2048 H800 GPUs. Consequently, our pre-training stage is completed in less than two months and costs 2664K GPU hours."

Sebastian Raschka (@rasbt): "E.g. Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11X less compute). Super interesting! And DeepSeek was trained in H800's which are probably also a tad (or noticeably?) slower than Meta's H100's."

Note the distinction between cost categories: the figure above is the final training-run cost. Total LLM R&D cost also includes large-scale high-performance accelerators (depreciation), R&D staff, architecture-exploration experiments, and data costs — plus deployment and inference costs.
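The figures in Table 1 are internally consistent and can be checked with a few lines of arithmetic: GPU-hours times the assumed $2/hour H800 rental rate reproduces the dollar totals, and 180K GPU-hours per trillion tokens gives the quoted ~3.7 days on 2048 GPUs:

```python
# Sanity-checking Table 1: GPU-hours x the assumed $2/hour H800 rate
# should reproduce the quoted dollar figures, and 180K GPU-hours per
# trillion tokens should give ~3.7 days on 2048 GPUs.

gpu_hours = {"pre-training": 2664e3, "context extension": 119e3, "post-training": 5e3}
total_hours = sum(gpu_hours.values())      # 2788K GPU-hours
total_usd = total_hours * 2                # $2 per H800 GPU-hour

days_per_trillion = 180e3 / (2048 * 24)    # GPU-hours / (GPUs * 24 h/day)

print(f"total: {total_hours/1e3:.0f}K GPU-hours = ${total_usd/1e6:.3f}M")
print(f"per trillion tokens: {days_per_trillion:.1f} days on 2048 H800s")
```

Both checks land exactly on the table's $5.576M total and the paper's "3.7 days" figure.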
DeepSeek: degree of innovation (architecture)
- For V2, V3, and R1, DeepSeek chose a sparse MoE architecture over a dense one and accumulated substantial innovation along the way: MLA, FP8 training, resolving the MoE all-to-all communication bottleneck, MTP, and more. Not all of these are original inventions, but a lab capable of this much low-level architectural innovation may be one of only a handful in the world.
- All of DeepSeek's architectural innovations serve one goal — cutting cost while boosting efficiency: using algorithms to squeeze more training and decoding efficiency out of the hardware, with essentially no loss in performance.
- The US uses chip export controls (a three-tier global regime) to preserve its AI leadership; DeepSeek's algorithms routed around the compute moat.
DeepSeek R1: key innovations
- DeepSeek-R1-Zero: large-scale RL training; discovered a scaling law for RL training; the "aha moment" emerges during RL.
- A training framework for reasoning models: a four-step recipe that fixes R1-Zero's problems and unifies reasoning with alignment.
- RL training framework: GRPO, from DeepSeekMath, which lowers the cost of RL training.
- Reasoning distillation: distilling a large model's reasoning ability into small models, which beats training the small models directly with reasoning RL (a scale effect).

Why MCTS+PRM turned out to be a "wrong turn":
- The bitter lesson: scalability.
- OpenAI's competitive strategy.
DeepSeek: reasoning models | R1-Zero
Large-scale reasoning-oriented reinforcement learning: DeepSeek-V3-Base → DeepSeek-R1-Zero.

1. RL training at unprecedented scale. Industry practice was to train for a few dozen RL steps (Tülu 3's largest released model trained for only ~50 RL steps, per interconnects.ai/p/deepseek-r1-recipe-for-o1); DeepSeek trains for thousands of RL steps.

2. An RL-training scaling law: reflection and "aha" emerge. Search, reflection, insight, and self-correction arise spontaneously. This is consistent with the test-time scaling law, which can be read off the accuracy-growth and response-length-growth curves.
Figure 2 | AIME accuracy of DeepSeek-R1-Zero during training. For each question, 16 responses are sampled and the overall average accuracy is calculated to ensure a stable evaluation.
Figure 3 | The average response length of DeepSeek-R1-Zero on the training set during the RL process. DeepSeek-R1-Zero naturally learns to solve reasoning tasks with more thinking time.

3. A prompting strategy guides the model to think first and then answer, using the tags <think></think> and <answer></answer>, and avoids the base model failing to emit a stop symbol.

Template: "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>. User: prompt. Assistant:"
Table 1 | Template for DeepSeek-R1-Zero. prompt will be replaced with the specific reasoning question during training.

Remaining problems with R1-Zero: poor readability, language mixing.
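The template and tag convention above pair naturally with the rule-based rewards this training relies on: a format reward (is the response wrapped in <think>/<answer> tags?) and an accuracy reward (does the extracted answer match a reference?). The regex and exact-match rule here are simplified stand-ins for the actual checkers:

```python
# Rule-based reward checks in the spirit of R1-Zero training: a format
# reward for the <think>/<answer> structure and an accuracy reward for
# the final answer. The regexes are simplified stand-ins.
import re

def format_reward(response: str) -> float:
    ok = re.fullmatch(r"\s*<think>.*</think>\s*<answer>.*</answer>\s*",
                      response, re.S)
    return 1.0 if ok else 0.0

def accuracy_reward(response: str, reference: str) -> float:
    m = re.search(r"<answer>(.*?)</answer>", response, re.S)
    return 1.0 if m and m.group(1).strip() == reference else 0.0

resp = "<think>2 + 2 is 4</think><answer>4</answer>"
print(format_reward(resp), accuracy_reward(resp, "4"))  # 1.0 1.0
```

Because both rewards are cheaply and objectively verifiable, they scale to thousands of RL steps without the cost — or the reward-hacking risk — of a learned process reward model.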
DeepSeek: reasoning models | the R1 recipe
Step 0. Generating long-CoT data: few-shot ICL plus human post-editing (refining).
Step 1. Reasoning SFT cold start: fine-tune on the long-CoT data.
Step 2. Reasoning-oriented RL: as in R1-Zero training — math, code, and logic with rule-based rewards, trained to convergence — the large-scale RL stage, yielding an intermediate reasoning model (an RL-tuned SFT checkpoint).
Step 3. Rejection-sampling SFT on DeepSeek-V3-Base: 3/4 reasoning data (600K samples: math, code, logic, generated by the Step-2 model) and 1/4 general instruction data (200K samples: writing, QA, translation, etc.).
Step 4. General RL: RLHF preference tuning with safety rewards, yielding a general-purpose model — DeepSeek-R1.

Notes:
- DeepSeek-R1 is not the only possible reasoning-model framework; more new frameworks will appear in 2025.
- Faithfully reproducing this recipe would require DeepSeek to open-source the associated data.
DeepSeek: reasoning models | RL
1. RL framework: GRPO (from DeepSeekMath). Monte-Carlo sampling estimates replace the value model, cutting compute and memory costs.
2. RL reward models: easily verifiable rewards — accuracy reward, format reward, language-consistency reward. A process reward model is avoided: it is computationally complex and prone to reward hacking.
Figure 4 | Demonstration of PPO and our GRPO. GRPO foregoes the value model, instead estimating the baseline from group scores, significantly reducing training resources. (PPO trains the policy with a learned value model and GAE against reward and reference models; GRPO keeps the reward and frozen reference models, samples a group of outputs per prompt, computes advantages from the group's scores, and regularizes with a KL term.)
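The core of GRPO as summarized in Figure 4 fits in a few lines: sample a group of G responses per prompt, score them, and use each response's group-normalized reward as its advantage, with no value network at all:

```python
# The group-relative advantage at the heart of GRPO: normalize each
# sampled response's reward by the group's mean and standard deviation,
# replacing a learned value-model baseline.
import statistics

def grpo_advantages(rewards):
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0   # guard against zero spread
    return [(r - mu) / sigma for r in rewards]

group_rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]  # e.g. binary accuracy rewards
adv = grpo_advantages(group_rewards)
print([round(a, 2) for a in adv])
```

These advantages then weight the usual clipped policy-gradient objective (plus a KL penalty against the reference model); dropping the value model is what makes the memory and compute savings possible.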
DeepSeek: reasoning models | distilling reasoning ability
Distillation data: the Step-3 data — reasoning data (math, code, logic; 600K samples) plus instruction data (writing, QA, translation, etc.; 200K samples) — is used to SFT small base models: Qwen2.5-Math-1.5B, Qwen2.5-Math-7B, Qwen2.5-14B, Qwen2.5-32B, Llama-3.1-8B, and Llama-3.3-70B-Instruct, producing the DeepSeek-R1-Distill-Qwen2.5 and DeepSeek-R1-Distill-Llama series.
- Reasoning ability can be distilled into small models.
- Distilling from a large model beats training the small model directly with large-scale RL.
- This again confirms the importance of model scale on the road to AGI: reasoners, too, need scale.
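Distillation of the kind described above reduces to data construction: the large reasoning model generates long-CoT responses, and traces whose final answers check out become SFT pairs for the small model. `teacher_generate` below is a stand-in for sampling the teacher; the filtering rule is a simplified assumption:

```python
# Sketch of reasoning-distillation data construction: sample the large
# teacher model, keep traces whose answers verify, and turn them into
# SFT pairs for a small student. `teacher_generate` is a stand-in.

def teacher_generate(question: str) -> str:
    # stand-in: a real pipeline would sample the teacher model here
    return "<think>3 * 4 = 12</think><answer>12</answer>"

def build_sft_pairs(items):
    pairs = []
    for question, reference in items:
        response = teacher_generate(question)
        if f"<answer>{reference}</answer>" in response:  # keep only verified traces
            pairs.append({"prompt": question, "completion": response})
    return pairs

data = build_sft_pairs([("What is 3 * 4?", "12"), ("What is 2 + 2?", "4")])
print(len(data))  # only the verified trace survives the filter
```

The student is then trained with plain supervised fine-tuning on these pairs — no RL on the small model — which is the setup the slide reports as outperforming direct large-scale RL at small scale.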
DeepSeek vs. OpenAI
[Figure: benchmark bar chart, DeepSeek-R1 vs. OpenAI o1-1217 — AIME 2024 (another tough math benchmark), MATH-500 (a collection of 500 tough math problems), Codeforces (a competitive programming platform where coders solve problems), GPQA Diamond (a test of graduate-level science questions), SWE-bench Verified (a software-engineering benchmark), and MMLU. Source: DeepSeek official website.]

[Figure: pricing — input and output prices, USD per 1M tokens.]

TJUNLP's hands-on test of DeepSeek-R1's logical reasoning:

    Model                                   Logical   Level 1   Level 2   Level 3   Open source?   Size
    DeepSeek-R1 (API)                       76.10%    90.48%    77.14%    61.70%    Yes            671B
    DeepSeek-R1 (web)                       74.84%    80.95%    78.57%    63.83%    Yes            671B
    o1-preview                              72.33%    88.10%    74.29%    55.32%    No             undisclosed
    DeepSeek-R1 (unofficial API, together)  70.44%    80.95%    78.57%    48.94%    Yes            671B
    QwQ-32B                                 63.52%    73.81%    70.00%    44.68%    Yes            32B
    hunyuan-turbo-latest                    62.26%    85.71%    65.71%    36.17%    No             undisclosed
    GLM-Zero-preview                        61.64%    71.43%    71.43%    38.30%    No             undisclosed
    Doubao-pro-32k                          61.01%    83.33%    62.86%    38.30%    No             undisclosed
    Yi-Lightning                            52.83%    64.29%    60.00%    31.91%    No             undisclosed
    DeepSeek-V2.5-1210                      49.69%    69.05%    57.14%    21.28%    Yes            undisclosed
    Ernie-4.0-Turbo-8k                      49.06%    66.67%    54.29%    25.53%    No             undisclosed
    DeepSeek-V3                             49.06%    66.67%    52.86%    27.66%    Yes            671B
    SenseChat-5-1202                        47.17%    64.29%    50.00%    27.66%    No             undisclosed
    GPT-4-Turbo                             42.77%    57.14%    48.57%    21.28%    No             undisclosed
    Spark4.0 Ultra                          39.62%    57.14%    44.29%    17.02%    No             undisclosed
    Moonshot-v1-32k                         38.99%    45.24%    48.57%    19.15%    No             undisclosed
    GPT-3.5-Turbo                           29.56%    35.71%    35.71%    14.89%    No             undisclosed

Average thinking time of DeepSeek-R1 (web), in seconds:

               All       Correct   Wrong
    Overall    147.26    100.69    285.83
    Level 1    83.57     63.88     167.25
    Level 2    132.49    91.98     281.00
    Level 3    226.19    158.37    345.88
DeepSeek: degree of innovation (reasoning)
DeepSeek R1 is a 0-to-1 breakthrough along a direction that had already been scouted — led and validated by OpenAI o1. It independently worked out a reasoning route for LLMs based on large-scale reinforcement learning, sidestepping the "wrong turn" the field had been pondering for more than a year (ever since OpenAI's Q* was discussed on social media): building reasoning through explicit search during training plus process reward models (Search+PRM).
Contributions:
- Independently discovered the reasoning technical route.
- Published the route openly, resolving the industry's "not knowing".
- Open-sourced the models (MIT License).
DeepSeek R1 broke through the closed-source technical moat of first-tier US companies, further shaking US "AI Dominance".

Agenda: The DeepSeek effect
DeepSeek: effects
"On the last day of January, the enthusiasm from DeepSeek shows no signs of waning."
- "Overnight, Microsoft, NVIDIA, and Amazon all connected to DeepSeek! Andrew Ng: AI in China is on the rise." — New Intelligence Source, Jan 31, 16:49 (MSFT +0.35%, NVDA +1.71%, AMZN +1.95%).
- unusual_whales (@unusualwhales), Jan 28: "BREAKING: This is not a memecoin. This is Nvidia, $NVDA, the most valuable company in the world before today. It is down 17%. It lost $560 billion in market cap today so far, the largest in market history."
- App Store top free apps: 1. DeepSeek - AI Assistant (Intelligent AI Assistant); 2. ChatGPT (The official app by OpenAI); 3. Threads (Connect and share ideas).
- Microsoft, NVIDIA, and Amazon embrace DeepSeek R1, along with US cloud-computing platforms. Andrew Ng and the former CEO of Intel praise DeepSeek's innovative capabilities.
Effects: the compute price war · open vs. closed source · innovation & talent & vision · cognitive misconceptions.
DeepSeek effect: the compute price war
[Figure: Artificial Analysis Quality Index vs. price (USD per 1M tokens, axis $0–$30) — DeepSeek R1 and DeepSeek V3 sit in the high-quality, low-price corner against o1-mini, o3-mini, Claude 3.5 Sonnet (Oct), GPT-4o (Nov '24), Gemini 1.5 Pro (Sep), Qwen2.5 Max, Mistral Large 2 (Nov '24), Llama 3.3 70B, and Llama 3.1 8B.]
- For products, cost-effectiveness is always king — and the same holds for technology.
- A frontier-technology moat built with tens of billions of dollars was breached overnight.
DeepSeek effect: open vs. closed source
- Ever since GPT-3 went closed, the open-vs-closed debate — and battle — over LLMs has never stopped.
- The open release of DeepSeek R1 caught up with closed-source models in one stroke: a milestone in the history of open-source LLMs.
- The frontier-technology closure maintained by first-tier US AI companies has been broken.
- Open vs. closed concerns not only the openness of technology but also AI safety governance.

7. License (DeepSeek-R1 repository): "This code repository and the model weights are licensed under the MIT License. DeepSeek-R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that: DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from Qwen-2.5 series, which are originally licensed under Apache 2.0 License, and now finetuned with 800k samples curated with DeepSeek-R1. DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under llama3.1 license. DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under llama3.3 license."

On Reddit:
- lolzinventor (5d ago): "Would you consider releasing some model weights, and publishing some research?"
- samaltman, OpenAI CEO (verified, co-host, 4d ago): "yes, we are discussing. i personally think we have been on the wrong side of history here and need to figure out a different open source strategy; not everyone at openai shares this view, and it's also not our current highest priority."
DeepSeek effect: cognitive misconceptions
If ChatGPT reset our understanding of AI, then DeepSeek has, to some degree, overturned:
- American perceptions of Chinese AI: for a long time, the US saw China mostly as a follower in AI innovation.
- Perceptions of LLM R&D cost: the assumption that it takes tens of millions, even hundreds of millions, of dollars.

14. When a Chinese Sora-class model will arrive can be judged by when a Chinese ChatGPT arrived. Over the past year, domestic LLMs developed rapidly — there was even the lively spectacle of a "battle of a hundred models" — but most of the bustle was homogeneous competition, with few original breakthroughs in underlying foundational technology.
15. The gap between domestic and foreign LLMs lies not in model capability, nor in applications, but in core underlying technology. And the main obstacle to breakthroughs in core technology is not limited compute, nor limited data scale and quality, but the shortage of LLM talent with technical vision and the courage to take technical risks.
16. LLM technology is still developing and breaking through, and the future landscape holds many variables.

Top-tier LLM talent:
- Technical talent: determined to innovate — and take risks — on the underlying technology of LLMs (the first kind).
- Strategic talent: possessing AGI technical foresight and vision (the second kind).

To consolidate and raise China's international competitiveness in this field, planning could proceed along the following lines. First, further elevate the strategic position of frontier AI, represented by LLMs, in national science, technology, and industrial development: set up an AI working group, lead an AI industry-research advisory committee, coordinate resources, formulate AI policies and plans, and advance AI technological innovation and industrial development. Second, plan and build national infrastructure for frontier AI, including super intelligent-computing networks, general and industry data infrastructure, large-scale AI software platforms, AI safety and evaluation infrastructure, and LLM open-source platforms. Third, attack key LLM theory and technology: gnaw on the hard problems, explore new territory, and develop hard-core technology that stands the test of practice. Fourth, cultivate an ecosystem for LLM innovation, foster a climate of technical innovation, and encourage patient capital to invest boldly and broadly in hard-tech LLM startups. Fifth, emphasize the cultivation and growth of AI talent, developing a cohort with long-term vision