A Deep Dive into DeepSeek: Principles and Effects

Deyi Xiong (熊德意), Tianjin University
https://dyxiong.github.io | https://tjunlp-lab.github.io
TJUNLP: The Natural Language Processing Laboratory at Tianjin University (仁文伏羲 | 伏羲傳語(yǔ) | OpenEval)

Outline:
1. The development roadmap of large language models
2. DeepSeek V2-V3/R1: technical principles
3. The DeepSeek effect
4. Future outlook

Generative AI: 2014-2024

Timeline: ENIAC (1945) · Turing Test (1950) · Dartmouth Conference (1956) · ELIZA (1966) · AI Winter I (1974-1980) · Expert Systems (1980-1987) · AI Winter II (1987-1990s) · Statistical Methods (1990s-2000s) · Neural-Network Revival (2006-) · Generative AI (2014-2024) · AGI ... ASI

Generative AI: using generative models to generate data of every kind (language, speech, images, video, etc.):
- Attention: modeling dependencies within data (a minimal sketch follows this slide)
- Transformer: a unified architecture for data generation
- Scaling Laws: the laws by which data learning and generation scale
- RLHF: generating data aligned with human values
- o1/R1: generative problem solving, i.e., generating both the solution process and the answer (reasoning)

Milestones along the bottom of the slide: Attention → Transformer → Scaling Laws | GPT-3 (2020) → RLHF | ChatGPT (2022) → o1/R1 (2024).

[Figure: attention-based image captioning from Xu et al., "Show, Attend and Tell": 1. input image, 2. convolutional feature extraction, 3. RNN with attention over the image, 4. word-by-word generation. Original caption: "Figure 1. Our model learns a words/image alignment. The visualized attentional maps (3) are explained in Sections 3.1 & 5.4."]
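The "Attention" bullet above is the core computation behind everything that follows. Here is a minimal, runnable sketch of scaled dot-product attention; this is my own illustrative NumPy toy, not any model's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise dependency strengths
    weights = softmax(scores, axis=-1)   # each query attends over all keys
    return weights @ V                   # weighted mix of the values

# toy example: 3 query positions attending over 4 key/value positions
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```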

(The same 1945-2024 timeline slide recurs several times, each pass spotlighting one milestone; the repeated timeline and bullets are omitted from here on.)

Spotlight: Transformer. [Figure: the title page of "Attention Is All You Need" (Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin; Google Brain, Google Research, University of Toronto).]

Spotlight: Scaling Laws | GPT-3 (2020). [Figure: the scaling-law plots, with test loss falling as a power law in compute (PF-days, non-embedding), dataset size (tokens), and parameters (non-embedding).]
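For reference (not taken from this deck), the power-law form those plots popularized, with the constants reported by Kaplan et al., "Scaling Laws for Neural Language Models" (2020):

```latex
% Test loss as a power law in parameters N, data D, and compute C
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad \alpha_D \approx 0.095
L(C_{\min}) \approx \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C}, \qquad \alpha_C \approx 0.050
```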

Spotlight: RLHF | ChatGPT (2022). [Figure: the RLHF training diagram (supervised policy, reward model, reinforcement-learning optimization); most of the diagram text is illegible.]

Spotlight: o1/R1 (2024): generative problem solving, generating the answers to complex problems (reasoning). The slide shows the "aha moment" example from DeepSeek-R1:

Question: If a > 1, then the sum of the real solutions of √(a - √(a + x)) = x is equal to ...

Response: <think> To solve the equation √(a - √(a + x)) = x, let's start by squaring both sides ... (√(a - √(a + x)))² = x² ⟹ a - √(a + x) = x². Rearrange to isolate the inner square root term: (a - x²)² = a + x ⟹ a² - 2ax² + (x²)² = a + x ⟹ x? - 2ax² - x + (a² - a) = 0 ...
Wait, wait. Wait. That's an aha moment I can flag here. Let's reevaluate this step-by-step to identify if the correct sum can be ... We started with the equation: √(a - √(a + x)) = x. First, let's square both sides: a - √(a + x) = x² ⟹ √(a + x) = a - x². Next, I could square both sides again, treating the equation: ...


Natural Language Processing and Language Models

Natural language processing (NLP): the intelligent processing and analysis of human language, giving computers the language abilities humans have: listening, speaking, reading, writing, and translating.

Language model: a statistical model of natural language. Put simply, it predicts the next word in a sentence.

Trained to predict the next word in a sentence: "The cat is chasing the ___" → mouse 70%, squirrel 20%, dog 5%, boy 5%, house 0%. (A runnable toy version of this distribution follows below.)
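What "predict the next word" means computationally, in a minimal runnable sketch: a softmax over a tiny vocabulary, with made-up logits chosen so the toy reproduces the slide's numbers (this is not a trained model):

```python
import numpy as np

vocab = ["mouse", "squirrel", "dog", "boy", "house"]
# hypothetical logits a trained LM might assign after "The cat is chasing the"
logits = np.array([4.5, 3.25, 1.86, 1.86, -3.0])

probs = np.exp(logits - logits.max())  # softmax, numerically stable
probs /= probs.sum()

for word, p in sorted(zip(vocab, probs), key=lambda t: -t[1]):
    print(f"{word:8s} {p:.0%}")
# mouse ~70%, squirrel ~20%, dog ~5%, boy ~5%, house ~0%
```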

Large Language Models: 2018-2024

[Figure: an evolutionary timeline of large language models from 2019 to 2024, marking which are publicly available. Among those shown: T5, GPT-3, mT5, PanGu-α, CPM-2, Codex, Jurassic-1, Ernie 3.0, HyperCLOVA, Yuan 1.0, Gopher, GLaM, LaMDA, WebGPT, InstructGPT, Chinchilla, PaLM, UL2, OPT, BLOOM, mT0, BLOOMZ, Galactica, GLM, Flan-T5, Flan-PaLM, AlexaTM, Sparrow, NLLB, Luminous, AlphaCode, CodeGen, MT-NLG, GPT-NeoX-20B, Tk-Instruct, OPT-IML, ChatGPT, GPT-4, LLaMA, LLaMA 2, LLaMA 3, Bard, Falcon, PaLM 2, Pythia, Vicuna, ChatGLM, CodeGeeX, StarCoder, CodeGen2, YuLan-Chat, WeLM, InternLM, PanGu-Σ, Qwen, Qwen2, Mistral, Mixtral, Gemma, MiniCPM, DeepSeek, and DeepSeek-V2, among others.]

X. Zhao et al. A Survey of Large Language Models. arXiv:2303.18223.

Large Language Models: the Technology Stack

Data processing and management: processing pipelines, deduplication, quality filtering, domain classification, data taxonomy, version control. Pretraining data: web pages, books, code, papers, encyclopedias. Alignment training data: prompts and responses. Plus evaluation data and industry data.

Compute management: resource allocation (docker, kubernetes, slurm), task scheduling, load balancing, performance monitoring, elastic scaling, fault tolerance. Hardware: NVIDIA H100/A100, AMD MI350/MI300, Ascend 910B/910A.

Model training: pretraining; alignment training (SFT, RLHF, DPO, Best-of-N sampling); parallel and efficient training (data parallelism, tensor parallelism, expert parallelism, ZeRO, pipeline parallelism, sequence parallelism, FlashAttention).

Model evaluation: OpenEval, UltraEval, OpenCompass, Chatbot Arena, FlagEval, Open LLM Leaderboard; knowledge and capability, value alignment, safety and trustworthiness, specialized domains.

From general-purpose model to specialized (industry) model: domain alignment data, domain fine-tuning, industry-model deployment and evaluation. Deployment techniques: dynamic batching, quantization, pruning, distillation, operator optimization, performance monitoring, language detection and content filtering. Application layer: autonomous planning, text-and-image creation, intelligent customer service, information retrieval, tool use, code generation.

Large Language Models: Lifecycle and Paradigms

Data processing → pretraining → post-training → application deployment
Data governance → base model → aligned model → red-team testing
Data as the source of knowledge → self-supervised learning (emergent capability) → fine-tuning & reinforcement (safety and trustworthiness) → commercial deployment (model compression)

Key elements: the training paradigm, model architecture, training algorithms, and scaling laws:
- Pretraining → base model
- Post-training → aligned model
- Reasoning training → reasoning model

The trump card: the performance/cost curve, i.e., cost-effectiveness.

Scaling Laws | The Bitter Lesson

"The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning." [Sutton, 2019]

Sasha Rush and Daniel Ritter. Speculations on Test-Time Scaling. 2024.

Two Eras of Alignment Pipelines

Era 1 (~InstructGPT): Base Model → SFT Model → Aligned Model. Human instructions (~10k) drive SFT; human preferences (~100k) train a Reward Model; PPO optimization yields the aligned model. Relatively cheap: most labs can afford it.

Era 2 (~Llama 3.1 / Nemotron): Base Model → initial SFT on human + synthetic instructions (~1M+) → Reward Model / LLM judge trained on human preferences (~1M+?) → re-use the preference prompts to generate new synthetic completions → ×N rounds of DPO, PPO, rejection sampling, or multiple optimizations: Aligned Model N → Aligned Model N+1 → Final Model. Very expensive (tens of millions): only a handful of companies and labs can afford it. (One concrete, runnable piece of both pipelines is sketched right after this slide.)

Source: Sasha Rush and Daniel Ritter. Speculations on Test-Time Scaling. 2024.
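The reward model at the center of both eras is trained on preference pairs. A minimal runnable sketch of its pairwise (Bradley-Terry) loss; the scores below are toy numbers of my own, standing in for what a reward model would output on the ~100k or ~1M preference pairs cited above:

```python
import numpy as np

def bradley_terry_loss(r_chosen: np.ndarray, r_rejected: np.ndarray) -> float:
    """Pairwise preference loss for reward-model training:
    L = -log sigmoid(r_chosen - r_rejected), averaged over pairs."""
    margin = r_chosen - r_rejected
    return float(np.mean(np.log1p(np.exp(-margin))))  # softplus(-m) == -log sigmoid(m)

# toy reward-model scores on 4 (chosen, rejected) response pairs
r_chosen   = np.array([1.2, 0.3, 2.0, -0.1])
r_rejected = np.array([0.4, 0.5, 0.1, -0.9])
print(bradley_terry_loss(r_chosen, r_rejected))  # smaller when chosen > rejected
```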

OpenAI Q*

[Figure: a grade-school math word problem ("The denominator of a fraction is 7 less than 3 times the numerator ...") solved step by step, with each intermediate step scored; the step-level reward annotations are the point of the figure, though the step text is largely illegible.]

Process Reward Model (PRM): rewards each intermediate reasoning step, not just the final answer.
Reminder: AlphaZero. A reasoning language model via MCTS + PRM?

Sasha Rush and Daniel Ritter. Speculations on Test-Time Scaling. 2024.

Outline (part 2): DeepSeek V2-V3/R1 technical principles.

DeepSeek: 2023-2024

2023.11 DeepSeek V1 → 2024.5 DeepSeek V2 → 2024.11 DeepSeek R1-Lite → 2024.12 DeepSeek V3 → 2025.01 DeepSeek R1

Two clouds on the horizon (the state of the field, at home and abroad):
- Model architecture: most companies adopt already-validated architectures (trial and error is prohibitively expensive)
- Reasoning models: most labs were still groping to guess at Q*/o1 (which OpenAI kept secret)

DeepSeek: Technical Innovations | Model Architecture | V2

DeepSeek V2: main innovations
- DeepSeekMoE
- MLA (Multi-Head Latent Attention)

DeepSeekMoE:
- Sparse activation: compute does not grow linearly with model scale
- Versus conventional MoE: fine-grained experts, split into shared and routed experts
- Routing and communication redesign: Device-Limited Routing; an auxiliary loss for load balance; a token-dropping strategy
(A toy version of the shared-plus-routed top-K router is sketched after this slide.)

[Figure: the V2 Transformer block (×L): RMSNorm → attention → RMSNorm → DeepSeekMoE, where a router picks the top-K of N routed experts per token and the shared experts are always active.]
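A minimal runnable sketch of that shared-plus-routed idea (toy dimensions and weights of my own; the real DeepSeekMoE adds device-limited routing, load-balancing losses, and a token-dropping strategy on top):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_routed, n_shared, top_k = 16, 8, 2, 2

# each "expert" is just a small linear map in this toy version
experts_routed = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_routed)]
experts_shared = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_shared)]
router = rng.normal(size=(d, n_routed)) / np.sqrt(d)

def moe_layer(h: np.ndarray) -> np.ndarray:
    """Shared experts always fire; only the top-K routed experts fire per token."""
    out = sum(h @ W for W in experts_shared)            # shared path
    scores = h @ router                                 # router logits, shape (n_routed,)
    topk = np.argsort(scores)[-top_k:]                  # indices of the top-K experts
    gates = np.exp(scores[topk]); gates /= gates.sum()  # softmax over selected experts
    for g, i in zip(gates, topk):
        out += g * (h @ experts_routed[i])              # sparse routed path
    return out

h = rng.normal(size=(d,))
print(moe_layer(h).shape)  # (16,): only top_k of n_routed experts were computed
```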

Multi-Head Latent Attention (MLA): low-rank compression that shrinks the KV-cache footprint. Keys and values are compressed into a small latent vector c, which is what gets cached during inference; the full per-head keys and values are up-projected from the latent at attention time, with RoPE applied through a small decoupled key that is concatenated on.

[Figure: the MLA block (input hidden state → compressed latent KV, cached during inference → up-projected multi-head keys/values), alongside a comparison of Multi-Head Attention (MHA), Grouped-Query Attention (GQA), Multi-Query Attention (MQA), and MLA.]

V2 scale: 236B total parameters, 21B activated parameters, 128K context window.

(A toy version of the low-rank KV compression is sketched after this slide.)
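A minimal sketch of the core MLA idea: cache one small latent per token instead of full per-head K/V, and up-project at attention time. Shapes are toy values of my own, and the decoupled RoPE key is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, d_head, d_latent = 64, 4, 16, 8  # d_latent << n_heads * d_head

W_dkv = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)       # down-projection
W_uk = rng.normal(size=(d_latent, n_heads * d_head)) / np.sqrt(d_latent)
W_uv = rng.normal(size=(d_latent, n_heads * d_head)) / np.sqrt(d_latent)

h = rng.normal(size=(10, d_model))  # hidden states for 10 cached token positions

# What gets cached per token is the latent, not per-head K and V.
c_kv = h @ W_dkv                                  # (10, d_latent) -> the KV cache
K = (c_kv @ W_uk).reshape(10, n_heads, d_head)    # reconstructed at attention time
V = (c_kv @ W_uv).reshape(10, n_heads, d_head)

full = 2 * n_heads * d_head  # floats per token cached by standard MHA
mla = d_latent               # floats per token cached by MLA (toy)
print(f"cache per token: MHA {full} vs MLA {mla} ({1 - mla/full:.1%} smaller)")
```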

The trump card: the performance/cost curve | cost-effectiveness

[Figure (a): MMLU performance versus activated parameters (billions); DeepSeek-V2 sits on the top-left frontier against the LLaMA 1/2/3, Mixtral, Command R, and Qwen1.5 families, plus Mistral 7B, Grok-1, DBRX, and DeepSeek 67B. Figure (b): training cost, KV-cache size, and generation throughput comparisons.]

Compared with DeepSeek 67B, DeepSeek-V2 achieves:
- Training cost: saves 42.5% of training costs (K GPU hours per trillion tokens)
- Memory: reduces the KV cache for generation by 93.3% (KB/token)
- Speed: raises maximum generation throughput to 5.76× (tokens/sec)

DeepSeek V3: main innovations
- Infrastructures
- Multi-Token Prediction (MTP)

Infrastructures:
- Reduced pipeline bubbles
- Efficient inter-node all-to-all communication
- FP8 training
- Low-precision storage and communication

MTP: predicting several tokens at once. (A toy version of the objective is sketched after this slide.)

[Figure: the V3 stack (Transformer block ×L) with chained MTP modules, each with its own output head and Transformer block, sharing the embedding layer with the main model.]
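A minimal sketch of the multi-token-prediction training objective: at each position, auxiliary heads also predict the tokens 2, 3, ... steps ahead. This toy NumPy version uses independent linear heads of my own invention; the real MTP modules are small chained Transformer blocks:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d, depth = 50, 32, 2              # depth = extra future tokens predicted

trunk_states = rng.normal(size=(9, d))   # hidden states for a 9-token prefix
heads = [rng.normal(size=(d, vocab)) / np.sqrt(d) for _ in range(1 + depth)]
targets = rng.integers(0, vocab, size=12)  # token ids of the full sequence

def ce(logits, y):
    """Cross-entropy of one target id under a logit vector."""
    z = logits - logits.max()
    return -(z[y] - np.log(np.exp(z).sum()))

loss = 0.0
for t in range(9):
    for k, head in enumerate(heads):     # k = 0 -> next token, k = 1 -> the one after, ...
        if t + 1 + k < len(targets):
            loss += ce(trunk_states[t] @ head, targets[t + 1 + k])
print("joint next-token + MTP loss:", loss)
```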

DeepSeek: Technical Innovations | Model Architecture | V3

V3 scale: 671B total parameters, 37B activated parameters, trained on 14.8T tokens.

[Figure: the V3 block reuses the V2 design: MLA (latent c cached during inference, decoupled RoPE keys) plus DeepSeekMoE (shared + routed experts).]

The trump card: the performance/cost curve | cost-effectiveness
DeepSeek: Technical Innovations | Model Architecture | V3 | Cost

[Figure: MMLU-Redux ZeroEval score versus input API price ($/1M tokens).]

Table 1 | Training costs of DeepSeek-V3, assuming the rental price of H800 is $2 per GPU hour.

Training Costs       Pre-Training   Context Extension   Post-Training   Total
in H800 GPU Hours    2664K          119K                5K              2788K
in USD               $5.328M        $0.238M             $0.01M          $5.576M

"During the pre-training stage, training DeepSeek-V3 on each trillion tokens requires only 180K H800 GPU hours, i.e., 3.7 days on our own cluster with 2048 H800 GPUs. Consequently, our pre-training stage is completed in less than two months and costs 2664K GPU hours."

Sebastian Raschka (@rasbt): "E.g. Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11X less compute). Super interesting! And DeepSeek was trained on H800s, which are probably also a tad (or noticeably?) slower than Meta's H100s."

Keep the cost categories apart: the final training run (the $5.576M above) versus full R&D cost, which also covers fleets of high-performance accelerators (depreciation), R&D staff, architecture exploration (trial and error), and data, plus the separate cost of deployment and inference. (The table's arithmetic is spelled out below.)
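The table's dollar figures are just GPU hours times the assumed rental rate, and the per-trillion-token claim follows from the cluster size; spelled out with the numbers quoted above:

```latex
% Cost = GPU hours x assumed rental price ($2 per H800 GPU hour)
2664\text{K h} \times \$2/\text{h} = \$5.328\text{M} \quad (\text{pre-training}) \\
2788\text{K h} \times \$2/\text{h} = \$5.576\text{M} \quad (\text{total}) \\
\text{per trillion tokens: } 180\text{K h} \div (2048 \text{ GPUs} \times 24 \text{ h/day}) \approx 3.7 \text{ days}
```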

DeepSeek: Technical Innovations | Degree of Innovation

For V2-V3 and R1, DeepSeek chose a sparse MoE architecture over a dense one and accumulated a large stock of technical innovation along the way: MLA, FP8 training, fixes for the MoE all-to-all communication bottleneck, MTP, and more. Not every one of these is an original invention, but the labs worldwide capable of this much innovation at the foundations of LLM architecture can probably be counted on one hand.

Every architectural innovation at DeepSeek serves the same goal, cutting cost while raising efficiency: using algorithms to extract as much training and decoding efficiency from the hardware as possible, with essentially no loss of quality.

The US has used chip export controls (a three-tier global regime) to preserve its AI leadership; DeepSeek's algorithms routed around the US compute moat.

DeepSeek R1: main innovations
- DeepSeek-R1-Zero: large-scale RL training; uncovered the scaling behavior of RL training; "aha" moments emerge during RL
- A training framework for reasoning models: a four-step recipe that fixes R1-Zero's problems and merges reasoning with alignment
- An RL training framework: GRPO, from DeepSeekMath, which lowers the cost of RL training
- Reasoning distillation: distilling a large model's reasoning into small models beats training the small models directly with RL (an effect of scale)

Why MCTS + PRM turned out to be a blind alley:
- The bitter lesson: scalability
- OpenAI's competitive (secrecy) strategy

DeepSeek: Technical Innovations | Reasoning Models | R1

DeepSeek: Technical Innovations | Reasoning Models | R1-Zero

Large-scale reasoning-oriented reinforcement learning: DeepSeek-V3-Base → RL → DeepSeek-R1-Zero.

1. The RL training scale is large. The industry typically trained tens of RL steps; DeepSeek trained thousands. Tülu 3's largest released model was trained for only ~50 RL steps. (interconnects.ai/p/deepseek-r1-recipe-for-o1)

2. An RL-training scaling law: reflection and "aha" moments emerge. Searching, reflecting, insight, and self-correction arise automatically, consistent with the test-time scaling law; an inference-time scaling law can be read off the accuracy-growth and response-length-growth curves.

3. A prompt template steers the model to think first and then answer, so the base model does not fail to emit a stop token. It uses the tags <think></think> and <answer></answer>:

"A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>. User: prompt. Assistant:"
Table 1 | Template for DeepSeek-R1-Zero. "prompt" will be replaced with the specific reasoning question during training. (A toy check of this format is sketched after this slide.)

R1-Zero's remaining problems: poor readability and language mixing.

[Figure 2 | AIME accuracy of DeepSeek-R1-Zero during training. For each question, 16 responses are sampled and the overall average accuracy is computed to ensure a stable evaluation.]
[Figure 3 | The average response length of DeepSeek-R1-Zero on the training set during the RL process. DeepSeek-R1-Zero naturally learns to solve reasoning tasks with more thinking time.]

astulleevaluation.stepsDeepSeek-V3-base(200K

samples)Step3.

RejectionSamplingSFT

3/4reasoning

data(600K)1/4

general

instruction

data

(200K)Reasoning

Data長(zhǎng)CoT

數(shù)據(jù)General-Purpose

ModelDeepSeek-R1Step

0.GeneratingLong

CoT

data

Step

4.General

RLRLHF

Preference

Tuning

with

safety

rewardso

DeepSeek-R1

不是唯一的推理模型框架,2025年將出現(xiàn)更多新的框架o

要復(fù)現(xiàn)上述框架,需要DeepSeek

開(kāi)源相關(guān)數(shù)據(jù)Step

2.Reasoning-orientedRLStep3

Reasoning

Data

類似訓(xùn)練R1-Zero

Math,Code,Logic直至訓(xùn)練收斂

(600K

samples)Few-shot

ICL+

人工后期refining

Reasoning

RL

with

rule-based

rewardsDeepSeek:

技術(shù)創(chuàng)新——推理模型|R1

Recipe大規(guī)模強(qiáng)化學(xué)習(xí)DeepSeek-R1-Zero

中間推理模型Step

3

Instruction

DataWriting,QA,trans,etc.SFTCheckpoint

RL-tuned

ModelStep1.

ReasoningSFT

Cold

DeepSeek: Technical Innovations | Reasoning Models | RL

1. RL framework: GRPO (from DeepSeekMath). Monte Carlo sampling estimates the baseline in place of a value model, cutting compute and memory. (A toy version of the group-relative advantage is sketched after this slide.)

2. RL rewards:
- Use easily verifiable rewards: an accuracy reward, a format reward, and a language-consistency reward
- Avoid process reward models: computationally heavy and prone to reward hacking

"Figure 4 | Demonstration of PPO and our GRPO. GRPO foregoes the value model, instead estimating the baseline from group scores, significantly reducing training resources." [Figure: PPO (policy model, value model, GAE, reward and reference models, KL penalty) beside GRPO (policy model, group computation of advantages A?, A?, ..., A_G, reward and reference models, KL penalty).]
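The heart of GRPO in a few runnable lines: sample a group of G completions per prompt, score them, and use the group-normalized score as each sample's advantage. The rewards below are toy numbers of my own; the full objective adds the PPO-style clipped ratio and a KL term against the reference policy:

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """A_i = (r_i - mean(group)) / std(group): no value network needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# rewards for a group of G = 8 sampled answers to one prompt (e.g. accuracy reward)
rewards = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0])
print(grpo_advantages(rewards))
# above-average answers get positive advantage, below-average negative
```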

DeepSeek: Technical Innovations | Reasoning Models | Distilling Reasoning Ability

Distilled models: DeepSeek-R1-Distill-Qwen2.5 (from Qwen2.5-Math-1.5B, Qwen2.5-Math-7B, Qwen2.5-14B, Qwen2.5-32B) and DeepSeek-R1-Distill-Llama (from Llama-3.1-8B, Llama-3.3-70B-Instruct), SFT-trained on the Step-3 data: 600K reasoning samples (math, code, logic) plus 200K instruction samples (writing, QA, translation, etc.).

Distilling reasoning into small models:
- Reasoning ability can be distilled into small models
- Distilling from a large model beats training the small model directly with large-scale RL
- This re-confirms how much model scale matters on the road to AGI: reasoners need scale behind them too

(A minimal sketch of the distillation data flow follows below.)
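Distillation here is plain SFT on teacher-generated traces. A schematic but runnable sketch of building such a dataset; the `teacher` and `keep` callables are stand-ins I made up, not real APIs:

```python
from typing import Callable, List, Tuple

def build_distill_set(
    prompts: List[str],
    teacher: Callable[[str], str],  # e.g. a reasoning model emitting <think>...<answer>...
    keep: Callable[[str], bool],    # rejection sampling: keep only verified traces
) -> List[Tuple[str, str]]:
    """(prompt, teacher trace) pairs; the student is then SFT-trained on these."""
    return [(p, t) for p in prompts if keep(t := teacher(p))]

# toy stand-ins so the sketch runs end to end
teacher = lambda p: f"<think>reason about {p}</think><answer>42</answer>"
keep = lambda trace: "<answer>" in trace
data = build_distill_set(["q1", "q2"], teacher, keep)
print(len(data), data[0][1])
```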

DeepSeek vs. OpenAI

[Figure: DeepSeek-R1 versus OpenAI o1-1217 across MMLU, GPQA Diamond (a test of graduate-level science questions), MATH-500 (a collection of 500 tough math problems), AIME 2024 (another hard math competition), SWE-bench Verified (software engineering), and Codeforces (a competitive programming platform where coders solve live problems). Source: DeepSeek official website.]

The trump card: the performance/cost curve | cost-effectiveness
[Figure: pricing, input and output prices in USD per 1M tokens, across models.]

TJUNLP's hands-on test of DeepSeek-R1's logical reasoning:

Model                                   Logical   Level 1   Level 2   Level 3   Open source?   Model size
DeepSeek-R1 (API)                       76.10%    90.48%    77.14%    61.70%    Yes            671B
DeepSeek-R1 (web)                       74.84%    80.95%    78.57%    63.83%    Yes            671B
o1-preview                              72.33%    88.10%    74.29%    55.32%    No             undisclosed
DeepSeek-R1 (unofficial API, together)  70.44%    80.95%    78.57%    48.94%    Yes            671B
QwQ-32B                                 63.52%    73.81%    70.00%    44.68%    Yes            32B
hunyuan-turbo-latest                    62.26%    85.71%    65.71%    36.17%    No             undisclosed
GLM-Zero-preview                        61.64%    71.43%    71.43%    38.30%    No             undisclosed
Doubao-pro-32k                          61.01%    83.33%    62.86%    38.30%    No             undisclosed
Yi-Lightning                            52.83%    64.29%    60.00%    31.91%    No             undisclosed
DeepSeek-V2.5-1210                      49.69%    69.05%    57.14%    21.28%    Yes            undisclosed
Ernie-4.0-Turbo-8k                      49.06%    66.67%    54.29%    25.53%    No             undisclosed
DeepSeek-V3                             49.06%    66.67%    52.86%    27.66%    Yes            671B
SenseChat-5-1202                        47.17%    64.29%    50.00%    27.66%    No             undisclosed
GPT-4-Turbo                             42.77%    57.14%    48.57%    21.28%    No             undisclosed
Spark4.0 Ultra                          39.62%    57.14%    44.29%    17.02%    No             undisclosed
Moonshot-v1-32k                         38.99%    45.24%    48.57%    19.15%    No             undisclosed
GPT-3.5-Turbo                           29.56%    35.71%    35.71%    14.89%    No             undisclosed

DeepSeek-R1 (web), average thinking time (seconds):

          All       Correct   Wrong
Overall   147.26    100.69    285.83
Level 1   83.57     63.88     167.25
Level 2   132.49    91.98     281.00
Level 3   226.19    158.37    345.88

DeepSeek: Technical Innovations | Reasoning Models | R1

DeepSeek R1 is a 0-to-1 breakthrough along a direction that had already been scouted (pioneered and validated by OpenAI o1). It independently worked out a route to LLM reasoning based on large-scale reinforcement learning, and in doing so avoided the blind alley the field had been pondering for over a year (ever since OpenAI's Q* surfaced in social-media discussion): building reasoning via explicit search during training plus process reward models (Search + PRM).

Contributions:
- Independently discovered the reasoning technique
- Published the recipe openly (dispelling the industry's state of not-knowing)
- Open-sourced the models (MIT License)

DeepSeek R1 breached the technical moat that first-tier US companies had built through closed source, further shaking US "AI Dominance".

DeepSeek: Technical Innovations | Degree of Innovation

Outline (part 3): the DeepSeek effect.

Headline: "Overnight, Microsoft, NVIDIA, and Amazon all connected to DeepSeek!"

Andrew Ng: "AI in China is on the rise." (New Intelligence Source, Jan 31, 16:49; that day: MSFT +0.35%, NVDA +1.71%, AMZN +1.95%)

unusual_whales (@unusualwhales, Jan 28): "BREAKING: This is not a memecoin. This is Nvidia, $NVDA, the most valuable company in the world before today. It is down 17%. It lost $560 billion in market cap today so far, the largest in market history."

[App Store, Top Charts, Free Apps: 1. DeepSeek - AI Assistant, "Intelligent AI Assistant"; 2. ChatGPT, "The official app by OpenAI"; 3. Threads, "Connect and share ideas".]

Microsoft, NVIDIA, and Amazon embraced DeepSeek R1, along with US cloud-computing platforms. Andrew Ng and the former CEO of Intel praised DeepSeek's innovative capabilities. On the last day of January, the enthusiasm around DeepSeek showed no signs of waning.

DeepSeek: The Effect

Threads of the effect: open source vs. closed source · innovation, talent & vision · a compute price war · shattered assumptions.

DeepSeek: The Effect | The Compute Price War

[Figure: Artificial Analysis Quality Index versus price (USD per 1M tokens, $0 to $30): DeepSeek R1 and DeepSeek V3 against o1-mini, o3-mini, GPT-4o (Nov '24), Claude 3.5 Sonnet (Oct), Gemini 1.5 Pro (Sep), Qwen2.5 Max, Llama 3.3 70B, Llama 3.1 8B, and Mistral Large 2 (Nov '24).]

For products, cost-effectiveness is always king, and the same holds for technology. A frontier-technology moat built with tens of billions of dollars was breached overnight.

3.18B95-90-85-80-75-70-65-60-ArtificialAnalysisQualityIndexGPT-3

DeepSeek: The Effect | Open Source vs. Closed Source

After GPT-3 chose to go closed, the debate, and the battle, between open- and closed-source LLMs never stopped. The open release of DeepSeek R1 caught up with and overtook closed-source models in one stroke: a milestone in the history of open-source LLMs. The frontier-technology secrecy of first-tier US AI companies was broken. And open versus closed is not only about technical openness; it also bears on AI safety and governance.

7. License (from the DeepSeek-R1 repository):
"This code repository and the model weights are licensed under the MIT License. DeepSeek-R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:
- DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from Qwen-2.5 series, which are originally licensed under Apache 2.0 License, and now finetuned with 800k samples curated with DeepSeek-R1.
- DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under llama3.1 license.
- DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under llama3.3 license."

From OpenAI CEO Sam Altman's Reddit AMA:
lolzinventor (5d ago): "Would you consider releasing some model weights, and publishing some research?"
sam altman, co-host (4d ago): "yes, we are discussing. i personally think we have been on the wrong side of history here and need to figure out a different open source strategy; not everyone at openai shares this view, and it's also not our current highest priority."

DeepSeek: The Effect | Shattered Assumptions

If ChatGPT reset our understanding of AI, then DeepSeek has, to some degree, overturned:
- America's assessment of China's AI: the long-held view that in AI innovation China mostly plays the follower
- Assumptions about LLM development costs: that developing a frontier model takes tens of millions, even hundreds of millions, of dollars

From the author's earlier commentary:
14. When China's Sora will arrive can be judged from when China's ChatGPT arrived. Over the past year domestic LLMs developed fast, to the point of a noisy "war of a hundred models", but most of the noise was homogeneous competition; original breakthroughs in the underlying foundational technology were rare.
15. The gap between domestic and foreign LLMs lies neither in model capability nor in applications, but in the underlying core technology. And the chief obstacle to core breakthroughs is not limited compute, nor the scale and quality of data, but the shortage of LLM talent with technical foresight and the nerve to take technical risks.
16. LLM technology is still evolving and breaking new ground; the future landscape holds many variables.

Top LLM talent:
- Technical talent: determined to innovate, and take risks, at the foundations of large models (the first kind)
- Strategic talent: equipped with AGI foresight and vision (the second kind)

To consolidate and raise China's international competitiveness in this field, planning could start as follows. First, further elevate the strategic position of frontier AI, represented by large models, in national science, technology, and industrial development; establish an AI body to lead the industry, coordinate resources, formulate AI policies and plans, and advance technological innovation and industrial development. Second, plan and build the national infrastructure for frontier AI, including a super intelligent-computing network, general and industry data infrastructure, large-scale AI software platforms, AI safety and evaluation infrastructure, and LLM open-source platforms. Third, mount a concerted attack on key LLM theory and technology: gnaw at the hard problems and explore new territory. Fourth, foster an innovation ecosystem and a climate of LLM innovation, encouraging patient capital to invest boldly and broadly in hardcore LLM technology startups. Fifth, take the cultivation and growth of AI talent seriously, training a cohort with long-term vision ...

