A Survey of Large Language Models

Wayne Xin Zhao, Kun Zhou*, Junyi Li*, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie and Ji-Rong Wen

● Version: v5 (update on April 16, 2023).
● GitHub link: https://github.com/RUCAIBox/LLMSurvey
● * K. Zhou and J. Li contribute equally to this work.
● The authors are mainly with the Gaoling School of Artificial Intelligence and School of Information, Renmin University of China, Beijing, China; Jian-Yun Nie is with Université de Montréal, Canada.

Abstract—Ever since the Turing Test was proposed in the 1950s, humans have explored the mastering of language intelligence by machine. Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable artificial intelligence (AI) algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various natural language processing (NLP) tasks. Since researchers have found that model scaling can lead to an improved model capacity, they further investigate the scaling effect by increasing the parameter scale to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement, but also exhibit some special abilities (e.g., in-context learning) that are not present in small-scale language models (e.g., BERT). To discriminate the language models at different parameter scales, the research community has coined the term large language models (LLM) for the PLMs of significant size (e.g., containing tens or hundreds of billions of parameters). Recently, the research on LLMs has been largely advanced by both academia and industry, and a remarkable milestone is the launch of ChatGPT (a powerful AI chatbot developed based on LLMs), which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, and would revolutionize the way we develop and use AI algorithms. Considering this rapid technical progress, in this survey we review the recent advances in LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions. This survey provides an up-to-date review of the literature on LLMs, which can be a useful resource for both researchers and engineers.

Index Terms—Large Language Models, Emergent Abilities, Adaptation Tuning, Utilization, Alignment, Capacity Evaluation

1 INTRODUCTION

LANGUAGE is a prominent ability of human beings to express and communicate, which develops in early childhood and evolves over a lifetime [1, 2]. Machines, however, cannot naturally grasp the abilities of understanding and communicating in the form of human language, unless equipped with powerful artificial intelligence (AI) algorithms. It has been a longstanding research challenge to achieve this goal, i.e., to enable machines to read, write, and communicate like humans [3]. Technically, language modeling (LM) is one of the major approaches to advancing the language intelligence of machines. In general, LM aims to model the generative likelihood of word sequences, so as to predict the probabilities of future (or missing) tokens. The research on LM has received extensive attention in the literature, and can be divided into four major development stages:

● Statistical language models (SLM). SLMs are developed based on statistical learning methods that rose in the 1990s. The basic idea is to build the word prediction model based on the Markov assumption, e.g., predicting the next word based on the most recent context. The SLMs with a fixed context length n are also called n-gram language models, e.g., bigram and trigram language models (a minimal n-gram sketch is given after this list). SLMs have been widely applied to enhance task performance in information retrieval (IR) [8, 9] and natural language processing (NLP) [10–12]. However, they often suffer from the curse of dimensionality: it is difficult to accurately estimate high-order language models, since an exponential number of transition probabilities need to be estimated. Thus, specially designed smoothing strategies such as back-off estimation [13] and Good–Turing estimation [14] have been introduced to alleviate the data sparsity problem.
● Neural language models (NLM). NLMs characterize the probability of word sequences by neural networks, e.g., recurrent neural networks (RNNs). As a remarkable contribution, the work in [15] introduced the concept of distributed representation of words and built the word prediction function conditioned on the aggregated context features (i.e., the distributed word vectors). By extending the idea of learning effective features for words or sentences, a general neural network approach was developed to build a unified solution for various NLP tasks [18]. Further, word2vec [19, 20] was proposed to build a simplified shallow neural network for learning distributed word representations, which were demonstrated to be very effective across a variety of NLP tasks. These studies initiated the use of language models for representation learning (beyond word sequence modeling), having an important impact on the field of NLP.

● Pre-trained language models (PLM). As an early attempt, ELMo [21] was proposed to capture context-aware word representations by first pre-training a bidirectional LSTM (biLSTM) network (instead of learning fixed word representations) and then fine-tuning the biLSTM network according to specific downstream tasks. Further, based on the highly parallelizable Transformer architecture [22] with self-attention mechanisms, BERT was proposed by pre-training bidirectional language models with specially designed pre-training tasks on large-scale unlabeled corpora. These pre-trained context-aware word representations are very effective as general-purpose semantic features, which have largely raised the performance bar of NLP tasks. This study inspired a large number of follow-up works, which established the "pre-training and fine-tuning" learning paradigm. Following this paradigm, a great number of studies on PLMs have been developed, introducing either different architectures [24, 25] (e.g., GPT-2 [26] and BART [24]) or improved pre-training strategies [27–29]. In this paradigm, it often requires fine-tuning the PLM for adapting to different downstream tasks.

● Large language models (LLM). Researchers have found that scaling a PLM (e.g., scaling the model size or data size) often leads to an improved model capacity on downstream tasks (i.e., following the scaling law [30]). A number of studies have explored the performance limit by training an ever larger PLM (e.g., the 175B-parameter GPT-3 and the 540B-parameter PaLM). Although scaling is mainly conducted in model size (with similar architectures and pre-training tasks), these large-sized PLMs display different behaviors from smaller PLMs (e.g., the 330M-parameter BERT and the 1.5B-parameter GPT-2) and show surprising abilities (called emergent abilities [31]) in solving a series of complex tasks. For example, GPT-3 can solve few-shot tasks through in-context learning, whereas GPT-2 cannot do well. Thus, the research community pays special attention to these large-sized PLMs [32–35]. A remarkable application of LLMs is ChatGPT, which adapts the LLMs from the GPT series for dialogue and presents an amazing conversation ability with humans.
Footnote 2: https://openai.com/blog/chatgpt/
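To make the n-gram idea from the first stage above concrete, here is a minimal sketch of a bigram language model with a crude back-off to unigram estimates. It is our illustration rather than an excerpt from the survey: the toy corpus, the function name, and the fixed back-off weight are assumptions, and principled schemes such as back-off estimation [13] or Good–Turing estimation [14] would replace the constant discount.

```python
from collections import Counter

# Toy corpus; real SLMs are estimated from large text collections.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))
total = sum(unigrams.values())

def bigram_prob(prev, word, alpha=0.4):
    """P(word | prev) with a crude back-off: use the bigram relative frequency
    when the bigram was observed, otherwise back off to a discounted unigram
    estimate. (Illustrative only; the cited smoothing methods use principled
    discounting instead of a fixed alpha.)"""
    if bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]
    return alpha * unigrams[word] / total

# Markov assumption: the next word depends only on the most recent context.
print(bigram_prob("the", "cat"))   # observed bigram -> relative frequency
print(bigram_prob("dog", "ran"))   # unseen bigram   -> backed-off estimate
```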
In the existing literature, PLMs have been widely discussed and surveyed [36–39], while LLMs are seldom reviewed in a systematic way. To motivate our survey, we first highlight three major differences between LLMs and PLMs. First, LLMs display some surprising emergent abilities that may not be observed in previous smaller PLMs. These abilities are key to the performance of language models on complex tasks, making AI algorithms unprecedentedly powerful and effective. Second, LLMs would revolutionize the way that humans develop and use AI algorithms. Unlike small PLMs, the major approach to accessing LLMs is through the prompting interface (e.g., the GPT-4 API). Humans have to understand how LLMs work and format their tasks in a way that LLMs can follow. Third, the development of LLMs no longer draws a clear distinction between research and engineering. The training of LLMs requires extensive practical experience in large-scale data processing and distributed parallel training. To develop capable LLMs, researchers have to solve complicated engineering issues, working with engineers or being engineers.
Footnote 1: Note that an LLM is not necessarily more capable than a small PLM, and emergent abilities may not occur in some LLMs.

Nowadays, LLMs are posing a significant impact on the AI community, and the advent of ChatGPT and GPT-4 leads to a rethinking of the possibilities of artificial general intelligence (AGI). OpenAI has published a technical article entitled "Planning for AGI and beyond", which discusses the short-term and long-term plans to approach AGI [40], and a more recent paper has argued that GPT-4 might be considered as an early version of an AGI system [41]. The research areas of AI are being revolutionized by the rapid progress of LLMs. In the field of NLP, LLMs can serve as a general-purpose language task solver (to some extent), and the research paradigm has been shifting towards the use of LLMs. In the field of IR, traditional search engines are challenged by the new information-seeking way through AI chatbots (i.e., ChatGPT), and New Bing presents an initial attempt that enhances the search results based on LLMs. In the field of CV, researchers try to develop ChatGPT-like vision-language models that can better serve multimodal dialogues, and GPT-4 [46] has supported multimodal input by integrating the visual information. This new wave of technology would potentially lead to a prosperous ecosystem of real-world applications based on LLMs. For instance, Microsoft 365 is being empowered by LLMs (i.e., Copilot) to automate office work, and OpenAI supports the use of plugins in ChatGPT for implementing special functions.
Footnote 3: https://www.bing.com/new

Despite the progress and impact, the underlying principles of LLMs are still not well explored. Firstly, it is mysterious why emergent abilities occur in LLMs instead of smaller PLMs. As a more general issue, there is a lack of deep, detailed investigation of the key factors that contribute to the superior abilities of LLMs. It is important to study when and how LLMs obtain such abilities [47]. Although there are some meaningful discussions about this problem [31, 47], more principled investigations are needed to uncover the "secrets" of LLMs. Secondly, it is difficult for the research community to train capable LLMs. Due to the huge demand for computation resources, it is very costly to carry out repetitive, ablating studies for investigating the effect of various strategies for training LLMs. Indeed, LLMs are mainly trained by industry, where many important training details (e.g., data collection and cleaning) are not revealed to the public. Thirdly, it is challenging to align LLMs with human values or preferences. Despite their capacities, LLMs are also likely to produce toxic, fictitious, or harmful content. It requires effective and efficient control approaches to eliminate the potential risks of the use of LLMs [46].

Faced with both opportunities and challenges, the research and development of LLMs needs more attention. In order to provide a basic understanding of LLMs, this survey conducts a literature review of the recent advances in LLMs from four major aspects, including pre-training (how to pre-train a capable LLM), adaptation tuning (how to effectively tune pre-trained LLMs from the two perspectives of effectiveness and safety), utilization (how to use LLMs for solving various downstream tasks) and capability evaluation (how to evaluate the abilities of LLMs and existing empirical findings). We thoroughly comb the literature and summarize the key findings, techniques, and methods of LLMs. For this survey, we also create a GitHub project website by collecting the supporting resources for LLMs, at the link https://github.com/RUCAIBox/LLMSurvey.

We are also aware of several related review articles on PLMs or LLMs [32, 36, 38, 39, 43, 48–54]. These papers either discuss PLMs or some specific (or general) aspects of LLMs. Compared with them, we focus on the techniques and methods to develop and use LLMs and provide a relatively comprehensive reference to the important aspects of LLMs.

The remainder of this survey is organized as follows: Section 2 introduces the background for LLMs, with the terminology, settings, resources, and organization outline, followed by the summarization of available resources for developing LLMs in Section 3. Sections 4, 5, 6, and 7 review and summarize the recent progress from the four aspects of pre-training, adaptation tuning, utilization, and capacity evaluation, respectively. Finally, we conclude the survey in Section 8 by summarizing the major findings and discussing the remaining issues for future work.

2 OVERVIEW

In this section, we introduce the background of LLMs with key terminologies, abilities and techniques, and then summarize the technical evolution of the GPT-series models.

2.1 Background for LLMs

Large language models (LLMs) typically refer to language models that contain hundreds of billions (or more) of parameters, which are trained on massive text data [32], such as GPT-3 [55], PaLM [56], Galactica [35], and LLaMA [57]. Specifically, LLMs are built upon the Transformer architecture [22], where multi-head attention layers are stacked in a very deep neural network. Existing LLMs mainly adopt similar model architectures (i.e., Transformer) and pre-training objectives (i.e., language modeling) as small language models. As the major difference, LLMs largely scale up the model size, pre-training data, and total compute (by orders of magnitude). They can better understand natural language and generate high-quality text based on the given context (i.e., prompts). Such a capacity improvement can be partially described by the scaling law, where the performance roughly follows a substantial increase with respect to the growth of model scale [30]. However, some abilities (e.g., in-context learning [55]) are unpredictable according to the scaling law, and can be observed only when the model size exceeds a certain level.
Footnote 4: In the existing literature, there is no formal consensus on the minimum parameter scale for LLMs, since the model capacity is also related to data size and total compute. In this survey, we take a slightly loose definition of LLMs, and mainly focus on discussing language models with a model size larger than 10B.
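As an illustration of the kind of scaling law referred to above, test loss is often modelled as a smooth power law of the model size. The form below (written in the style popularized by Kaplan et al.) is only a hedged sketch with symbolic constants, not an equation taken from the survey; emergent abilities are precisely the behaviours that such smooth curves fail to predict.

```latex
% Illustrative power-law scaling sketch (not an equation from the survey):
% L(N) is the test loss, N the number of (non-embedding) model parameters,
% and N_c, \alpha_N are empirically fitted constants.
\[
  L(N) \approx \left( \frac{N_c}{N} \right)^{\alpha_N}
\]
% Analogous power laws are typically fitted for dataset size D and compute C.
```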
In the existing literature [31], the emergent abilities of LLMs are formally defined as "the abilities that are not present in small models but arise in large models", which is one of the most prominent features that distinguish LLMs from previous PLMs. The literature further introduces a notable characteristic of when emergent abilities occur [31]: performance rises significantly above random once the scale exceeds a certain level. By analogy, such an emergent pattern has close connections with the phenomenon of phase transition in physics [31, 58]. In principle, emergent abilities can be defined in relation to some complex tasks [31, 59], while we are more concerned with general abilities that can be applied to solve a variety of tasks. Here, we briefly introduce three representative emergent abilities of LLMs, described as follows.

● In-context learning. The in-context learning (ICL) ability is formally introduced by GPT-3 [55]: assuming that the language model has been provided with a natural language instruction and/or several task demonstrations, it can generate the expected output for the test instances by completing the word sequence of the input text, without requiring additional training or gradient update (see the prompt sketch after this list).
Footnote 5: In some recent studies [60], it is also shown that in-context learning implicitly performs meta-optimization through the attention mechanism.

● Instruction following. By fine-tuning with a mixture of multi-task datasets formatted via natural language descriptions (called instruction tuning), LLMs are shown to perform well on unseen tasks that are also described in the form of instructions [28, 61, 62]. With instruction tuning, LLMs are enabled to follow the task instructions for new tasks without using explicit examples, thus having an improved generalization ability.

● Step-by-step reasoning. For small language models, it is usually difficult to solve complex tasks that involve multiple reasoning steps, e.g., mathematical word problems. In contrast, with the chain-of-thought (CoT) reasoning strategy [33], LLMs can solve such tasks by utilizing a prompting mechanism that involves intermediate reasoning steps for deriving the final answer. This ability is speculated to be potentially obtained by training on code [33, 47].
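To make the in-context learning and chain-of-thought abilities above concrete, the sketch below builds the two kinds of prompts as plain strings. It is our illustration, not an excerpt from the cited papers: `call_llm` is a hypothetical stand-in for whatever completion API or local model is used, and the demonstrations are standard toy examples.

```python
# Illustrative prompts only; `call_llm` is a hypothetical completion function.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an actual LLM completion call here")

# In-context learning: task demonstrations are placed directly in the input
# text and the model completes the pattern, without any gradient update.
icl_prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "cheese => fromage\n"
    "butterfly =>"
)

# Chain-of-thought prompting: the demonstration includes intermediate
# reasoning steps, encouraging the model to derive the answer step by step.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls. How many now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.\n"
    "The answer is 11.\n"
    "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. How many now?\n"
    "A:"
)

# answer = call_llm(icl_prompt)    # expected completion: " papillon"
# reasoning = call_llm(cot_prompt) # expected to end with "The answer is 9."
```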
Key Techniques for LLMs. It has been a long way for LLMs to evolve into the current state of general and capable learners. In the development process, a number of important techniques have been proposed, which largely improve the capacity of LLMs. Here, we briefly list several important techniques that (potentially) lead to the success of LLMs, as follows.

● Scaling. Scaling is a key factor to increase the model capacity of LLMs. As the initial attempt, GPT-3 firstly increased the model size to an extremely large scale of 175B parameters. Later on, PaLM further raised the parameter scale to a new record of 540B. As discussed before, a large model size is essential to emergent abilities. Moreover, scaling is not only conducted on model size but is also related to data size and total compute [34, 63]. A recent study [34] has discussed the quantitative relationship among model size, data size, and total compute, given a fixed budget. Further, the quality of the pre-training data plays a key role in achieving good performance, so data collection and cleaning strategies are very important to consider when scaling up the pre-training corpora.

● Training. Due to the huge model size, it is very challenging to successfully train a capable LLM. Distributed training algorithms are needed to learn the network parameters of LLMs, in which various parallel strategies are often jointly utilized. To support distributed training, several optimization frameworks have been released to facilitate the implementation and deployment of parallel algorithms, such as DeepSpeed and Megatron-LM. Besides, optimization tricks are also important for training stability and model performance, e.g., restarting to overcome training loss spikes [56] and mixed precision training [68] (a minimal mixed-precision sketch appears at the end of this subsection). More recently, GPT-4 [46] proposes to develop special infrastructure and optimization methods to reliably predict the performance of large models with much smaller models.

● Ability eliciting. After being pre-trained on large-scale corpora, LLMs are endowed with potential abilities as general-purpose task solvers. However, these abilities might not be explicitly exhibited when LLMs perform some specific tasks. As the technical approach, it is useful to design suitable task instructions or specific in-context learning strategies to elicit such abilities. For instance, chain-of-thought prompting has been shown to be useful for solving complex reasoning tasks by including intermediate reasoning steps. Besides, we can further perform instruction tuning on LLMs with task descriptions expressed in natural language, to improve the generalizability of LLMs on unseen tasks. These eliciting techniques mainly correspond to the emergent abilities of LLMs, and may not show the same effect on small language models.

● Alignment tuning. Since LLMs are trained to capture the data characteristics of pre-training corpora (including both high-quality and low-quality data), they are likely to generate toxic, biased, or even harmful content for humans. It is necessary to align LLMs with human values, e.g., helpful, honest, and harmless. For this purpose, InstructGPT [61] designs an effective tuning approach that enables LLMs to follow expected instructions, which utilizes the technique of reinforcement learning with human feedback (RLHF). It incorporates humans in the training loop with elaborately designed labeling strategies. ChatGPT is indeed developed on a similar technique to InstructGPT, and shows a strong alignment capacity in producing high-quality, harmless responses, e.g., refusing to answer insulting questions.

● Tools manipulation. In essence, LLMs are trained as text generators over massive plain-text corpora, and thus perform less well on tasks that are not best expressed in the form of text (e.g., numerical computation). Besides, their capacities are also limited to the pre-training data, e.g., the inability to capture up-to-date information. To tackle these issues, a recently proposed technique is to employ external tools to compensate for the deficiencies of LLMs [70, 71]. For example, LLMs can utilize a calculator for accurate computation [70] and employ search engines to retrieve unknown information [71]. More recently, ChatGPT has enabled the mechanism of using external plugins (existing or newly created apps), which are, by analogy, the "eyes and ears" of LLMs. Such a mechanism can broadly expand the scope of capacities for LLMs (a toy tool-use sketch follows this list).
Footnote 6: https://openai.com/blog/chatgpt-plugins
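As a minimal illustration of the tool-manipulation idea in the last item above, the following sketch wires a calculator into the generation loop. It is our own toy example, not the mechanism of any particular system (such as ChatGPT plugins): `call_llm`, the `CALC[...]` markup, and the canned replies are all hypothetical.

```python
import re

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real completion API or local model."""
    if "Tool result:" in prompt:
        return "137 * 48 = 6576."                       # canned final answer
    return "I need CALC[137 * 48] to answer this."      # canned tool request

def calculator(expression: str) -> str:
    # Toy safeguard: only allow plain arithmetic characters before evaluating.
    if not re.fullmatch(r"[0-9\s\+\-\*/\(\)\.]+", expression):
        raise ValueError("unsupported expression")
    return str(eval(expression))

def answer_with_tools(question: str) -> str:
    draft = call_llm(question)
    match = re.search(r"CALC\[(.+?)\]", draft)
    if match:  # the model delegated the arithmetic to the external tool
        result = calculator(match.group(1))
        draft = call_llm(f"{question}\nTool result: {result}\nFinal answer:")
    return draft

print(answer_with_tools("What is 137 * 48?"))  # -> "137 * 48 = 6576."
```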
Besides, many other factors (e.g., the upgrade of hardware) also contribute to the success of LLMs. Nevertheless, we limit our discussion to the major technical approaches and key findings for developing LLMs.
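As a small companion to the Training item above, the sketch below shows the standard mixed-precision recipe in PyTorch on a toy model. It is illustrative only: it assumes a CUDA device and a stand-in linear model, and is not the training setup of any LLM discussed in the survey.

```python
import torch
from torch import nn

# Toy stand-in model; real LLMs additionally need distributed parallel training.
model = nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()    # scales the loss to avoid fp16 underflow

for step in range(10):
    x = torch.randn(32, 512, device="cuda")
    with torch.cuda.amp.autocast():      # run the forward pass in mixed precision
        loss = model(x).pow(2).mean()
    optimizer.zero_grad(set_to_none=True)
    scaler.scale(loss).backward()        # backward pass on the scaled loss
    scaler.step(optimizer)               # unscale gradients, then optimizer step
    scaler.update()                      # adjust the loss scale for the next step
```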
2.2 Technical Evolution of GPT-series Models

Due to its excellent capacity in communicating with humans, ChatGPT has ignited the excitement of the AI community since its release. ChatGPT is developed based on a powerful GPT model with specially optimized conversation capacities. Considering the ever-growing interest in ChatGPT and GPT models, we add a special discussion about the technical evolution of the GPT-series models, to briefly summarize how they have been developed in the past years. Overall, the research of OpenAI on LLMs can be roughly divided into the following stages.
Footnote 7: Note that the discussion of this part can be somewhat subjective. The overall viewpoints and summaries are made based on the understanding of the authors of this survey.

Early Explorations. According to one interview with Ilya Sutskever (a co-founder and chief scientist of OpenAI), the idea of approaching intelligent systems with language models was already explored in the early days of OpenAI, while it was attempted with recurrent neural networks (RNN) [104]. With the advent of the Transformer, OpenAI developed two initial GPT models, namely GPT-1 [105] and GPT-2 [26], which can be considered as the foundation of the more powerful models developed subsequently, i.e., GPT-3 and GPT-4.

● GPT-1. In 2017, the Transformer model [22] was introduced by Google, and the OpenAI team quickly adapted their language modeling work to this new neural network architecture. They released the first GPT model in 2018, i.e., GPT-1 [105], and coined the abbreviation GPT as the model name, standing for Generative Pre-trained Transformer. GPT-1 was developed based on a generative, decoder-only Transformer architecture, and adopted a task-agnostic learning approach that combines unsupervised pre-training and supervised fine-tuning. GPT-1 set up the core architecture for the GPT-series models and established the underlying principle for modeling natural language text, i.e., predicting the next word.

● GPT-2. Following a similar architecture to GPT-1, GPT-2 [26] increased the parameter scale to 1.5B and was trained on a large webpage dataset, WebText. As claimed in the GPT-2 paper, it sought to perform tasks via unsupervised language modeling, without explicit fine-tuning using labeled data. To motivate this approach, they introduced a probabilistic formulation for multi-task learning, i.e., p(output | input, task) (similar forms had been taken in an earlier work [106]), which predicts the output conditioned on the input and the task information. To model this conditional probability, natural language text can be employed as a unified way to format the input, output, and task information (a toy illustration of this format is given at the end of this excerpt). In this way, the process of solving a task can be cast as a word prediction problem for generating the solution text. Further, they introduced a more formal claim for this idea: "Since the (task-specific) supervised objective is the same as the unsupervised (language modeling) objective but only evaluated on a subset of the sequence, the global minimum of the unsupervised objective is also the global minimum of the supervised objective (for various tasks)" [26]. A basic understanding of this claim is that each (NLP) task can be considered as a word prediction problem based on a subset of the world text.
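As referenced in the GPT-2 discussion above, here is a toy sketch of how p(output | input, task) can be realized purely as text: the task description, input, and output are serialized into one sequence, so that solving the task reduces to predicting the continuation. The format and example strings are our own illustrative assumptions, not the exact templates used by GPT-2.

```python
from typing import Optional

# Flatten (task, input, output) into one text sequence, so that modeling
# p(output | input, task) reduces to next-word prediction on that sequence.
examples = [
    {"task": "translate English to French", "input": "hello world",
     "output": "bonjour le monde"},
    {"task": "answer the question", "input": "What is the capital of France?",
     "output": "Paris"},
]

def to_sequence(task: str, text: str, output: Optional[str] = None) -> str:
    prefix = f"{task}: {text} =>"
    return f"{prefix} {output}" if output is not None else prefix

# Training-style sequences (task, input, and output all appear in the text).
for ex in examples:
    print(to_sequence(ex["task"], ex["input"], ex["output"]))

# At inference time, only the prefix is given and the model must continue it.
print(to_sequence("translate English to French", "good night"))
```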
