Google DeepMind
© 2024 Google. All rights reserved.

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Gemini Team, Google1
In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. Gemini 1.5 Pro achieves near-perfect recall on long-context retrieval tasks across modalities, improves the state-of-the-art in long-document QA, long-video QA and long-context ASR, and matches or surpasses Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5 Pro's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 2.1 (200k) and GPT-4 Turbo (128k). Finally, we highlight surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person learning from the same content.
1. Introduction

We present our latest multimodal model from the Gemini line: Gemini 1.5 Pro. This is our first release from Gemini 1.5, a new family of highly-capable multimodal models which incorporates a novel mixture-of-experts architecture as well as major advances in training and serving infrastructure that allow it to push the boundary of efficiency, reasoning, and long-context performance. Gemini 1.5 Pro is built to handle extremely long contexts; it has the ability to recall and reason over fine-grained information from up to at least 10M tokens. This scale is unprecedented among contemporary large language models (LLMs), and enables the processing of long-form mixed-modality inputs including entire collections of documents, multiple hours of video, and almost a day's worth of audio. Gemini 1.5 Pro surpasses Gemini 1.0 Pro and performs at a similar level to 1.0 Ultra on a wide array of benchmarks while requiring significantly less compute to train.
The ability to model data of increasingly longer contexts has tracked the development of more general and capable language models, from the now-toy 2-gram language model proposed by Shannon (1948), to the modern n-gram models of the 1990s and 2000s (Brants et al., 2007; Chen and Goodman, 1999; Jelinek, 1998; Kneser and Ney, 1995) typically constrained to 5 tokens of context, to recurrent neural network language models from the 2010s which could effectively condition on hundreds of tokens (Jozefowicz et al., 2016; Mikolov et al., 2010), to the modern Transformer (Vaswani et al., 2017) which can condition on hundreds of thousands of tokens (Anthropic, 2023). Gemini 1.5 Pro continues this trend by extending language model context lengths by over an order of magnitude. Scaling to millions of tokens, we find a continued improvement in predictive performance (Section 4.2.1), near-perfect recall (>99%) on synthetic retrieval tasks (Figure 1 and Section 4.2.1), and a host of surprising new capabilities like in-context learning from entire long documents (Section 4.2).
1 Please send correspondence to gemini-1_5-report@.
[Figure 1 panels: Video Haystack (x-axis: minutes), Audio Haystack (x-axis: minutes), and Text Haystack (x-axis: tokens, from 32k up to multiple millions); y-axis: needle depth.]
Figure 1 | Gemini 1.5 Pro achieves near-perfect "needle" recall (>99.7%) up to 1M tokens of "haystack" in all modalities, i.e., text, video and audio. It even maintains this recall performance when extending to 10M tokens in the text modality (approximately 7M words); 2M tokens in the audio modality (up to 22 hours); 2.8M tokens in the video modality (up to 3 hours). The x-axis represents the context window, and the y-axis the depth percentage of the needle placed for a given context length. The results are color-coded to indicate: green for successful retrievals and red for unsuccessful ones.
To measure the effectiveness of our model's long-context capabilities, we conduct experiments on both synthetic and real-world tasks. In synthetic "needle-in-a-haystack" tasks inspired by Kamradt (2023) that probe how reliably the model can recall information amidst distractor context, we find that Gemini 1.5 Pro achieves near-perfect (>99%) "needle" recall up to multiple millions of tokens of "haystack" in all modalities, i.e., text, video and audio, and even maintains this recall performance when extending to 10M tokens in the text modality. In more realistic multimodal long-context benchmarks which require retrieval and reasoning over multiple parts of the context (such as answering questions from long documents or long videos), we also see Gemini 1.5 Pro outperforming all competing models across all modalities, even when these models are augmented with external retrieval methods. Finally, we qualitatively showcase the in-context learning abilities of Gemini 1.5 Pro enabled by very long context: for example, learning to translate a new language from a single set of linguistic documentation. With only instructional materials (500 pages of linguistic documentation, a dictionary, and ≈400 parallel sentences) all provided in context, Gemini 1.5 Pro is capable of learning to translate from English to Kalamang, a language spoken by fewer than 200 speakers in western New Guinea in the east of Indonesian Papua2, and which therefore has almost no online presence. Moreover, we find that the quality of its translations is comparable to that of a person who has learned from the same materials.

2 Kalamang Language: /lang/1891
Gemini 1.5 Pro | Relative to 1.0 Pro | Relative to 1.0 Ultra
Long-Context Text, Video & Audio | from 32k up to 10M tokens | from 32k up to 10M tokens
Core Capabilities | Win-rate: 87.1% (27/31 benchmarks) | Win-rate: 54.8% (17/31 benchmarks)
Text | Win-rate: 100% (13/13 benchmarks) | Win-rate: 77% (10/13 benchmarks)
Vision | Win-rate: 77% (10/13 benchmarks) | Win-rate: 46% (6/13 benchmarks)
Audio | Win-rate: 60% (3/5 benchmarks) | Win-rate: 20% (1/5 benchmarks)

Table 1 | Gemini 1.5 Pro compared to the Gemini 1.0 family. Gemini 1.5 Pro maintains high levels of performance even as its context window increases. Detailed results are presented in Table 7.
Importantly, this leap in long-context performance does not come at the expense of the core multimodal capabilities of the model.3 Overall, we find that Gemini 1.5 Pro greatly surpasses Gemini 1.0 Pro, performing better on the vast majority of benchmarks (i.e., 27/31), increasing the margin in particular for Math, Science and Reasoning (+28.9%), Multilinguality (+22.3%), Video Understanding (+11.2%) and Code (+8.9%) (see Table 7 for breakdowns). However, a more striking comparison is the one with Gemini 1.0 Ultra, a state-of-the-art model across many capabilities. Despite Gemini 1.5 Pro using significantly less training compute and being more efficient to serve, we find Gemini 1.5 Pro to perform better on more than half of the benchmarks (16/31), in particular on text benchmarks (10/13) and many of the vision benchmarks (6/13).
In the following sections, we provide an overview of the model architecture and present the results of large-scale quantitative evaluations comparing Gemini 1.5 Pro to other LLMs. We present detailed evaluations of the model's long-context capabilities followed by evaluations of its core capabilities, similar to the Gemini 1.0 technical report (Gemini-Team et al., 2023), covering well-studied benchmarks across text, code, image, video and audio. Finally, we discuss our approach to responsible deployment, including our process for impact assessment, developing model policies, evaluations, and mitigations of harm before deployment decisions.4
2. Model Architecture

Gemini 1.5 Pro is a sparse mixture-of-experts (MoE) Transformer-based model that builds on Gemini 1.0's (Gemini-Team et al., 2023) research advances and multimodal capabilities. Gemini 1.5 Pro also builds on a much longer history of MoE research at Google (Clark et al., 2022; Du et al., 2022; Fedus et al., 2021; Lepikhin et al., 2020; Riquelme et al., 2021; Shazeer et al., 2017; Zoph et al., 2022) and language model research in the broader literature (Anil et al., 2023; Anthropic, 2023; Brown et al., 2020; Chowdhery et al., 2023; Hoffmann et al., 2022; Jiang et al., 2024; Kim et al., 2021; OpenAI, 2023; Rae et al., 2021; Raffel et al., 2020; Roller et al., 2021; Thoppilan et al., 2022; Touvron et al., 2023a,b; Vaswani et al., 2017). MoE models use a learned routing function to direct inputs to a subset of the model's parameters for processing. This form of conditional computation (Bengio et al., 2013; Davis and Arel, 2014; Jacobs et al., 1991) allows models to grow their total parameter count while keeping the number of parameters that are activated for any given input constant.
3 We define the core capabilities as those capabilities of the model that are primarily non-long-context (e.g., math, science, reasoning, multilinguality, code, etc.), similar to capabilities covered in the Gemini 1.0 technical report (Gemini-Team et al., 2023).
4 See the model card (Mitchell et al., 2019a) in Appendix Section 8.1.
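The report does not describe Gemini 1.5 Pro's routing mechanism, so the following is only a minimal sketch of the kind of top-k token routing used in the MoE literature cited above (e.g., Shazeer et al., 2017; Lepikhin et al., 2020). The expert count, value of k, and gating details are illustrative assumptions, not the Gemini architecture; the point is simply how conditional computation keeps the number of active parameters per token constant while the total parameter count grows with the number of experts.

```python
import numpy as np

def moe_layer(x, gate_w, expert_w, k=2):
    """Sketch of top-k mixture-of-experts routing for a batch of tokens.

    x:        [batch, d_model]            token representations
    gate_w:   [d_model, n_exp]            learned routing (gating) weights
    expert_w: [n_exp, d_model, d_model]   one weight matrix per expert
    Only the k selected experts run per token, so active parameters stay
    constant while total parameters grow with n_exp.
    """
    logits = x @ gate_w                             # [batch, n_exp]
    top_k = np.argsort(logits, axis=-1)[:, -k:]     # indices of the chosen experts
    sel = np.take_along_axis(logits, top_k, axis=-1)
    weights = np.exp(sel - sel.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)       # softmax over the selected experts

    out = np.zeros_like(x)
    for b in range(x.shape[0]):
        for j in range(k):
            e = top_k[b, j]
            out[b] += weights[b, j] * (x[b] @ expert_w[e])  # run only the chosen experts
    return out

# Toy usage: 8 experts, 2 active per token.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
gate_w = rng.normal(size=(16, 8))
expert_w = rng.normal(size=(8, 16, 16))
print(moe_layer(x, gate_w, expert_w).shape)  # (4, 16)
```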
A host of improvements made across nearly the entire model stack (architecture, data, optimization and systems) allows Gemini 1.5 Pro to achieve comparable quality to Gemini 1.0 Ultra (see Section 5), while using significantly less training compute and being significantly more efficient to serve. Gemini 1.5 Pro also incorporates a series of significant architecture changes that enable long-context understanding of inputs up to 10 million tokens without degrading performance. Translated into real-world data, this context length enables Gemini 1.5 Pro models to comfortably process almost a day of audio recordings (i.e., 22 hours), more than ten times the entirety of the 1440-page book (or 587,287 words) "War and Peace", the entire Flax (Heek et al., 2023) codebase (41,070 lines of code), or three hours of video at 1 frame-per-second. Further, since the model is natively multimodal and supports interleaving of data from different modalities, it can support a mix of audio, visual, text, and code inputs in the same input sequence. In Section 4.1, we highlight some of the novel capabilities enabled by these advances, including evaluations that yielded positive results on context lengths up to 10 million tokens. We note that understanding the limits of these capabilities and studying their exciting applications remains an area of continued research exploration.
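As a rough back-of-envelope check, the equivalences quoted above and in Figure 1 imply approximate per-modality token rates. The short calculation below simply inverts those published figures; the derived rates are estimates implied by the report's numbers, not documented tokenizer constants.

```python
# Approximate per-modality token rates implied by the figures quoted in the report.
audio_tokens, audio_hours = 2_000_000, 22          # Figure 1: 2M tokens ~ 22 hours of audio
video_tokens, video_hours = 2_800_000, 3           # Figure 1: 2.8M tokens ~ 3 hours at 1 FPS
text_tokens, text_words   = 10_000_000, 7_000_000  # Figure 1: 10M tokens ~ 7M words

print(f"audio: ~{audio_tokens / (audio_hours * 3600):.0f} tokens per second")  # ~25
print(f"video: ~{video_tokens / (video_hours * 3600):.0f} tokens per frame")   # ~259 at 1 FPS
print(f"text:  ~{text_tokens / text_words:.2f} tokens per word")               # ~1.43
```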
3. Training Infrastructure and Dataset

Like Gemini 1.0 Ultra and 1.0 Pro, Gemini 1.5 Pro is trained on multiple 4096-chip pods of Google's TPUv4 accelerators, distributed across multiple datacenters, and on a variety of multimodal and multilingual data. Our pre-training dataset includes data sourced across many different domains, including web documents and code, and incorporates image, audio, and video content. For the instruction-tuning phase, we fine-tuned Gemini 1.5 Pro on a collection of multimodal data (containing paired instructions and appropriate responses), with further tuning based on human preference data. We refer readers to the Gemini 1.0 Technical Report (Gemini-Team et al., 2023) for further information.
4. Long-context Evaluation

Existing evaluations are increasingly strained by the new and rapidly advancing capabilities of large multimodal models. They typically focus on individual modalities and/or are restricted to tasks with shorter context lengths. Hence, there is a growing need for benchmarks which exemplify the nuanced requirements of real-world long mixed-modality use cases. Among these, we highlight the quantitative assessment of reasoning capabilities across long mixed-modality sequences as a key challenge.
With the challenges of evaluating increasingly capable models in mind, our evaluation of Gemini 1.5 Pro first focuses on understanding and evaluating its novel capabilities. Subsequently, we explore core benchmarks, covering capabilities studied in the Gemini 1.0 Technical Report (Gemini-Team et al., 2023). Specifically, we evaluate Gemini 1.5 Pro in three main categories:5
1. Qualitative long-context multimodal evaluations: manually probe and stress-test the model's long-context abilities, especially for novel capabilities where no quantitative benchmarks exist.
2. Quantitative long-context multimodal evaluations: measure the model's long-context abilities on both synthetic and real-world tasks with well-defined metrics.
3. Quantitative core evaluations: identify progress and regression in core capabilities (e.g., coding, math, science, multilinguality and instruction following).
5 We note that all the evaluations are from the same checkpoint of the Gemini 1.5 Pro model that is instruction-tuned post pre-training, unless otherwise stated.
4.1. Qualitative Examples of Multimodal Long-Context Capabilities

The ability to process multiple millions of tokens unlocks practical applications that were not possible before. In this section we demonstrate some surprising interactions we observed with Gemini 1.5 Pro across code, text and video.

As shown in Figure 2, Gemini 1.5 Pro is able to ingest entire large codebases such as JAX (746,152 tokens), and answer very specific queries about them. In Figure 3 we show Gemini 1.5 Pro's ability to learn a new language based solely on reference materials given in its input (see Section 4.2 for quantitative metrics for this use case). Additionally, we test Gemini 1.5 Pro's ability to answer an image query given the entire text of Les Misérables, and observe that being natively multimodal allows it to locate a famous scene from a hand-drawn sketch, as shown in Figure 4. Lastly, in Figure 5 we ask Gemini 1.5 Pro questions about an entire 45-minute movie, which the model answers seamlessly while retrieving moments and timestamps down to a second.6
Figure 2 | Given the entire 746,152-token JAX codebase in context, Gemini 1.5 Pro can identify the specific location of a core automatic differentiation method.
Figure 3 | Given a reference grammar book and a bilingual wordlist (dictionary), Gemini 1.5 Pro is able to translate from English to Kalamang with similar quality to a human who learned from the same materials.
6 For additional short videos demonstrating the long-context abilities of Gemini 1.5 Pro across video, text, and code, see https://deepmind.google/technologies/gemini/.
Figure 4 | With the entire text of Les Misérables in the prompt (1382 pages, 732k tokens), Gemini 1.5 Pro is able to identify and locate a famous scene from a hand-drawn sketch.
Figure 5 | When prompted with a 45-minute Buster Keaton movie, "Sherlock Jr." (1924) (2,674 frames at 1 FPS, 684k tokens), Gemini 1.5 Pro retrieves and extracts textual information from a specific frame and provides the corresponding timestamp. At bottom right, the model identifies a scene in the movie from a hand-drawn sketch.
4.2. Long-context Evaluations

For the past few years, LLM research has prioritized expanding the context window from which models can incorporate information (Anthropic, 2023; OpenAI, 2023). This emphasis stems from the recognition that a wider context window allows models to incorporate a larger amount of new, task-specific information not found in the training data at inference time, leading to improved performance in various natural language or multimodal tasks. Recent approaches to improving the long-context capabilities of models fall into a few categories, including novel architectural approaches (Ainslie et al., 2023; Gu and Dao, 2023; Guo et al., 2021; Orvieto et al., 2023; Zaheer et al., 2020), post-training modifications (Bertsch et al., 2023; Chen et al.; Press et al., 2021; Xiong et al., 2023), retrieval-augmented models (Guu et al., 2020; Izacard et al., 2022; Jiang et al., 2022; Karpukhin et al., 2020; Santhanam et al., 2021), memory-augmented models (Bulatov et al., 2022, 2023; Martins et al., 2022; Mu et al., 2023; Wu et al., 2022a,b; Zhong et al., 2022), and techniques for building more coherent long-context datasets (Shi et al., 2023c; Staniszewski et al., 2023). This activity has resulted in measurable improvements in the long-context capabilities of LLMs over the past several months, with the recent concurrent work of Liu et al. (2024) exploring context windows of 7B models up to 1M multimodal tokens. Notably, among the state-of-the-art LLMs, Anthropic has successfully extended the context of their text-only Claude 2 model to 100k tokens, while OpenAI has recently released GPT-4 Turbo reaching 128k tokens. Finally, the latest addition to the series was Claude 2.1, with a context window of 200k tokens.
Gemini 1.5 Pro significantly extends this context length frontier to multiple millions of tokens with almost no degradation in performance, making it possible to process significantly larger inputs. Compared to Claude 2.1 with a 200k token context window, Gemini 1.5 Pro achieves 100% recall at 200k tokens, surpassing Claude 2.1's 98%. This 100% recall is maintained up to 530k tokens, and recall is 99.7% at 1M tokens. When increasing from 1M tokens to 10M tokens, the model retains 99.2% recall. Moreover, Gemini 1.5 Pro's native multimodal capabilities enable the model to ingest multiple hours of audio and video recordings alongside or interleaved with text. Such recall capabilities are summarized in Figure 1.
Below we report results on long-context evaluations across all three modalities, i.e., text, vision and audio.

The evaluation methodology we followed to measure the long-context capability of Gemini 1.5 Pro consists of both diagnostic-focused probing of the long-context capabilities (e.g., perplexity over long sequences, needle-in-a-haystack retrieval studies) and realistic evaluations specifically designed for multimodal long-context tasks (e.g., long-document QA, long-context automatic speech recognition, learning to translate a new language from only one book, and long-context video QA). To provide a reference point, throughout this section we compare Gemini 1.5 Pro with the leading model available externally for each task. With the evaluation harness we developed for Gemini 1.5 Pro, we are able to quantify the quality of long-context understanding capabilities reliably all the way up to 10M tokens.
4.2.1. Diagnostic Long-Context Evaluations

Perplexity over Long Sequences

We start by reporting results on the text modality. To evaluate the ability of the models to make use of very long contexts to improve next-token prediction, which is the objective function used to train language models, we record the negative log-likelihood (NLL) of tokens at different positions in the input sequences from held-out text (i.e., not used in training). Here, a lower value implies an improved prediction. Typically, we expect tokens at the beginning of a sequence to have high NLL, as there is little to no context that the model can use to predict them, and tokens later in the sequence to have lower NLL as more information becomes available to the model.
[Figure 6 panels: cumulative average NLL for code and for long documents, plotted against sequence position for Gemini 1.0 Pro and Gemini 1.5 Pro, with power-law fits (r² = 0.998); lower is better.]
Figure 6 | Cumulative average negative log-likelihood (NLL) as a function of token position in long documents and code data. A lower value demonstrates better prediction. Gemini 1.5 Pro shows improved predictions up to 1M tokens for long documents and 10M tokens for code, whereas Gemini 1.0 Pro improves up to only 32K tokens. The NLL follows a power-law trend up until 1M tokens (documents) and 2M tokens (code), with a deviating trend at 10M tokens.
The shape of the resulting curve indicates the ability of models to reason over long context. A downward trend signifies models making use of long context to reduce their uncertainty. On the other hand, an upward trend signifies that models are unable to effectively use information from the previous context and may be deteriorating in prediction quality, highlighting the limitations in their long-context understanding capability.
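To make the metric concrete, here is a minimal sketch of how a cumulative average NLL curve like the one in Figure 6 can be computed from per-token log-likelihoods. The `token_nll` array and document batching are placeholders for whatever scoring harness is actually used, which the report does not specify.

```python
import numpy as np

def cumulative_average_nll(token_nll):
    """token_nll: [num_docs, seq_len] per-token negative log-likelihoods.

    Returns the cumulative average NLL at each token position, averaged over
    documents: the value at position i is the mean NLL of tokens 1..i.
    A downward-sloping curve means later tokens are predicted better,
    i.e., the model is making use of the long context.
    """
    token_nll = np.asarray(token_nll, dtype=np.float64)
    positions = np.arange(1, token_nll.shape[1] + 1)
    cum_avg = np.cumsum(token_nll, axis=1) / positions   # per-document curves
    return cum_avg.mean(axis=0)                          # average over documents

# Toy example: per-token NLL that slowly decreases with position, plus noise.
rng = np.random.default_rng(0)
fake_nll = 2.0 + 1.0 / np.sqrt(np.arange(1, 4097)) + 0.05 * rng.normal(size=(8, 4096))
curve = cumulative_average_nll(fake_nll)
print(curve[0], curve[-1])  # later positions have lower cumulative average NLL
```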
We perform this analysis on two data sources: (a) a dataset of long documents with up to 1 million tokens, and (b) a dataset of code repositories constructed by first randomly shuffling all the files and then concatenating them. The code dataset contains sequences longer than 1 million tokens with some natural form of semantic association (e.g., a whole repository), allowing for further evaluation of sequences of up to 10M tokens. Figure 6 shows the cumulative NLL up to a specific token index.7 We also fit a power law of the form L(x) = αx^β + γ to these data points (dashed line).
We find in Figure 6 that NLL decreases monotonically with sequence length, and thus prediction accuracy improves up to the tested sequence lengths (1M for long documents, and 10M for code), indicating that our models can make use of the whole input even at very long context lengths. This suggests that the model is able to improve its predictions by finding useful patterns in tokens, even if they occurred millions of tokens in the past, as in the case of code.
Finally, we see this improved prediction follows a regular power-law structure. While it is well known that language models follow a power law in terms of training compute to model performance (NLL) (Kaplan et al., 2020) up to a very large scale, we demonstrate that a power law can hold between log-loss and context length up to extremely long context lengths. We see the power-law fit is quite accurate up to 1M tokens for long documents and about 2M tokens for code. From inspecting longer code-token predictions closer to 10M tokens, we see a phenomenon of the increased context occasionally providing outsized benefit (e.g., due to repetition of code blocks), which may explain the power-law deviation. However, this deserves further study, and may be dependent on the exact dataset used.
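As an illustration of the kind of fit reported above, the sketch below fits L(x) = αx^β + γ to cumulative-NLL observations with SciPy. The sample points are synthetic stand-ins, since the underlying evaluation data is not released, and the initialization values are arbitrary assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, alpha, beta, gamma):
    # L(x) = alpha * x**beta + gamma, with beta < 0 when loss decreases with context.
    return alpha * np.power(x, beta) + gamma

# Synthetic stand-in for (context length, cumulative average NLL) observations.
lengths = np.array([256, 1_024, 4_096, 16_384, 65_536, 262_144, 1_048_576], dtype=float)
nll = 1.8 * lengths ** -0.12 + 0.45          # pretend measurements lying on a power law

params, _ = curve_fit(power_law, lengths, nll, p0=(1.0, -0.1, 0.5), maxfev=10_000)
alpha, beta, gamma = params
residuals = nll - power_law(lengths, *params)
r2 = 1 - np.sum(residuals**2) / np.sum((nll - nll.mean())**2)
print(f"alpha={alpha:.3f} beta={beta:.3f} gamma={gamma:.3f} r^2={r2:.4f}")
```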
7 We note that we are unable to obtain logits for other commercially available LLMs for comparison.
[Figure 7 panels: Gemini 1.5 Pro needle recall from 1k to 1M tokens and up to 10M tokens, and GPT-4 Turbo from 1k to 128k tokens; x-axis: context length in tokens, y-axis: needle depth (0–100%).]
Figure 7 | Text Haystack. This figure compares Gemini 1.5 Pro with GPT-4 Turbo for the text needle-in-a-haystack task. Green cells indicate the model successfully retrieved the secret number, gray cells indicate API errors, and red cells indicate that the model response did not contain the secret number. The top row shows results for Gemini 1.5 Pro, from 1k to 1M tokens (top left), and from 1M to 10M tokens (top right). The bottom row shows results for GPT-4 Turbo up to the maximum supported context length of 128k tokens.
Text Haystack

Next, we move to testing long-context recall using the recently introduced needle-in-a-haystack evaluation (Kamradt, 2023), which tests a model's ability to retrieve a text (i.e., the "needle") inserted at various positions into a sequence (i.e., the "haystack"). Following prior work (Dhinakaran, 2024), we use a set of concatenated and repeated essays written by Paul Graham8 to fill the desired context length. We insert a needle at linearly spaced intervals from the beginning to the end of the context, where the needle is of the form "The special magic {city} number is: {number}", with the city and number varied for each query, and prompt the model with "Here is the magic number:". We report whether the magic number was recalled correctly at various context lengths (x-axis, the haystack) as a function of the needle's position in the input sequence, expressed in terms of depth percentage (y-axis); e.g., a depth of 100% indicates a needle inserted at the very end of the input, whereas 0% is the very beginning.
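A minimal sketch of how such a text needle-in-a-haystack prompt can be assembled is shown below. The filler text, city list, and word-level depth placement are illustrative simplifications of the setup described above (the report fills the context with Paul Graham essays and measures length in tokens), not the exact harness used for Figure 7; `call_model` is a hypothetical model call.

```python
import random

def build_haystack_prompt(filler_words, num_words, depth_pct, city, number):
    """Insert a 'needle' sentence at depth_pct% into a haystack of filler text."""
    haystack = (filler_words * (num_words // len(filler_words) + 1))[:num_words]
    needle = f"The special magic {city} number is: {number}."
    insert_at = int(len(haystack) * depth_pct / 100)      # 0% = start, 100% = end
    words = haystack[:insert_at] + needle.split() + haystack[insert_at:]
    return " ".join(words) + "\n\nHere is the magic number:"

def grade(model_response, number):
    # Recall counts as correct if the secret number appears in the response.
    return str(number) in model_response

# Toy usage over a grid of context sizes and depths, mirroring Figure 7's axes.
filler = "the quick brown fox jumps over the lazy dog".split()  # stand-in for essay text
for num_words in (1_000, 10_000):
    for depth in (0, 50, 100):
        number = random.randint(1_000, 9_999)
        prompt = build_haystack_prompt(filler, num_words, depth, "Berlin", number)
        print(num_words, depth, len(prompt.split()), "words in prompt")
        # response = call_model(prompt)          # hypothetical model call
        # print(grade(response, number))
```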
As can be seen in Figure 7, Gemini 1.5 Pro achieves 100% recall up to 530k tokens and >99.7% recall up to 1M tokens. This task, while simple, provides a clear demonstration that Gemini 1.5 Pro is able to reliably retrieve information from long documents up to 1M tokens. For reference, we report results for GPT-4 Turbo up to the 128K sequence length supported by its API.