
2025-1-9

Corresponding author(s): Samuel Schmidgall (sschmi46@)

Agent Laboratory: Using LLM Agents as Research Assistants

Samuel Schmidgall1,2, Yusheng Su1, Ze Wang1, Ximeng Sun1, Jialian Wu1, Xiaodong Yu1, Jiang Liu1, Zicheng Liu1 and Emad Barsoum1

1AMD, 2Johns Hopkins University

arXiv:2501.04227v1 [cs.HC] 8 Jan 2025

Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results. To accelerate scientific discovery, reduce research costs, and improve research quality, we introduce Agent Laboratory, an autonomous LLM-based framework capable of completing the entire research process. This framework accepts a human-provided research idea and progresses through three stages (literature review, experimentation, and report writing) to produce comprehensive research outputs, including a code repository and a research report, while enabling users to provide feedback and guidance at each stage. We deploy Agent Laboratory with various state-of-the-art LLMs and invite multiple researchers to assess its quality by participating in a survey, providing human feedback to guide the research process, and then evaluating the final paper. We found that: (1) Agent Laboratory driven by o1-preview generates the best research outcomes; (2) The generated machine learning code is able to achieve state-of-the-art performance compared to existing methods; (3) Human involvement, providing feedback at each stage, significantly improves the overall quality of research; (4) Agent Laboratory significantly reduces research expenses, achieving an 84% decrease compared to previous autonomous research methods. We hope Agent Laboratory enables researchers to allocate more effort toward creative ideation rather than low-level coding and writing, ultimately accelerating scientific discovery.

https://AgentLaboratory.github.io

Figure 1 | Agent Laboratory takes as input a human research idea and a set of notes, provides this to a pipeline of specialized LLM-driven agents, and produces a research report and code repository.


1. Introduction

Scientists frequently face constraints that limit the number of research ideas they can explore at any given time, resulting in ideas being prioritized based on predicted impact. While this process helps determine which concepts are worth investing time in and how best to allocate limited resources effectively, many high-quality ideas remain unexplored. If the process of exploring ideas had fewer limitations, researchers would be able to investigate multiple concepts simultaneously, increasing the likelihood of scientific discovery.

In an effort to achieve this, recent work has explored the capability of LLMs to perform research ideation and automated paper generation, where LLM agents perform the role of human scientists (Baek et al. (2024); Ghafarollahi & Buehler (2024b); Lu et al. (2024a); Swanson et al. (2024)). The work of Baek et al. (2024) introduces ResearchAgent, which automatically generates research ideas, methods, and experiment designs, iteratively refining them through feedback from multiple reviewing agents that mirror peer discussions and leverage human-aligned evaluation criteria to improve the outputs. Lu et al. (2024a) explores fully automated paper generation, where The AI Scientist framework generates novel research ideas, writes code, conducts experiments, and creates a full scientific paper with an automated peer-review system to evaluate the work. Even though these works demonstrate that current LLMs can generate ideas judged to be more novel than those produced by human experts, Si et al. (2024) indicates that LLMs still exhibit weaknesses in feasibility and implementation details, suggesting a complementary rather than replacement role for LLMs in research. Therefore, we aim to design an autonomous agent pipeline that can assist humans toward implementing their own research ideas.

In this work, we introduce Agent Laboratory, an autonomous pipeline for accelerating the individual's ability to perform machine learning research. Unlike previous approaches, where agents participate in their own research ideation independent of human input (Baek et al. (2024); Lu et al. (2024b)), Agent Laboratory is designed to assist human scientists in executing their own research ideas using language agents. Agent Laboratory takes as input a human research idea and outputs a research report and code repository produced by autonomous language agents, allowing various levels of human involvement, where feedback can be provided at a frequency based on user preference. A detailed list of our contributions is provided below:

1. We introduce Agent Laboratory, an open-source LLM agent framework for accelerating the individual's ability to perform research in machine learning. In order to accommodate all users, Agent Laboratory is compute flexible, where various levels of compute can be allocated based on the individual's access to compute resources (e.g., CPU, GPU, memory) and model inference budget.

2. Human evaluators rated papers generated using Agent Laboratory across experimental quality, report quality, and usefulness, showing that while the o1-preview backend was perceived as the most useful, o1-mini achieved the highest experimental quality scores, and gpt-4o was behind in all metrics.

3. NeurIPS-style evaluations showed that o1-preview performed best among backends, particularly in clarity and soundness, according to human reviewers. However, a clear gap emerged between human and automated evaluations, with automated scores significantly overestimating quality (6.1/10 vs. 3.8/10 overall). Similar discrepancies were seen across clarity and contribution metrics, suggesting the need for human feedback to complement automated evaluations for more accurate assessments of research quality.

4. Co-pilot mode in Agent Laboratory was evaluated on custom and preselected topics, showing higher overall scores compared to autonomous mode. Co-pilot papers also saw trade-offs in experimental quality and usefulness, reflecting challenges in aligning agent outputs with researcher intent.

5. The co-pilot feature in Agent Laboratory is overall found to have high utility and usability when rated by human users, with most participants deciding to continue usage after their experience.

6. Detailed cost and inference time statistics, as well as the breakdown of cost per paper phase, are presented for different model back-ends, demonstrating that Agent Laboratory offers automatic research at a greatly reduced price compared with other works (only $2.33 USD per paper with a gpt-4o backend).

7. State-of-the-art performance on a subset of MLE-Bench challenges using the proposed mle-solver, achieving higher consistency and scoring compared to other solvers, and earning more medals, including gold and silver, than MLAB, OpenHands, and AIDE.

We hope that this work takes a step toward accelerating scientific discovery in machine learning, allowing researchers to allocate more effort toward creative ideation and experiment design rather than low-level coding and writing.

2. Background & Related Work

Large language models. The research agents in this paper are built on autoregressive large language models (LLMs), which are trained on extensive text corpora to predict conditional probabilities of token sequences, p(x_t | x_{<t}; θ), and generate text completions through sampling, where x_t ~ softmax(W · h_t), with h_t as the hidden state and W as the learned weight matrix mapping to token probabilities. LLMs utilize transformer architectures (Vaswani (2017)) to capture long-range dependencies in text. These models, such as Claude (Anthropic (2024)), Llama (Dubey et al. (2024); Touvron et al. (2023a,b)), and ChatGPT (Achiam et al. (2023); Hurst et al. (2024); OpenAI (2022)), leverage vast datasets and scaling techniques, thus enabling them to perform a wide array of language-based tasks, such as translation, summarization, and reasoning, by generalizing patterns learned during pretraining to novel inputs (Brown (2020)).

LLM Agents. While LLMs demonstrate strong understanding and reasoning abilities, they face challenges when executing tasks in real-world scenarios. To overcome these limitations, their capabilities are extended through structured frameworks, enabling them to autonomously and semi-autonomously perform task execution (Chen et al. (2023b); Li et al. (2023); Qian et al. (2024); Wu et al. (2023)). These systems, referred to as agents, utilize techniques such as chain-of-thought prompting (Wei et al. (2022)), iterative refinement (Shinn et al. (2024)), self-improvement (Huang et al. (2022)), and external tool integration to execute complex workflows (Hao et al. (2024); Qin et al. (2023); Schick et al. (2023)). LLM agents have made remarkable progress in solving tasks of real-world significance, such as software engineering (Jimenez et al. (2023); Wang et al. (2024b); Yang et al. (2024)), cybersecurity (Abramovich et al. (2024); Fang et al. (2024); Wan et al. (2024)), and medical diagnosis (McDuff et al. (2023); Schmidgall et al. (2024); Tu et al. (2024)). There has also been progress in applying LLM agents to embodied problems such as autonomous robotics (Black et al. (2024); Brohan et al. (2022, 2023); Kim et al. (2024)), web tasks (Deng et al. (2024); Gur et al. (2023); He et al. (2024); Putta et al. (2024); Shi et al. (2017)), and game playing (AL et al. (2024); Feng et al. (2024); Wang et al. (2023)). For a broader overview of LLM agents, refer to Wang et al. (2024a).


Automated machine learning. Automated machine learning is an area of active research, with many approaches focused on using Kaggle, an online platform for machine learning competitions, as a benchmark for evaluating agent performance. Notable efforts include MLE-Bench (Chan et al. (2024)), DS-bench (Jing et al. (2024)), and MLAgentBench (Huang et al. (2024)), which propose using 75, 74, and 6 Kaggle challenges respectively as benchmarks to measure the abilities of ML agents in tasks such as data preparation, model development, and submission. Several ML "solvers" capable of completing these challenges have been introduced, such as AIDE (Schmidt et al. (2024)), CodeActAgent (referred to as "OpenHands") (Wang et al. (2024b)), and ResearchAgent (referred to as "MLAB") from MLAgentBench (Huang et al. (2024)), which automate feature implementation, bug fixing, and code refactoring with a high success rate. Agent K (Grosnit et al. (2024)) demonstrates the ability to solve Kaggle challenges at the human level with a challenge URL provided as input.

AI in Scientific Discovery. AI has been used to support scientific discovery across numerous disciplines for decades. For instance, AI has been used for discovery in mathematics (Romera-Paredes et al. (2024)), material science (Merchant et al. (2023); Pyzer-Knapp et al. (2022); Szymanski et al. (2023)), chemistry (Hayes et al. (2024); Jumper et al. (2021)), algorithm discovery (Fawzi et al. (2022)), and computational biology (Ding et al. (2024)). These approaches position AI as a tool rather than as an agent autonomously performing research.

LLMs for research-related tasks. LLMs have demonstrated strong capabilities in diverse research-related tasks, such as code generation (Chen et al. (2021); Nijkamp et al. (2022)), end-to-end software development (Hai et al. (2024); Phan et al. (2024); Qian et al. (2023, 2024)), code generation for discovery (Chen et al. (2024b); Ghafarollahi & Buehler (2024a); Gu et al. (2024); Guo et al. (2024); Hu et al. (2024b); Ifargan et al. (2024); Majumder et al. (2024)), research question-answering (Chen et al. (2024a); Lála et al. (2023); Lin et al. (2024); Song et al. (2024)), research ideation (Baek et al. (2024); Ghafarollahi & Buehler (2024b); Li et al. (2024a); Si et al. (2024)), automated paper reviewing (D'Arcy et al. (2024); Liang et al. (2024); Lu et al. (2024b); Weng et al. (2024)), literature search (Ajith et al. (2024); Kang & Xiong (2024); Li et al. (2024b); Press et al. (2024)), and predicting the outcome of experiments (Ashokkumar et al. (2024); Lehr et al. (2024); Luo et al. (2024); Manning et al. (2024); Zhang et al. (2024)). Although LLMs have made notable progress in solving the aforementioned tasks, ideation has struggled to progress, with some work showing that LLM ideation leads to greater novelty than humans (Si et al. (2024)), while others show reduced creativity (Chakrabarty et al. (2024)) and greater homogenization effects (Anderson et al. (2024); Zhou et al. (2024)) that may limit creative discovery without human guidance.

Additionally, research on human-AI collaboration has reached mixed conclusions about idea novelty (Ashkinaze et al. (2024); Liu et al. (2024); Padmakumar & He (2024)). These findings suggest that, with current LLMs, the strongest research systems would combine human-guided ideation with LLM-based workflows.

LLMs for autonomous research. Recent advancements in automated scientific workflows have focused on leveraging LLMs to emulate the process of research. Swanson et al. (2024) introduces a team of LLM agents working as scientists alongside a human researcher who provides high-level feedback, with the end result being novel nanobody binders aimed at addressing recent variants of SARS-CoV-2. ChemCrow (M. Bran et al. (2024)) and Coscientist (Boiko et al. (2023)) demonstrate the ability for autonomous ideation and experimentation in chemistry. ResearchAgent (Baek et al. (2024)) automates research idea generation, experiment design, and iterative refinement using feedback from reviewing agents aligned with human evaluation criteria. The AI Scientist (Lu et al. (2024a)) extends this automation to encompass end-to-end scientific discovery, including coding, experiment execution, and automated peer review for manuscript generation. Despite these advancements, studies like Si et al. (2024) highlight limitations in the feasibility and implementation details of LLM ideation, indicating a complementary rather than replacement role for LLMs in autonomous research.

Figure 2 | Agent Laboratory Workflow. This image illustrates the three primary phases of Agent Laboratory: Literature Review, Experimentation, and Report Writing, each featuring distinct tasks, tools, and human-agent roles. The pipeline integrates human input with LLM-driven agents, such as the PhD and Postdoc agents, which handle literature reviews, experimental planning, data preparation, and result interpretation. Specialized tools like mle-solver for experimentation and paper-solver for report generation automate tedious research tasks, enabling collaboration between human researchers and AI to produce high-quality research outputs.

3. Agent Laboratory

Overview. Agent Laboratory begins with the independent collection and analysis of relevant research papers, progresses through collaborative planning and data preparation, and results in automated experimentation and comprehensive report generation. As shown in Figure 2, the overall workflow consists of three primary phases: (1) Literature Review, (2) Experimentation, and (3) Report Writing. In this section, we will introduce these phases in detail along with the corresponding involved agents. Furthermore, in Section 4, we will conduct qualitative and quantitative analyses to demonstrate the strengths of Agent Laboratory and its ability to generate high-quality research outputs.

3.1. Literature Review

Literature Review. The literature review phase involves gathering and curating relevant research papers for the given research idea to provide references for subsequent stages. During this process, the PhD agent utilizes the arXiv API to retrieve related papers and performs three main actions: summary, full text, and add paper. The summary action retrieves abstracts of the top 20 papers relevant to the initial query produced by the agent. The full text action extracts the complete content of specific papers, and the add paper action incorporates selected summaries or full texts into the curated review. This process is iterative rather than a single-step operation, as the agent performs multiple queries, evaluates the relevance of each paper based on its content, and refines the selection to build a comprehensive review. Once the specified number of relevant texts (N=max) is reached via the add paper command, the curated review is finalized for use in subsequent phases.
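A minimal sketch of this iterative loop, with hypothetical stub functions standing in for the arXiv API calls and the agent's LLM relevance judgment (all names, the placeholder heuristic, and the cap of 5 papers are illustrative, not the framework's actual values):

```python
MAX_PAPERS = 5  # illustrative stand-in for the "N=max" cap on curated papers

def summary(query):
    """Stub: return (id, abstract) pairs for the top-20 papers for a query."""
    return [(f"paper-{i}", f"abstract for {query} #{i}") for i in range(20)]

def full_text(paper_id):
    """Stub: return the complete text of one paper."""
    return f"full text of {paper_id}"

def is_relevant(abstract):
    """Stand-in for the PhD agent's LLM relevance judgment."""
    return len(abstract) % 2 == 0  # placeholder heuristic, not an LLM call

def literature_review(queries):
    review = {}
    for q in queries:                    # the agent issues multiple queries
        for pid, abstract in summary(q):
            if len(review) >= MAX_PAPERS:
                return review            # curated review finalized at N=max
            if is_relevant(abstract):
                review[pid] = full_text(pid)   # the "add paper" action
    return review

curated = literature_review(["agent laboratory", "llm research agents"])
```

The real pipeline replaces the stubs with arXiv API calls and LLM judgments, but the control flow (query, filter, add, stop at the cap) follows the description above.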

3.2. Experimentation

Plan Formulation. The plan formulation phase focuses on creating a detailed, actionable research plan based on the literature review and research goal. During this phase, the PhD and Postdoc agents collaborate through dialogue to specify how to achieve the research objective, detailing the experimental components needed to complete the specified research idea, such as which machine learning models to implement, which datasets to use, and the high-level steps of the experiment. Once a consensus is reached, the Postdoc agent submits this plan using the plan command, which serves as a set of instructions for subsequent subtasks.

Data Preparation. The goal of the data preparation phase is to write code that prepares data for running experiments, using the instructions from the plan formulation stage as a guideline. The ML Engineer agent executes code using the python command and observes any printed output. The ML Engineer has access to HuggingFace datasets, searchable via the search HF command. After agreeing on the finalized data preparation code, the SW Engineer agent submits it using the submit code command. Before the final submission proceeds, the code is first passed through a Python compiler to ensure that there are no compilation issues. This process is iteratively executed until the code is bug-free.
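The pre-submission compile check described above can be sketched with Python's built-in compile; this is an illustrative stand-in for the gate, not the framework's actual implementation:

```python
def passes_compile_check(source: str) -> bool:
    """Return True if the candidate code compiles (no syntax errors);
    code that fails this gate is sent back for another iteration."""
    try:
        compile(source, "<agent-code>", "exec")
        return True
    except SyntaxError:
        return False

assert passes_compile_check("x = 1 + 1")       # well-formed code passes
assert not passes_compile_check("def broken(:")  # syntax error is rejected
```

Note that compile only catches syntax-level issues; runtime failures surface later, when the code is actually executed and its printed output observed.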

Running Experiments. In the running experiments phase, the ML Engineer agent focuses on implementing and executing the experimental plan formulated previously. This is facilitated by mle-solver, a specialized module designed to generate, test, and refine machine learning code autonomously. mle-solver begins by producing initial code based on the research plan and insights from the literature review. For the first mle-solver step, the program is empty and must be generated from scratch; this file is then used as the top-scoring program. The following processes describe the workflow of mle-solver:

A. Command Execution. During the command execution phase, an initial program is sampled from a maintained set of top-performing programs, which is represented by a single file during initialization. The mle-solver iteratively refines this program through two operations, REPLACE and EDIT, to better align the output with experimental objectives. The EDIT operation identifies a range of lines, substituting the code between the specified line numbers with newly generated code. In contrast, the REPLACE operation generates a completely new Python file.
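A minimal sketch of the EDIT operation, under the assumption of 1-indexed, inclusive line ranges (the actual indexing convention is not specified here); REPLACE would simply regenerate the whole file:

```python
def edit(program: str, start: int, end: int, new_code: str) -> str:
    """Substitute lines start..end (1-indexed, inclusive) of program
    with the newly generated new_code, returning the patched source."""
    lines = program.splitlines()
    patched = lines[: start - 1] + new_code.splitlines() + lines[end:]
    return "\n".join(patched)

program = "a = 1\nb = 2\nprint(a + b)"
patched = edit(program, 2, 2, "b = 40")   # rewrite only line 2
```

After patching, the candidate program is handed to the compile-and-score stages described next.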

B. Code Execution. After a code command is executed, the new program is passed through a compiler to check for runtime errors. If it successfully compiles, a score is returned and the list of top programs is updated if the score is higher than the existing programs. If the code does not compile, the agent attempts to repair the code for N_rep tries (N_rep = 3 in our experiments) before returning an error and moving on to a new code replacement.

C. Program Scoring. If a program succeeds in compilation, it is sent to a scoring function which determines whether it is better than previously implemented experiment code. In order to obtain a program score, we implement a scoring function that uses an LLM reward model to assess the effectiveness of the ML code generated by mle-solver. The reward model, invoked as an LM, scores the program on a scale from 0 to 1, considering the outlined research plan, the produced code, and the observed output to determine how accurately the program adheres to the initial goals. A score of 1 is given for results with high alignment, and everything below falls on a spectrum of how closely the output and code match the planning goals. This process is similar to existing methods for LLM reasoning tree search (Yao et al. (2024)), where instead of a series of reasoning steps being traversed using self-evaluated LLM scoring, the set of possible programs is traversed (via EDIT and REPLACE commands) and the resulting program outcome is self-evaluated to determine whether a program is worth building on. This is similar to the Solution Space Search of AIDE (Schmidt et al. (2024)); however, their method was specifically designed for Kaggle competitions and simply extracts the accuracy rather than scoring the research code and outcomes.

Figure 3 | Overview of the mle-solver workflow. This diagram details the iterative process used by the mle-solver to autonomously generate machine learning code. Beginning with external resources, the workflow integrates command execution (A), where new code is generated, followed by code execution (B) to compile and repair issues if needed. Program scoring (C) evaluates the generated code using a reward function, while self-reflection (D) helps refine future iterations based on results. Performance stabilization (E) ensures consistent outcomes by maintaining a pool of top-performing programs and iterative optimization.

D. Self Reflection. Whether the code succeeds or fails, a self-reflection is produced based on the experimental results or the encountered error signal (Renze & Guven (2024); Shinn et al. (2024)). Here, the mle-solver is prompted to reflect on the outcome of its actions. If the program failed to compile, the solver reflects on how to fix this issue in subsequent iterations. If it successfully compiles and returns a score, the solver reflects on how to increase this score. These reflections are generated to improve future performance, ensuring that the system learns from errors, improving the quality and robustness of the generated code over iterative cycles.

E. Performance Stabilization. To prevent performance drift, two mechanisms are implemented: top program sampling and batch-parallelization. In top program sampling, a collection of the highest-scoring programs is maintained, and one program is randomly sampled before executing a command, ensuring diversity while retaining quality. For batch-parallelization, each solver step involves making N modifications simultaneously, with the top modification selected to replace the lowest-scoring program in the top collection. These strategies use high-entropy sampling to modify the code, resulting in a balance between exploration of new solutions and refinement of existing ones in order to maintain stable code modifications.

Figure 4 | Graphical outline of paper-solver. This diagram showcases the step-by-step process of generating and refining academic research reports using the paper-solver tool. The workflow starts with the creation of an initial report scaffold (A) by iteratively generating LaTeX-based sections, followed by updates to ensure structural completeness. (B) Research is performed through an arXiv tool during relevant sections. In the Report Editing phase (C), the language model applies targeted edits to improve the document, with LaTeX compilation verifying the integrity of changes. Finally, the completed report undergoes a reward-based evaluation during the Paper Review phase (D), ensuring alignment with academic standards and research goals.
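Putting items A, B, C, and E together, a single solver step might look like the following sketch, where the LLM edit proposal and the 0-to-1 reward model are replaced by hypothetical stand-ins and the pool and batch sizes are illustrative:

```python
import random

N_CANDIDATES = 3   # batch size per solver step (illustrative)
POOL_SIZE = 4      # number of top programs maintained (illustrative)

def propose_edit(program):
    """Stand-in for an LLM EDIT/REPLACE proposal on a sampled program."""
    return program + f"\n# tweak {random.randint(0, 99)}"

def score(program):
    """Stand-in for the 0-1 LLM reward model; here just a length proxy."""
    return min(1.0, len(program) / 200)

def solver_step(pool):
    base = random.choice(pool)                       # top program sampling
    candidates = [propose_edit(base) for _ in range(N_CANDIDATES)]
    best = max(candidates, key=score)                # batch-parallelization
    worst = min(pool, key=score)
    if score(best) > score(worst):
        pool[pool.index(worst)] = best               # replace lowest scorer
    return pool

random.seed(0)
pool = ["print('baseline %d')" % i for i in range(POOL_SIZE)]
for _ in range(5):
    pool = solver_step(pool)
```

The real mle-solver compiles and executes each candidate before scoring it; this sketch only shows how the sampling, batching, and pool-replacement mechanics interact.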

Results Interpretation. The goal of the results interpretation phase is to derive meaningful insights from experimental outcomes to inform the final report. The PhD and Postdoc agents discuss their understanding of the experimental results produced by mle-solver. Once they agree on a meaningful interpretation that could contribute to a compelling academic paper, the Postdoc agent submits it using the interpretation command, forming the basis for the report writing phase.

3.3. Report Writing

Report Writing. In the report writing phase, the PhD and Professor agents synthesize the research findings into a comprehensive academic report. This process is facilitated by a specialized module called paper-solver, which iteratively generates and refines the report. The paper-solver aims to act as a report generator, positioning the work that has been produced by the previous stages of Agent Laboratory. paper-solver does not aim to entirely replace the academic paper-writing process, but rather to summarize the research that has been produced in a human-readable format so that the researcher using Agent Laboratory understands what has been accomplished. The output follows the standard structure of an academic paper, ensuring it meets conference submission requirements (for the paper scoring phase) while being clear and methodical. The following processes describe the workflow of paper-solver:

A. Initial Report Scaffold. The first task of the paper-solver is to generate an initial scaffold for the research paper. This scaffold outlines the document structure, dividing it into eight standardized sections: Abstract, Introduction, Background, Related Work, Methods, Experimental Setup, Results, and Discussion.
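As an illustration of such a scaffold, a hypothetical helper could emit an empty LaTeX skeleton with the eight sections listed above (the helper name and exact layout are assumptions, not the paper-solver's actual output):

```python
# The eight standardized sections named in the text.
SECTIONS = [
    "Abstract", "Introduction", "Background", "Related Work",
    "Methods", "Experimental Setup", "Results", "Discussion",
]

def make_scaffold(title: str) -> str:
    """Build an empty LaTeX document with one placeholder per section,
    ready for paper-solver-style iterative filling and editing."""
    body = "\n".join(f"\\section{{{s}}}\n% TODO: fill in\n" for s in SECTIONS[1:])
    return (f"\\documentclass{{article}}\n\\title{{{title}}}\n"
            f"\\begin{{document}}\n\\maketitle\n"
            f"\\begin{{abstract}}\n% TODO\n\\end{{abstract}}\n{body}\\end{{document}}")

scaffold = make_scaffold("Agent Laboratory Demo")
```

Because the scaffold is plain LaTeX, each subsequent editing pass can be verified by recompiling the document, matching the compilation check shown in Figure 4.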
