AI PRACTICE

Overcoming the Hard Problems to Advance AI Practice

Taking data analytics to a more advanced level with AI tools means confronting the risks and pitfalls of machine learning algorithms.

Sponsored by:

Real learning

Real impact

SUMMER 2024

SPECIAL REPORT

[Special Report]

Overcoming the Hard Problems to Advance AI Practice

As excitement around large language models (LLMs) spurs spending on AI, the salient question for business leaders remains, What is the return on our data science investments? In the near term, advanced analytics and machine learning are the workhorse technologies for creating significant value from data assets. Not that doing so is easy; companies face numerous challenges along the way.

Much AI risk becomes apparent when systems are in production, so truly responsible AI isn’t just a concern at the front end of the development process. Cathy O’Neil, who posed hard questions about the unintended consequences of algorithmic decision-making in her 2016 book, Weapons of Math Destruction, has pioneered the practice of algorithmic auditing. O’Neil and coauthors Jake Appel and Sam Tyner-Monroe walk readers through their approach and discuss how it can be applied to generative AI tools as well.

The trade-off between using data for insights and protecting customers’ personal data grows only more difficult as bad actors improve their techniques for re-identifying anonymized data sets. Gregory Vial, Julien Crowe, and Patrick Mesana explain why dealing with this challenge will require data scientists to gain a more sophisticated understanding of data protection and compel cybersecurity staffs to learn a wider range of protection techniques. They draw lessons from emerging practices at National Bank of Canada, where data scientists, data owners, and cybersecurity teams are collaborating to apply data protection practices that don’t render data unusable for analytics.

When machine learning projects do get the go-ahead, however, too many initiatives fail upon adoption because data scientists didn’t thoroughly understand the original business problem. To find out where such efforts are going wrong, Dusan Popovic, Shreyas Lakhtakia, Will Landecker, and Melissa Valentine studied data science projects that were shelved. They found that convincing data scientists to drop their assumptions and start asking more fundamental questions of their business counterparts is key to avoiding machine learning project failures.

Finally, just as corporations are experimenting with LLMs to figure out where they can add value at relatively low risk, advanced analytics teams can be looking at how they might incorporate generative AI into practice. Pedro Amorim and João Alves see promise for LLMs to take on some data science drudgery, and for their natural language interfaces to make it easier for business managers to collaborate in the development process and understand results.

- The MIT SMR Editors

Auditing Algorithmic Risk

Avoid ML Failures by Asking the Right Questions

How Generative AI Can Support Advanced Analytics Practice

Managing Data Privacy Risk in Advanced Analytics

Sponsor’s Viewpoint: From Numbers to Narratives: Using Language to Enhance Generative AI

Paul Garland, Summer 2024


[Responsible AI]

Auditing Algorithmic Risk

How do we know whether algorithmic systems are working as intended? A set of simple frameworks can help even nontechnical organizations check the functioning of their AI tools.

By Cathy O’Neil, Jake Appel, and Sam Tyner-Monroe

Artificial intelligence, large language models (LLMs), and other algorithms are increasingly taking over bureaucratic processes traditionally performed by humans, whether it’s deciding who is worthy of credit, a job, or admission to college, or compiling a year-end review or hospital admission notes.

But how do we know that these systems are working as intended? And who might they be unintentionally harming?

Given the highly sophisticated and stochastic nature of these new technologies, we might throw up our hands at such questions. After all, not even the engineers who build these systems claim to understand them entirely or to know how to predict or control them. But given their ubiquity and the high stakes in many use cases, it is important that we find ways to answer questions about the unintended harms they may cause. In this article, we offer a set of tools for auditing and improving the safety of any algorithm or AI tool, regardless of whether those deploying it understand its inner workings.

PAUL GARLAND

SPECIAL REPORT • “OVERCOMING THE HARD PROBLEMS TO ADVANCE AI PRACTICE” • MIT SLOAN MANAGEMENT REVIEW

Algorithmic auditing is based on a simple idea: Identify failure scenarios for people who might get hurt by an algorithmic system, and figure out how to monitor for them. This approach relies on knowing the complete use case: how the technology is being used, by and for whom, and for what purpose. In other words, each algorithm in each use case requires separate consideration of the ways it can be used for, or against, someone in that scenario.

This applies to LLMs as well, which require an application-specific approach to harm measurement and mitigation. LLMs are complex, but it’s not their technical complexity that makes auditing them a challenge; rather, it’s the myriad use cases to which they are applied. The way forward is to audit how they are applied, one use case at a time, starting with those in which the stakes are highest.

The auditing frameworks we present below require input from diverse stakeholders, including affected communities and domain experts, through inclusive, nontechnical discussions to address the critical questions of who could be harmed and how. Our approach works for any rule-based system that affects stakeholders, including generative AI, big data risk scores, or bureaucratic processes described in a flowchart. This kind of flexibility is important, given how quickly new technologies are being developed and applied.

[Figure: A Simplified Ethical Matrix]

Each cell of the matrix represents how a certain concern applies to a particular stakeholder group. Cells that indicate where a stakeholder could be gravely harmed or the algorithm violates a hard constraint are shaded red. Cells that raise some ethical worries for the stakeholder are highlighted yellow, and cells that satisfy the stakeholder’s objectives and raise no worries are highlighted green.

Concerns (columns): False positive (transaction gets flagged but isn’t truly fraud); False negative (transaction is truly fraud but does not get flagged)

Stakeholders (rows): Company; Nonfraudulent customers; Fraudsters

Legend: Serious concern; Moderate concern; Minimal/no concern; Benefit

Finally, while our notion of audits is broad in that respect, it is narrow in scope: An algorithmic audit raises alerts only to problems. It then falls to experts to attempt to solve those problems once they’ve been identified, although it may not be possible to fully resolve them all. Addressing the problems highlighted by algorithmic auditing will spur innovation as well as safeguard society from unintended harms.

Ethical Matrix: Identifying the Worst-Case Scenarios

In a given use case, how could an algorithm fail, and for whom? At O’Neil Risk Consulting & Algorithmic Auditing (ORCAA), we developed the Ethical Matrix framework to answer this question.¹

The Ethical Matrix identifies the stakeholders of the algorithm in the context of its intended use and how they are likely to be affected by it. Here, we take a broad approach: Anybody affected by the algorithm, including its builders and deployers, users, and other communities potentially impacted by its adoption, is a stakeholder. When subgroups have distinct concerns, they can be considered separately; for example, if lighter- and darker-skinned people have different concerns about a facial recognition algorithm, they will have separate rows in the Ethical Matrix.

Next, we ask representatives of each stakeholder group what their concerns are, both positive and negative, about the intended use of the algorithm. It’s a nontechnical conversation: We describe the system as simply as possible and ask, “How could this system fail for you, and how would you be harmed if this happened? On the other hand, how could it succeed for you, and how would you benefit?” Their answers become the columns of the Ethical Matrix.

To illustrate, imagine that a payments company has a fraud detection algorithm reviewing all transactions and flagging those most likely to be fraudulent. If a transaction is flagged, it gets blocked, and that customer’s account gets frozen. False flags are therefore a major headache for customers, and the lost business from blocks and freezes (and complaints from annoyed customers) is a moderate worry for the company. Conversely, if a fraudulent transaction goes undetected, the company is harmed but nonfraudulent customers are indifferent. The figure “A Simplified Ethical Matrix” summarizes this scenario.

Each cell of the Ethical Matrix represents how a particular concern applies to a particular stakeholder group.

To judge the severity of a given risk, we consider the likelihood that it will be realized, how many people would be harmed, and how badly. Where possible, we use existing data to develop these estimates. We also consider legal or procedural constraints, for instance, whether there is a law prohibiting discrimination on the basis of certain characteristics. We then color-code the cells to highlight the biggest, most pressing risks. Cells that constitute “existential risks,” where a stakeholder could be gravely harmed or the algorithm violates a hard constraint, are shaded red. Cells that raise some ethical worries for the stakeholder are highlighted yellow, and cells that satisfy the stakeholder’s objectives and raise no worries are highlighted green.
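Because the Ethical Matrix is just stakeholders, concerns, and a color per cell, it can be kept as plain data and queried for its most pressing cells. The following is a minimal sketch, not ORCAA's tooling. The names `Severity`, `matrix`, and `cells_with` are hypothetical, and the ratings for the fraud-detection example are assumptions made for illustration, since the text does not assign a color to every cell (the Fraudsters row is left out for that reason).

```python
from enum import Enum


class Severity(Enum):
    """Color codes described in the text for Ethical Matrix cells."""
    SERIOUS = "red"       # grave harm or a violated hard constraint
    MODERATE = "yellow"   # some ethical worries
    MINIMAL = "green"     # objectives satisfied, no worries


# (stakeholder, concern) -> severity. The ratings are illustrative assumptions.
matrix = {
    ("Company", "False positive"): Severity.MODERATE,
    ("Company", "False negative"): Severity.SERIOUS,
    ("Nonfraudulent customers", "False positive"): Severity.SERIOUS,
    ("Nonfraudulent customers", "False negative"): Severity.MINIMAL,
}


def cells_with(matrix, severity):
    """Return the (stakeholder, concern) cells rated at the given severity, sorted."""
    return sorted(cell for cell, rating in matrix.items() if rating is severity)


# The red cells are the ones an audit would monitor first.
red_cells = cells_with(matrix, Severity.SERIOUS)
for stakeholder, concern in red_cells:
    print(f"{stakeholder}: {concern}")
```

Keeping the matrix as data makes it easy to revise as new stakeholder groups or concerns surface, and to list the red cells whenever monitoring priorities are reviewed.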

Finally, zooming out on the whole Ethical Matrix, we consider how to balance the competing concerns of the algorithm’s stakeholders, usually in the form of balancing the different kinds and consequences of errors that fall on different stakeholder groups.

The Ethical Matrix should be a living document that tracks an ongoing conversation among stakeholders. Ideally, it is first drafted during the design and development phase of an algorithmic application or, at minimum, as the algorithm is deployed, and it should continue to be revised thereafter. It is not always obvious at the outset who all of the stakeholder groups are, nor is it feasible to find representatives for every perspective; additionally, new concerns emerge over time. We might hear from people experiencing indirect effects from the algorithm, or a subgroup with a new worry, and need to revise the Ethical Matrix.

Explainable Fairness: Metrics and Thresholds

Many of the stakeholder concerns identified in the Ethical Matrix refer to some contextual notion of fairness.

At ORCAA, we developed a framework called Explainable Fairness to measure how groups are treated by algorithmic systems.² It is an approach to understanding exactly what is meant by “fairness” in a given narrow context.

For example, female candidates might worry that an AI-based resume-screening tool gave lower scores for women than men. It’s not as simple as comparing scores between men and women. After all, if the male candidates for a given job have more experience and qualifications than the female candidates, their higher scores might be justified. This would be considered legitimate discrimination.

The real worry is that, among equally qualified candidates, men are receiving higher scores than women. The definition of “equally qualified” depends on the context of the job. In academia, relevant qualifications might include degrees and publications; in a logging operation, they might involve physical strength and agility. They are factors one would legitimately take into account when assessing a candidate for a specific role. Two candidates for a job are considered equally qualified if they look the same according to these legitimate factors.

Explainable Fairness controls for legitimate factors when we examine the outcome in question. For an AI resume-screening tool, this could mean comparing average scores by gender while controlling for years of experience and level of education. A critical part of Explainable Fairness is the discussion of legitimacy.
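Controlling for legitimate factors can be illustrated with simple stratification: compare scores only among candidates who match on those factors. The sketch below swaps in exact matching for the regression-style controls an analyst would more likely use, and every record in it is hypothetical. The function name `gap_within_strata` and the choice of years of experience and education as the legitimate factors are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical screening records: (gender, years_experience, education, score)
candidates = [
    ("F", 5, "BS", 70), ("M", 5, "BS", 74),
    ("F", 5, "BS", 72), ("M", 5, "BS", 76),
    ("F", 10, "MS", 85), ("M", 10, "MS", 85),
    ("F", 10, "MS", 87), ("M", 10, "MS", 87),
]


def gap_within_strata(records):
    """Mean score gap (men minus women) among candidates equal on the legitimate factors."""
    strata = defaultdict(lambda: {"F": [], "M": []})
    for gender, years, education, score in records:
        strata[(years, education)][gender].append(score)
    gaps = {}
    for key, groups in strata.items():
        # A gap is only defined where both groups are represented.
        if groups["F"] and groups["M"]:
            gaps[key] = mean(groups["M"]) - mean(groups["F"])
    return gaps


gaps = gap_within_strata(candidates)
for (years, education), gap in sorted(gaps.items()):
    print(f"{years} yrs, {education}: gap = {gap}")
```

A nonzero gap inside a stratum is a difference that the chosen factors do not explain. Which factors belong in the stratification key is exactly the legitimacy discussion the framework calls for.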

This approach is already used implicitly in other domains, including credit. In a Federal Reserve Board analysis of mortgage denial rates across race and ethnicity, the researchers ran regressions that included controls for the loan amount, the applicant’s FICO score, their debt-to-income ratio, and the loan-to-value ratio.³ In other words, to the extent that differences in mortgage denial rates can be explained by these factors, it’s not race discrimination. In the language of Explainable Fairness, these are accepted as legitimate factors for mortgage underwriting. What is missing is the explicit conversation about why the legitimate factors are, in fact, legitimate.

What would such a conversation look like? In the U.S., mortgage lenders consider applicants’ FICO credit scores in their decision-making. FICO scores are lower, on average, for Black and Hispanic people than for White and Asian people, so it’s no surprise that mortgage applications from Black and Hispanic applicants are denied more often.⁴ Lenders would likely argue that FICO score is a legitimate factor because it measures an applicant’s creditworthiness, which is exactly what a lender should care about. Yet FICO scores encode unfairness in important ways. For instance, mortgage payments have long counted toward FICO scores, while rent payments started being counted only in 2014, and only in some versions of the scores.⁵ This practice favors homeowners over renters, and it is known that decades of racist redlining practices contributed to today’s race disparities in homeownership rates. Should FICO scores that reflect the vestiges of these practices be used to explain away differences in mortgage denial rates today?

We will not settle this debate here; the point is that it’s a question of ethics and policy, not a math problem. Explainable Fairness surfaces difficult questions like these and assigns them to the right parties for consideration.

When looking at disparate outcomes that are not explained by legitimate factors, we must define threshold values or limits that trigger a response or intervention.

These limits could be fixed values, such as the four-fifths rule used to measure adverse impact in hiring.⁶ Or they could be relative: Imagine a regulation requiring companies with a gender pay gap above the industry average to take action to reduce the gap. Explainable Fairness does not insist on a certain type of limit but prompts the algorithmic risk manager to define each one for each potential stakeholder harm.
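As an illustration of a fixed limit, the four-fifths rule compares each group's selection rate with that of the most-selected group and flags any group whose ratio falls below 0.8. The sketch below is a minimal version of that check. The function names and the applicant counts are hypothetical.

```python
def selection_rate(selected, applicants):
    """Share of applicants in a group who were selected."""
    return selected / applicants


def adverse_impact_ratios(rates):
    """Each group's selection rate divided by the highest group's rate."""
    top = max(rates.values())
    return {group: rate / top for group, rate in rates.items()}


def flagged_groups(rates, threshold=0.8):
    """Groups whose ratio falls below the four-fifths (0.8) threshold."""
    ratios = adverse_impact_ratios(rates)
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical hiring data: 48 of 80 selected in group_a, 12 of 40 in group_b.
rates = {
    "group_a": selection_rate(48, 80),
    "group_b": selection_rate(12, 40),
}
print(adverse_impact_ratios(rates))
print(flagged_groups(rates))
```

A relative limit, such as a pay gap above the industry average, would replace the constant threshold with a value computed from peer data; the structure of the check stays the same.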

Judging Fairness in Insurers’ Algorithms

Let’s consider a real example where the Ethical Matrix and Explainable Fairness were used to audit the use of an algorithm. In 2021, Colorado passed Senate Bill (SB) 21-169, which protects Colorado consumers from unfair discrimination in insurance, particularly from insurers’ use of algorithms, predictive models, and big data.⁷ As part of the law’s implementation, which ORCAA assisted with, the Colorado Division of Insurance (DOI) released an initial draft regulation for informal comment that described quantitative testing requirements and laid out how insurers could demonstrate that their algorithms and models were not unfairly discriminating. Although the law applies to all lines of insurance, the division chose to start with life insurance.

The Ethical Matrix is straightforward here because the stakeholder groups and concerns are defined explicitly by the law. Its prohibition of discrimination on the basis of “race, color, national or ethnic origin, religion, sex, sexual orientation, disability, gender identity, or gender expression” means each group within each of those classes got a row in the matrix. As for concerns, algorithms could cause consumers to be treated unfairly at various stages of the insurance life cycle, including marketing, underwriting, pricing, utilization management, reimbursement methodologies, and claims management. The DOI chose to start with underwriting (that is, which applicants are offered coverage, and at what price) and focus initially on race and ethnicity.

In subsequent conversations with stakeholders, however, the DOI grappled with issues related to the Explainable Fairness framework: Are similar applicants of different races denied at different rates, or charged different prices for similar coverage? What makes two life insurance applicants “similar,” and what factors could legitimately explain differences in denials or prices? This is the domain of life insurance experts, not data scientists.

The DOI ultimately suggested considering factors widely regarded as relevant to estimating the price of a given life insurance policy: the policy type (such as term versus permanent); the dollar amount of the death benefit; and the applicant’s age, gender, and tobacco use.

The division’s draft quantitative testing regulation for SB 21-169 instructs insurers to do regression analyses of approval/denial and price across races, and it explicitly permits them to include those factors (such as policy type and death benefit amount) as control variables.⁸ Moreover, the regulation defines limits that trigger a response: If the regressions find statistically significant and substantial differences in denial rates or prices, the insurer must do further testing to investigate the disparity and, pending the results, may have to remediate the differences.⁹
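The trigger described here combines two conditions: the difference must be statistically significant and it must be substantial. The sketch below shows only that decision logic, applied to a gap that a regression with the permitted controls has already estimated. The function name and both default thresholds (`alpha` and `min_gap`) are hypothetical placeholders, not the values in the Colorado draft regulation.

```python
def needs_further_testing(estimated_gap, p_value, alpha=0.05, min_gap=0.05):
    """Return True when a disparity is both statistically significant and substantial.

    estimated_gap: difference in denial rate (or relative price) between a group
        and the reference group, after controlling for legitimate factors.
    p_value: p-value for that estimated difference.
    alpha, min_gap: hypothetical thresholds chosen for illustration only.
    """
    significant = p_value < alpha
    substantial = abs(estimated_gap) >= min_gap
    return significant and substantial


# Hypothetical regression results.
print(needs_further_testing(estimated_gap=0.08, p_value=0.01))   # both conditions met
print(needs_further_testing(estimated_gap=0.08, p_value=0.20))   # not significant
print(needs_further_testing(estimated_gap=0.01, p_value=0.001))  # not substantial
```

Requiring both conditions keeps a very large sample from flagging a trivially small gap, and keeps a noisy estimate from flagging a gap that may not be real.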

Having looked at how we would audit simpler algorithms, let us now turn to how we would evaluate LLMs.


Evaluating Large Language Models

LLMs have taken the world by storm, largely due to their wide appeal and applicability. But it is exactly the diversity of uses of these models that makes them hard to audit. Two approaches to evaluating LLMs, namely benchmarking and red teaming, present a way forward.

The Benchmarking Approach to LLM Evaluation. Benchmarking measures the performance of an LLM across one or more predefined, quantifiable tasks in order to compare its performance with that of other models. In the simplest terms, a benchmark is a data set consisting of inputs and corresponding desired outputs. To evaluate an LLM for a particular benchmark, simply provide the input set to the LLM and record its outputs. Then choose a metric set to quantitatively compare the outputs from the LLM to the desired set of outputs from the benchmark data set. Possible metrics include accuracy, calibration, robustness, counterfactual fairness, and bias.¹⁰

Consider the input and desired output shown below from a benchmark data set designed to test LLM capabilities:¹¹

Input:

The following is a multiple choice question about microeconomics.

One of the reasons that the government discourages and regulates monopolies is that

(A) producer surplus is lost and consumer surplus is gained.

(B) monopoly prices ensure productive efficiency but cost society allocative efficiency.

(C) monopoly firms do not engage in significant research and development.

(D) consumer surplus is lost with higher prices and lower levels of output.

Answer:

Desired Output:

(D) consumer surplus is lost with higher prices and lower levels of output.

In this example, the accuracy of the model is measured by computing the proportion of correctly answered multiple-choice questions in the benchmark data set. In benchmarking LLM evaluations, metrics are defined according to the type of response elicited from the model. For example, accuracy is very simple to calculate when all of the questions are multiple choice and the model simply has to choose the correct response, whereas determining the accuracy of a summarization task involves counting up matching n-grams between the desired and model outputs.¹² There are dozens of benchmark data sets and corresponding metrics available for LLM evaluation, and it is important to choose the most appropriate evaluations, metrics, and thresholds for a given use case.
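The two kinds of metrics mentioned above can be sketched in a few lines. The code below is an illustration under stated assumptions rather than any benchmark's official scorer: `extract_choice` assumes answers contain an option letter in parentheses, and `ngram_recall` is a simplified n-gram overlap in the spirit of summarization metrics, not a full implementation of one. All function names and sample strings are hypothetical.

```python
import re
from collections import Counter


def extract_choice(text):
    """Return the first option letter such as '(D)' found in text, uppercased, else None."""
    match = re.search(r"\(([A-Da-d])\)", text)
    return match.group(1).upper() if match else None


def accuracy(model_outputs, desired_outputs):
    """Proportion of questions where the model picked the desired option."""
    correct = 0
    for output, desired in zip(model_outputs, desired_outputs):
        choice = extract_choice(output)
        if choice is not None and choice == extract_choice(desired):
            correct += 1
    return correct / len(desired_outputs)


def ngram_recall(candidate, reference, n=1):
    """Fraction of the reference's n-grams that also appear in the candidate."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's credit at the number of times the candidate contains it.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


outputs = ["(D) consumer surplus is lost", "(A) producer surplus is lost", "I think B"]
desired = ["(D) consumer surplus is lost", "(C) monopoly firms", "(B) monopoly prices"]
print(accuracy(outputs, desired))
print(ngram_recall("the cat sat", "the cat sat down"))
```

Note that the third output is scored as wrong because no option letter in parentheses can be extracted from it. Choices like this are part of defining a metric, which is one reason the same model can score differently under different scoring rules.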

Creating a custom benchmark is a labor-intensive process, but an organization may find that it is worth the effort in order to evaluate LLMs in exactly the right way for its use cases.

Benchmarking does have some drawbacks. If the benchmark data happened to be in the model’s training data, the model would have “memorized” the responses in its parameters. The frequency of this ouroboros-like outcome will only increase as more benchmark data sets are published. LLM benchmarking is also not immune to Goodhart’s law, that is, “when a measure becomes a target, it ceases to be a good measure.” In other words, if a specific benchmark becomes the primary focus of model optimization, the model will be over-fitted at the expense of its overall performance and usefulness.

In addition, there is evidence that as models advance, they become able to detect when they are being evaluated, which also threatens to make benchmarking obsolete. Consider Anthropic’s Claude 3 series of models, released in March 2024, which stated, “I suspect this ... ‘fact’ may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all,” in response to a needle-in-a-haystack evaluation prompt.¹³ As models increase in complexity and ability, the benchmarks used to evaluate them must also evolve. It is unlikely that the benchmarks used today to evaluate LLMs will be the same ones in use just two years from now.

It is therefore not enough to evaluate LLMs with benchmarking alone.

The Red-Teaming Approach to LLM Evaluation. Red teaming is the exercise of testing a system for robustness by using an adversarial approach. An LLM red-teaming exercise is designed to elicit unwanted responses from the model.

LLMs’ flexibility in the generation of content presents a wide variety of potential risks. LLM red teams may try to make the model produce violent or dangerous content, reveal its training data, infringe on copyrighted materials, or hack into the model provider’s network to steal customer data. Red teaming can take a highly technical path, where, for example, nonsensical characters are systematically injected into the prompts to induce problematic behavior; or a social engineering path, whereby red teamers try to “trick” the model using natural language to produce unwanted output.¹⁴
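The technical path can be pictured as generating many perturbed variants of a prompt and checking whether any of them changes the model's behavior. The sketch below covers only the generation step and uses random punctuation as a stand-in; real red-teaming tools typically search for effective character sequences rather than sampling them at random. The function names, parameters, and the sample prompt are hypothetical, and no model is called.

```python
import random
import string


def inject_noise(prompt, n_chars, rng):
    """Insert n_chars random punctuation characters at random positions in prompt."""
    chars = list(prompt)
    for _ in range(n_chars):
        position = rng.randint(0, len(chars))  # inclusive: may append at the end
        chars.insert(position, rng.choice(string.punctuation))
    return "".join(chars)


def perturbed_prompts(prompt, variants=5, n_chars=3, seed=0):
    """Return a reproducible list of noisy variants of one prompt."""
    rng = random.Random(seed)  # seeded so a failing variant can be replayed
    return [inject_noise(prompt, n_chars, rng) for _ in range(variants)]


base_prompt = "Summarize this policy in one sentence"
for variant in perturbed_prompts(base_prompt):
    print(variant)
```

Each variant would then be sent to the model under test and its response compared with the response to the unperturbed prompt. Seeding the generator means any variant that elicits an unwanted response can be reproduced for follow-up.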

Robust red teaming requires a multidisciplinary approach, diverse perspectives, and the engagement of all stakeholders, from developers to end users. The red team should be designed to assess the risks associated with at least each red cell in the Ethical Matrix. This results in a collaborative, sociotechnical approach that ensures a more comprehensive evaluation of the model, thus enhancing the rigor of the evaluation and the safety of the model. Other LLMs can also be used to generate red-teaming prompts.

Red teaming helps LLM developers better protect models against potential misuse, thereby enhancing the overall safety and efficacy of the model. It can also uncover issues that might not be visible under normal operating conditions or during standard testing procedures. A collaborative approach to red teaming built on the Ethical Matrix ensures a thorough and rigorous evaluation, bolstering the robustness of the model and the validity of its outcomes.

A significant limitation of red teaming is its inherent subjectivity: The value and effectiveness of a red-teaming exercise can vary greatly depending on the creativity and risk appetite of the individual stakeholders involved. And because there are no established standards or thresholds for red-teaming LLMs, it can be difficult to determine when enough red teaming has been done or whether the evaluation has been comprehensive enough. This can leave some vulnerabilities undetected.

Another obvious limitation of red teaming is its inability to evaluate for risks that have not been anticipated or imagined. Risks that are unforeseen will not be included in red teaming, making the model uniquely vulnerable to unanticipated scenarios.

Therefore, while red teaming plays a vital role in the testing and development of LLMs, it should be complemented with other evaluation strategies and continuous monitoring to ensure the safety and robustness of the model.

[Figure: Sketch of the Ethical Matrix for Tessa in Our Thought Experiment]

The National Eating Disorders Association (NEDA) released a chatbot named Tessa that was taken down after it gave out harmful advice. Here we visualize the exercise that may have anticipated such outcomes.

Concerns (columns):

Negative: What if Tessa gives toxic information or advice in chats?

Negative: What if Tessa misfires and erodes community trust in NEDA?

Positive: What if Tessa gives accurate, evidence-based advice?

Positive: What if Tessa eases the resource demands of the old helpline?

Stakeholders (rows): “Chatbot users with eating disorders”; “Chatbot users, other”; NEDA; X2AI; Psychologists and other practitioners

Legend: Serious concern; Moderate concern; Minimal/no concern; Benefit

How Would We Audit Tessa, the Eating Disorder Chatbot?

The nonprofit National Eating Disorders Association (NEDA) is one of the largest organizations in the U.S. dedicated to supporting p
