FLI AI Safety Index 2024

Independent experts evaluate safety practices of leading AI companies across critical domains.

11th December 2024

Available online at: /index
Contact us: policy@
Contents

Introduction
Scorecard
Key Findings
Independent Review Panel
Index Design
Evidence Base
Grading Process
Results
Conclusions
Appendix A - Grading Sheets
Appendix B - Company Survey
Appendix C - Company Responses
About the Organization: The Future of Life Institute (FLI) is an independent nonprofit organization with the goal of reducing large-scale risks and steering transformative technologies to benefit humanity, with a particular focus on artificial intelligence (AI). Learn more at .
Introduction

Rapidly improving AI capabilities have increased interest in how companies report, assess and attempt to mitigate associated risks. The Future of Life Institute (FLI) therefore facilitated the AI Safety Index, a tool designed to evaluate and compare safety practices among leading AI companies. At the heart of the Index is an independent review panel, including some of the world's foremost AI experts. Reviewers were tasked with grading companies' safety policies on the basis of a comprehensive evidence base collected by FLI. The Index aims to incentivize responsible AI development by promoting transparency, highlighting commendable efforts, and identifying areas of concern.
Scorecard

| Firm | Overall Grade | Score | Risk Assessment | Current Harms | Safety Frameworks | Existential Safety Strategy | Governance & Accountability | Transparency & Communication |
|------|---------------|-------|-----------------|---------------|-------------------|-----------------------------|-----------------------------|------------------------------|
| Anthropic | C | 2.13 | C+ | B- | D+ | D+ | C+ | D+ |
| DeepMind | D+ | 1.55 | C | C+ | D- | D | D+ | D |
| OpenAI | D+ | 1.32 | C | D+ | D- | D- | D+ | D- |
| Zhipu AI | D | 1.11 | D+ | D+ | F | F | D | C |
| x.AI | D- | 0.75 | F | D | F | F | F | C |
| Meta | F | 0.65 | D+ | D | F | F | D- | F |

Grading: Uses the US GPA system for grade boundaries: the letter values A+, A, A-, B+, [...], F correspond to the numerical values 4.3, 4.0, 3.7, 3.3, [...], 0.
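To make the scale concrete, the sketch below (Python; the helper names are ours, and the floor-style boundary rule is an assumption inferred from the scorecard rather than an official FLI specification) maps letter grades to GPA points and averaged scores back to letters. It reproduces the scorecard's conversions, e.g. 2.13 to C for Anthropic and 1.55 to D+ for DeepMind.

```python
# GPA point values for the letter scale cited above.
GRADE_POINTS = {
    "A+": 4.3, "A": 4.0, "A-": 3.7,
    "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D+": 1.3, "D": 1.0, "D-": 0.7,
    "F": 0.0,
}

def to_letter(score: float) -> str:
    """Assumed boundary rule: return the highest grade whose point
    value does not exceed the averaged score."""
    eligible = [g for g, pts in GRADE_POINTS.items() if pts <= score]
    return max(eligible, key=GRADE_POINTS.get)

assert to_letter(2.13) == "C"    # Anthropic's overall score
assert to_letter(1.55) == "D+"   # DeepMind's overall score
assert to_letter(0.65) == "F"    # Meta's overall score
```

Under this rule every overall and per-domain grade in the scorecard follows from its numerical score.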
Key Findings

• Large risk management disparities: While some companies have established initial safety frameworks or conducted some serious risk assessment efforts, others have yet to take even the most basic precautions.
• Jailbreaks: All the flagship models were found to be vulnerable to adversarial attacks.
• Control Problem: Despite their explicit ambitions to develop artificial general intelligence (AGI), capable of rivaling or exceeding human intelligence, the review panel deemed the current strategies of all companies inadequate for ensuring that these systems remain safe and under human control.
• External oversight: Reviewers consistently highlighted how companies were unable to resist profit-driven incentives to cut corners on safety in the absence of independent oversight. While Anthropic's current and OpenAI's initial governance structures were highlighted as promising, experts called for third-party validation of risk assessment and safety framework compliance across all companies.
Independent Review Panel

The 2024 AI Safety Index was graded by an independent panel of world-renowned AI experts invited by FLI's president, MIT Professor Max Tegmark. The panel was carefully selected to ensure impartiality and a diverse range of expertise, covering both technical and governance aspects of AI. Panel selection prioritized distinguished academics and leaders from the non-profit sector to minimize potential conflicts of interest.

The panel assigned grades based on the gathered evidence base, considering both public and company-submitted information. Their evaluations, combined with actionable recommendations, aim to incentivize safer AI practices within the industry. See the "Grading Process" section for more details.

Atoosa Kasirzadeh

Atoosa Kasirzadeh is a philosopher and AI researcher, serving as an Assistant Professor at Carnegie Mellon University. Previously, she was a visiting faculty researcher at Google, a Chancellor's Fellow and Director of Research at the Centre for Technomoral Futures at the University of Edinburgh, a Research Lead at the Alan Turing Institute, an intern at DeepMind, and a Governance of AI Fellow at Oxford. Her interdisciplinary research addresses questions about the societal impacts, governance, and future of AI.
Tegan Maharaj

Tegan Maharaj is an Assistant Professor in the Department of Decision Sciences at HEC Montréal, where she leads the ERRATA lab on Ecological Risk and Responsible AI. She is also a core academic member at Mila. Her research focuses on advancing the science and techniques of responsible AI development. Previously, she served as an Assistant Professor of Machine Learning at the University of Toronto.
Yoshua Bengio

Yoshua Bengio is a Full Professor in the Department of Computer Science and Operations Research at Université de Montréal, as well as the Founder and Scientific Director of Mila and the Scientific Director of IVADO. He is the recipient of the 2018 A.M. Turing Award, a CIFAR AI Chair, a Fellow of both the Royal Society of London and the Royal Society of Canada, an Officer of the Order of Canada, a Knight of the Legion of Honour of France, a member of the UN's Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology, and Chair of the International Scientific Report on the Safety of Advanced AI.
Jessica Newman

Jessica Newman is the Director of the AI Security Initiative (AISI), housed at the UC Berkeley Center for Long-Term Cybersecurity. She is also a Co-Director of the UC Berkeley AI Policy Hub. Newman's research focuses on the governance, policy, and politics of AI, with particular attention on comparative analysis of national AI strategies and policies, and on mechanisms for the evaluation and accountability of organizational development and deployment of AI systems.
David Krueger

David Krueger is an Assistant Professor in Robust, Reasoning and Responsible AI in the Department of Computer Science and Operations Research (DIRO) at University of Montreal, and a Core Academic Member at Mila, UC Berkeley's Center for Human-Compatible AI, and the Center for the Study of Existential Risk. His work focuses on reducing the risk of human extinction from artificial intelligence through technical research as well as education, outreach, governance and advocacy.
Sneha Revanur

Sneha Revanur is the founder and president of Encode Justice, a global youth-led organization advocating for the ethical regulation of AI. Under her leadership, Encode Justice has mobilized thousands of young people to address challenges like algorithmic bias and AI accountability. She was featured on TIME's inaugural list of the 100 most influential people in AI.
Stuart Russell

Stuart Russell is a Professor of Computer Science at the University of California at Berkeley, holder of the Smith-Zadeh Chair in Engineering, and Director of the Center for Human-Compatible AI and the Kavli Center for Ethics, Science, and the Public. He is a recipient of the IJCAI Computers and Thought Award, the IJCAI Research Excellence Award, and the ACM Allen Newell Award. In 2021 he received the OBE from Her Majesty Queen Elizabeth and gave the BBC Reith Lectures. He co-authored the standard textbook for AI, which is used in over 1,500 universities in 135 countries.
Method

Index Design

The AI Safety Index evaluates safety practices across six leading general-purpose AI developers: Anthropic, OpenAI, Google DeepMind, Meta, x.AI, and Zhipu AI. The Index provides a comprehensive assessment by focusing on six critical domains, with 42 indicators spread across these domains:

1. Risk Assessment
2. Current Harms
3. Safety Frameworks
4. Existential Safety Strategy
5. Governance & Accountability
6. Transparency & Communication

Indicators range from corporate governance policies to external model evaluation practices and empirical results on AI benchmarks focused on safety, fairness and robustness. The full set of indicators can be found in the grading sheets in Appendix A. A quick overview is given in Table 1 below. The key inclusion criteria for these indicators were:

1. Relevance: The list emphasizes aspects of AI safety and responsible conduct that are widely recognized by academic and policy communities. Many indicators were directly incorporated from related projects conducted by leading research organizations, such as Stanford's Center for Research on Foundation Models.

2. Comparability: We selected indicators that highlight meaningful differences in safety practices, which can be identified based on the available evidence. As a result, safety precautions for which conclusive differential evidence was unavailable were omitted.

Companies were selected based on their anticipated capability to build the most powerful models by 2025. Additionally, the inclusion of the Chinese firm Zhipu AI reflects our intention to make the Index representative of leading companies globally. Future iterations may focus on different companies as the competitive landscape evolves.

We acknowledge that the Index, while comprehensive, does not capture every aspect of responsible AI development and exclusively focuses on general-purpose AI. We welcome feedback on our indicator selection and strive to incorporate suitable suggestions into the next iteration of the Index.
Table 1: Full overview of indicators

Risk Assessment: Dangerous capability evaluations; Uplift trials; Pre-deployment external safety testing; Post-deployment external researcher access; Bug bounties for model vulnerabilities; Pre-development risk assessments.

Current Harms: AIR-Bench 2024; TrustLLM Benchmark; SEAL Leaderboard for adversarial robustness; Gray Swan Jailbreaking Arena Leaderboard; Fine-tuning protections; Carbon offsets; Watermarking; Privacy of user inputs; Data crawling; Military, warfare & intelligence applications.

Safety Frameworks: Risk domains; Risk thresholds; Model evaluations; Decision making; Risk mitigations; Conditional pauses; Adherence; Assurance.

Existential Safety Strategy: Control/Alignment strategy; Capability goals; Safety research; Supporting external safety research.

Governance & Accountability: Company structure; Board of directors; Leadership; Partnerships; Internal review; Mission statement; Whistle-blower protection & non-disparagement agreements; Compliance to public commitments.

Transparency & Communication: Lobbying on safety regulations; Testimonies to policymakers; Leadership communications on catastrophic risks; Stanford's 2024 Foundation Model Transparency Index 1.1; Safety evaluation transparency; Terms of Service analysis.
Evidence Base

The AI Safety Index is underpinned by a comprehensive evidence base to ensure evaluations are well-informed and transparent. This evidence was compiled into detailed grading sheets, which presented company-specific data across all 42 indicators to the review panel. These sheets included hyperlinks to original sources and can be accessed in full in Appendix A. Evidence collection relied on two primary pathways:

• Publicly Available Information: Most data was sourced from publicly accessible materials, including research papers, policy documents, news articles, and industry reports. This approach enhanced transparency and enabled stakeholders to verify the information by tracing it back to its original sources.

• Company Survey: To supplement publicly available data, a targeted questionnaire was distributed to the evaluated companies. The survey aimed to gather additional insights on safety-relevant structures, processes, and strategies, including information not yet publicly disclosed.

Evidence collection spanned from May 14 to November 27, 2024. For empirical results from AI benchmarks, we noted data extraction dates to account for model updates. In line with our commitment to transparency and accountability, all collected evidence, whether public or company-provided, has been documented and made available for scrutiny in the appendix.
Incorporated Research and Related Work

The AI Safety Index is built on a foundation of extensive research and draws inspiration from several notable projects that have advanced transparency and accountability in the field of general-purpose AI.

Two of the most comprehensive related projects are the Risk Management Ratings produced by SaferAI, a non-profit organization with deep expertise in risk management, and AI Lab Watch, a research initiative identifying strategies for mitigating extreme risks from advanced AI and reporting on company implementation of those strategies.

The Safety Index directly integrates findings from Stanford's Center for Research on Foundation Models (CRFM), particularly their Foundation Model Transparency Index, as well as empirical results from AIR-Bench 2024, a state-of-the-art safety benchmark for GPAI systems. Additional empirical data cited includes scores from the 2024 TrustLLM Benchmark, Scale's Adversarial Robustness evaluation, and the Gray Swan Jailbreaking Arena. These sources offer invaluable insights into the trustworthiness, fairness, and robustness of GPAI systems.

To evaluate existential safety strategies, the Index leveraged findings from a detailed mapping of technical safety research at leading AI companies by the Institute for AI Policy and Strategy. Indicators on external evaluations were informed by research led by Shayne Longpre at MIT, and the structure of the "Safety Frameworks" section drew from relevant publications from the Centre for the Governance of AI and the research non-profit METR. Additionally, we express gratitude to the journalists working to keep companies accountable, whose reports are referenced in the grading sheets.
Company Survey

To complement publicly available data, the AI Safety Index incorporated insights from a targeted company survey. This questionnaire was designed to gather detailed information on safety-related structures, processes, and plans, including aspects not disclosed in public domains.

The survey consisted of 85 questions spanning seven categories: Cybersecurity, Governance, Transparency, Risk Assessment, Risk Mitigation, Current Harms, and Existential Safety. Questions included binary, multiple-choice, and open-ended formats, allowing companies to provide nuanced responses. The full survey is attached in Appendix B.
Survey responses were shared with the reviewers, and relevant information for the indicators was also directly integrated into the grading sheets. Information provided by companies was explicitly identified in the grading sheets. While x.AI and Zhipu AI chose to engage with the targeted questions in the survey, Anthropic, Google DeepMind and Meta only referred us to relevant sources of already publicly shared information. OpenAI decided not to support this project.
Participation incentive

While less than half of the companies provided substantial answers, engagement with the survey was recognized in the "Transparency & Communication" section. Companies that chose not to engage with the survey received a penalty of one grade step. This adjustment incentivizes participation and acknowledges the value of transparency about safety practices. The penalty was communicated to the review panel within the grading sheets, and reviewers were advised not to additionally take survey participation into account when grading the relevant section. FLI remains committed to encouraging higher participation in future iterations to ensure evaluations are as robust and representative as possible.
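The "one grade step" penalty can be read as a single move down the letter scale; below is a minimal sketch under that assumption (Python; the ordering and function name are illustrative, not from the report).

```python
# Assumed interpretation: one grade step is one position down the ordered
# GPA scale, bottoming out at F.
GRADE_ORDER = ["F", "D-", "D", "D+", "C-", "C", "C+",
               "B-", "B", "B+", "A-", "A", "A+"]

def apply_step_penalty(letter: str, steps: int = 1) -> str:
    """Lower a letter grade by `steps` positions, never going below F."""
    return GRADE_ORDER[max(GRADE_ORDER.index(letter) - steps, 0)]

print(apply_step_penalty("C"))  # C-
print(apply_step_penalty("F"))  # F (already at the bottom)
```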
Grading Process

The grading process was designed to ensure a rigorous and impartial evaluation of safety practices across the assessed companies. Following the conclusion of the evidence-gathering phase on November 27, 2024, grading sheets summarizing company-specific data were shared with an independent panel of leading AI scientists and governance experts. The grading sheets included all indicator-relevant information and instructions for scoring.

Panellists were instructed to assign grades on an absolute scale rather than merely scoring companies relative to each other. FLI included a rough grading rubric for each domain to ensure consistency in evaluations. Besides the letter grades, reviewers were encouraged to support their grades with short justifications and to provide key recommendations for improvement. Experts were encouraged to incorporate additional insights and weigh indicators according to their judgment, ensuring that their evaluations reflected both the evidence base and their specialized expertise. To account for differences in expertise among the reviewers, FLI selected one subset to score the "Existential Safety Strategy" section and another to evaluate the section on "Current Harms." Otherwise, all experts were invited to score every section, although some preferred to grade only the domains they are most familiar with. In the end, every section was graded by four or more reviewers. Grades were aggregated into average scores for each domain, which are presented in the scorecard; a sketch of this aggregation step follows below.
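The aggregation itself is a simple mean over reviewer grades per domain. A minimal sketch (Python; the four reviewer grades are invented for illustration and the function name is ours):

```python
# GPA mapping as in the scorecard's grading note.
GRADE_POINTS = {"A+": 4.3, "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0,
                "B-": 2.7, "C+": 2.3, "C": 2.0, "C-": 1.7, "D+": 1.3,
                "D": 1.0, "D-": 0.7, "F": 0.0}

def domain_score(reviewer_grades: list[str]) -> float:
    """Average several reviewers' letter grades into a numerical domain score."""
    points = [GRADE_POINTS[g] for g in reviewer_grades]
    return sum(points) / len(points)

# Four hypothetical reviewers grading one domain for one company:
print(round(domain_score(["C+", "C", "C", "C+"]), 2))  # 2.15
```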
By adopting this structured yet flexible approach, the grading process not only highlights current safety practices but also identifies actionable areas for improvement, encouraging companies to strive for higher standards in future evaluations.
One can argue that large companies at the frontier should be held to the highest safety standards. We therefore initially considered awarding one third of an extra point to companies with far fewer staff or significantly lower model scores. In the end, we decided against this for the sake of simplicity; the choice did not change the resulting ranking of companies.
Results

This section presents average grades for each domain and summarizes the justifications and improvement recommendations provided by the review panel experts.
Risk Assessment

|       | Anthropic | DeepMind | OpenAI | Zhipu AI | x.AI | Meta |
|-------|-----------|----------|--------|----------|------|------|
| Grade | C+        | C        | C      | D+       | F    | D+   |
| Score | 2.67      | 2.10     | 2.10   | 1.55     | 0    | 1.50 |
OpenAI, Google DeepMind, and Anthropic were commended for implementing more rigorous tests for identifying potential dangerous capabilities, such as misuse in cyber-attacks or biological weapon creation, compared to their competitors. Yet even these efforts were found to feature notable limitations, leaving the risks associated with GPAI poorly understood. OpenAI's uplift studies and evaluations for deception were notable to reviewers. Anthropic has done the most impressive work in collaborating with national AI Safety Institutes. Meta evaluated its models for dangerous capabilities before deployment, but critical threat models, such as those related to autonomy, scheming, and persuasion, remain unaddressed. Zhipu AI's risk assessment efforts were noted as less comprehensive, while x.AI failed to publish any substantive pre-deployment evaluations, falling significantly below industry standards. A reviewer suggested that the scope and size of human participant uplift studies should be increased and that standards for acceptable risk thresholds need to be established. Reviewers noted that only Google DeepMind and Anthropic maintain targeted bug-bounty programs for model vulnerabilities, with Meta's initiative narrowly focusing on privacy-related attacks.
Current Harms

|       | Anthropic | DeepMind | OpenAI | Zhipu AI | x.AI | Meta |
|-------|-----------|----------|--------|----------|------|------|
| Grade | B-        | C+       | D+     | D+       | D    | D    |
| Score | 2.83      | 2.50     | 1.68   | 1.50     | 1.00 | 1.18 |
Anthropic's AI systems received the highest scores on leading empirical safety and trustworthiness benchmarks, with Google DeepMind ranking second. Reviewers noted that other companies' systems attained notably lower scores, raising concerns about the adequacy of implemented safety mitigations. Reviewers criticized Meta's policy of publishing the weights of their frontier models, as this enables malicious actors to easily remove the models' safeguards and use them in harmful ways. Google DeepMind's SynthID watermark system was recognized as a leading practice for mitigating the risks of AI-generated content misuse. In contrast, most other companies lack robust watermarking measures. Zhipu AI reported using watermarks in the survey but does not appear to document this practice on its website.
Additionally, environmental sustainability remains an area of divergence. While Meta actively offsets its carbon footprint, other companies only partially achieve this or even fail to report on their practices publicly. x.AI's reported use of gas turbines to power data centers is particularly concerning from a sustainability standpoint.
Further, reviewers strongly advise companies to ensure their systems are better prepared to withstand adversarial attacks. Empirical results show that models are still vulnerable to jailbreaking, with OpenAI's models being particularly vulnerable (no data for x.AI or Zhipu AI are available). DeepMind's model defences were the most robust in the included benchmarks.

The panel also criticized companies for using user-interaction data to train their AI systems. Only Anthropic and Zhipu AI use default settings that prevent the model from being trained on user interactions (except those flagged for safety review).
Safety Frameworks

|       | Anthropic | DeepMind | OpenAI | Zhipu AI | x.AI | Meta |
|-------|-----------|----------|--------|----------|------|------|
| Grade | D+        | D-       | D-     | F        | F    | F    |
| Score | 1.67      | 0.80     | 0.90   | 0.35     | 0.35 | 0.35 |
All six companies signed the Seoul Frontier AI Safety Commitments and pledged to develop safety frameworks with thresholds for unacceptable risks, advanced safeguards for high-risk levels, and conditions for pausing development if risks cannot be managed. As of the publication of this Index, only OpenAI, Anthropic and Google DeepMind have published their frameworks. As such, the reviewers could only assess the frameworks of those three companies.

While these frameworks were judged insufficient to protect the public from unacceptable levels of risk, experts still considered the frameworks to be effective to some degree. Anthropic's framework stood out to reviewers as the most comprehensive because it detailed additional implementation guidance. One expert noted the need for a more precise characterization of catastrophic events and clearer thresholds. Other comments noted that the frameworks from OpenAI and Google DeepMind were not detailed enough for their effectiveness to be determined externally. Additionally, no framework sufficiently defined specifics around conditional pauses, and a reviewer suggested trigger conditions should factor in external events and expert opinion. Multiple experts stressed that safety frameworks need to be supported by robust external reviews and oversight mechanisms, or they cannot be trusted to accurately report risk levels. Anthropic's efforts toward external oversight were deemed best, if still insufficient.
Existential Safety Strategy

|       | Anthropic | DeepMind | OpenAI | Zhipu AI | x.AI | Meta |
|-------|-----------|----------|--------|----------|------|------|
| Grade | D+        | D        | D-     | F        | F    | F    |
| Score | 1.57      | 1.10     | 0.93   | 0        | 0.35 | 0.17 |
While all assessed companies have declared their intention to build artificial general intelligence or superintelligence, and most have acknowledged the existential risks potentially posed by such systems, only Google DeepMind, OpenAI and Anthropic are seriously researching how humans can remain in control and avoid catastrophic outcomes. The technical reviewers assessing this section underlined that none of the companies have put forth an official strategy for ensuring advanced AI systems remain controllable and aligned with human values. The current state of technical research on control, alignment and interpretability for advanced AI systems was judged to be immature and inadequate.

Anthropic attained the highest scores, but their approach was deemed unlikely to prevent the significant risks of superintelligent AI. Anthropic's "Core Views on AI Safety" blog post articulates a fairly detailed portrait of their strategy for ensuring safety as systems become more powerful. Experts noted that their strategy indicates a substantial depth of awareness of relevant technical issues, like deception and situational awareness. One reviewer emphasized the need to move toward logical or quantitative guarantees of safety.

OpenAI's blog post on "Planning for AGI and beyond" shares high-level principles, which reviewers consider reasonable but which cannot be considered a plan. Experts think that OpenAI's work on scalable oversight might work but is underdeveloped and cannot be relied on.

Research updates shared by Google DeepMind's Alignment Team were judged useful but immature and inadequate to ensure safety. Reviewers also stressed that relevant blog posts cannot be taken as a meaningful representation of the strategy, plans, or principles of the organization as a whole.

None of Meta, x.AI, or Zhipu AI have put forth plans or technical research addressing the risks posed by artificial general intelligence. Reviewers noted that Meta's open-source approach and x.AI's vision of democratized access to truth-seeking AI may help mitigate some risks from concentration of power and value lock-in.
Governance & Accountability

|       | Anthropic | DeepMind | OpenAI | Zhipu AI | x.AI | Meta |
|-------|-----------|----------|--------|----------|------|------|
| Grade | C+        | D+       | D+     | D        | F    | D-   |
| Score | 2.42      | 1.68     | 1.43   | 1.18     | 0.57 | 0.80 |
Reviewers noted the considerable care Anthropic's founders have invested in building a responsible governance structure, which makes it more likely to prioritize safety. Anthropic's other proactive efforts, like their responsible scaling policy, were also noted positively.

OpenAI was similarly commended for its initial non-profit structure, but recent changes, including the disbandment of safety teams and its shift to a for-profit model, raised concerns about a reduced emphasis on safety.