




SUPPORT POOL OF EXPERTS PROGRAMME
AI - Complex Algorithms and effective Data Protection Supervision
Bias evaluation
by Dr. Kris SHRISHAK
As part of the SPE programme, the EDPB may commission contractors to provide reports and tools on specific topics.
The views expressed in the deliverables are those of their authors and they do not necessarily reflect the official position of the EDPB. The EDPB does not guarantee the accuracy of the information included in the deliverables. Neither the EDPB nor any person acting on the EDPB's behalf may be held responsible for any use that may be made of the information contained in the deliverables.
Some excerpts may be redacted or removed from the deliverables as their publication would undermine the protection of legitimate interests, including, inter alia, the privacy and integrity of an individual regarding the protection of personal data in accordance with Regulation (EU) 2018/1725 and/or the commercial interests of a natural or legal person.
TABLE OF CONTENTS
1 State of the art for bias evaluation
1.1 Sources of bias
1.1.1 Bias from data
1.1.2 Algorithm bias
1.1.3 Evaluation bias
1.1.4 Sources of bias in facial recognition technology
1.1.5 Sources of bias in generative AI
1.2 Methods to address bias
1.2.1 Pre-processing
1.2.2 In-processing
1.2.3 Post-processing
1.2.4 Methods for generative AI
2 Tools for bias evaluation
2.1 IBM AIF360
2.2 Fairlearn
2.3 Holistic AI
2.4 Aequitas
2.5 What-If Tool
2.6 Other tools considered
Conclusion
Bibliography
Document submitted in March 2024
1 STATE OF THE ART FOR BIAS EVALUATION
Artificial intelligence (AI) systems are socio-technical systems whose behaviour and outputs can harm people. Bias in AI systems can harm people in various ways. Bias can result from interconnected factors that may together amplify harms such as discrimination (European Union Agency for Fundamental Rights, 2022; Weerts et al., 2023). Mitigating bias in AI systems is important, and identifying the sources of bias is the first step in any bias mitigation strategy.
1.1 Sources of bias
The AI pipeline involves many choices and practices that contribute to biased AI systems. Biased data is just one of the sources of biased AI systems, and understanding its various forms can help to detect and to mitigate the bias. In one application, the lack of representative data might be the source of bias, e.g., medical AI where data from women with heart attacks is less represented in the dataset than data from men. In another, proxy variables that embed gender bias might be the problem, e.g., in résumé screening. Increasing the dataset size for women could help in the former case, but not in the latter.
In addition to bias from data, AI systems can also be biased due to the algorithm and the evaluation. These three sources of bias are discussed next.
1.1.1 Bias from data
1. Historical bias: When AI systems are trained on historical data, they often reflect societal biases which are embedded in the dataset. Out-of-date datasets with sensitive attributes and related proxy variables contribute to historical bias. This can be attributed to a combination of factors: how and what data were collected, and the labelling of the data, which involves subjectivity and the bias of the labeller. An example of historical bias in AI systems has been shown with word embeddings (Garg et al., 2018), which are numerical representations of words and are used in developing text generation AI systems.
2. Representation bias: Representation bias is introduced when defining and sampling from the target population during the data collection process. Representation bias can take the form of availability bias and sampling bias.
a. Availability bias: Datasets used in developing AI systems should represent the chosen target population. However, datasets are sometimes chosen by virtue of their availability rather than their suitability to the task at hand. Available datasets often underrepresent women and people with disabilities. Furthermore, available datasets are often used out of context, for purposes different from their intended purpose (Paullada et al., 2021). This contributes to biased AI systems.
b. Sampling bias: It is usually not possible to collect data about the entire target population. Instead, a subset of data points related to the target population is collected, selected and used. This subset or sample should be representative of the target population for it to be relevant and of high quality. For instance, data collected by scraping Reddit or other social media sites are not randomized and are not representative of the population that does not use these sites. Such data are not generalizable to the wider population beyond these sites. And yet, the data are used in AI models deployed in other contexts.
When defining the target population, the subgroups with sensitive characteristics should be considered. An AI system built using a dataset collected from a city will only have a small percentage of certain minority groups, say 5%. If the dataset is used as-is, then the outputs of this AI system will be biased against this minority group because they only make up 5% of the dataset and the AI system has relatively less data to learn from about them.
3. Measurement bias: Datasets can be the result of measurement bias. Often, the data that is collected is a proxy for the desired data. This proxy data is an oversimplification of reality. Sometimes the proxy variable itself is wrong. Furthermore, the method of measurement, and consequently the collection of the data, may vary across groups. This variation could be due to easier access to the data from certain groups over others.
4. Aggregation bias: False conclusions may be drawn about individuals or small groups when the dataset is drawn from the entire population. The most common form of this bias is Simpson's paradox (Blyth, 1972), where patterns observed in the data for small groups disappear when only the aggregate data over the entire population is considered. The most well-known example of this comes from the UC Berkeley admissions in 1973 (Bickel et al., 1975). Based on the aggregate data, it seemed that women applicants were rejected significantly more often than men. However, analysis of the data at the department level revealed that the rejection rates were higher for men in most departments. The aggregate failed to reveal this because a higher proportion of women applied to departments with a low overall acceptance rate than to departments with a high acceptance rate.
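To make the aggregation effect concrete, the short sketch below reproduces a Berkeley-style pattern with hypothetical numbers (the counts are invented for illustration, not the 1973 data): women are admitted at a higher rate in each department, yet at a lower rate overall.

```python
import pandas as pd

# Hypothetical admissions counts illustrating Simpson's paradox.
df = pd.DataFrame({
    "dept":     ["A", "A", "B", "B"],
    "sex":      ["men", "women", "men", "women"],
    "applied":  [800, 100, 200, 600],
    "admitted": [500, 70, 20, 90],
})

# Per-department rates: women lead in both A (70% vs 62.5%) and B (15% vs 10%).
print(df.assign(rate=df.admitted / df.applied))

# Aggregate rates: men ~52%, women ~23%, because most women applied to the
# more selective department B.
agg = df.groupby("sex")[["applied", "admitted"]].sum()
print(agg.admitted / agg.applied)
```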
1.1.2 Algorithm bias
Although much of the discussion around bias focusses on bias from data, other sources of bias that contribute to discriminatory decisions should not be overlooked. In fact, AI models produce biased outputs not only due to the datasets but also due to the model itself (Hooker, 2021). Even when the datasets are not biased and are properly sampled, algorithmic choices can contribute to biased decisions. These include the choice of objective functions, regularisations, how long the model is trained, and even the choice of statistically biased estimators (Danks & London, 2017).
The various trade-offs made during the design and development process could result in discriminatory outputs. Such trade-offs can include model size and the choice of privacy protection mechanisms (Ferry et al., 2023; Fioretto et al., 2022; Kulynych et al., 2022). Even with the Diversity in Faces (DiF) dataset, which has broad coverage of facial images, an AI model trained with certain differential privacy techniques disproportionately degrades performance for darker-skinned faces (Bagdasaryan et al., 2019). Furthermore, techniques to compress AI models can disproportionally affect the performance of AI models for people with underrepresented sensitive attributes (Hooker et al., 2020).
1.1.3 Evaluation bias
The performance of AI systems is evaluated based on many metrics, from accuracy to "fairness". Such assessments are usually performed against a benchmark, or a test dataset. Evaluation bias arises at this stage because the benchmark itself could contribute to bias.
AI systems can perform extremely well against a specific test dataset, and this test performance may fail to translate into real-world performance due to "overfitting" to the test dataset. This is especially a problem if the test dataset carries over historical, representation or measurement bias.
For instance, if the test dataset was collected from the USA, it is unlikely to be representative of the population in Germany; or, the dataset may have been collected in 2020 during COVID-19 but used in a medical setting in a non-COVID-19 year. This means that even if the bias in the training dataset is mitigated, bias might creep in at the evaluation stage.
1.1.4 Sources of bias in facial recognition technology
Historical, representation and evaluation bias are the main causes of bias in facial recognition technology (FRT) and, more broadly, image recognition. This is because the training and benchmark datasets are constructed from publicly available image datasets, often through web scraping, that are not representative of different groups and different geographies (Raji & Buolamwini, 2019).
Databases such as OpenImages and ImageNet mostly contain images from the USA and the UK (Shankar et al., 2017). IJB-A and Adience have been shown to mostly contain images of people with light skin, underrepresenting people with dark skin (Buolamwini & Gebru, 2018). Furthermore, racial slurs and derogatory phrases get embedded during the labelling process of images (Birhane & Prabhu, 2021; Crawford & Paglen, 2021). And despite datasets being flagged for removal, some of these datasets are still being used (Peng, 2020). If these are used for training and/or testing FRT, then, by design, the resulting systems will be biased.
Even datasets that attempt to address the problem can fail in the process. IBM's "Diversity in Faces" dataset was introduced to address the lack of diversity in image datasets (Merler et al., 2019). However, it raised more concerns (Crawford & Paglen, 2021). First, the images were scraped from the website Flickr without the consent of the site users (Salon, 2019). Second, it uses skull shapes as an additional measure, which has historically been used to claim racial superiority of white people and, hence, embeds historical bias (Gould, 1996). Finally, the dataset was annotated by three Amazon Mechanical Turk workers who guessed the age and gender of the people in the scraped images.
1.1.5 Sources of bias in generative AI
Generative AI allows for the generation of content including text, images, audio and video. The sources of bias discussed in the previous sections (bias from data, algorithm bias and evaluation bias) get carried over to AI that generates content. In addition, generative AI systems are developed with large amounts of uncurated data scraped from the web. This adds an additional layer of risk, as the developers would lack adequate knowledge about the data and its statistical properties, making it harder to assess the sources of bias.
Furthermore, many of the generative AI models are developed without an intended purpose. A pre-trained model is built, and then applications are developed on top of this pre-trained model by other organisations. Thus, the source of bias can be in the pre-trained model and in the context of the downstream application. When bias is embedded in the pre-trained model, the bias will propagate downstream to all the applications.
Generative AI datasets can reflect historical bias, representation bias and evaluation bias (Bender et al., 2021). Bias can also arise due to data labelling, especially when fine-tuning a pre-trained model for a specific application. Labels or annotations are often added to the data by underpaid workers, for example via Amazon Mechanical Turk. They may choose the wrong labels because they are distracted or, worse, because they embed their own bias by not being from the representative population where the AI system will be deployed. This is especially the case when more than one label could potentially apply to the data (Plank et al., 2014).
Although the datasets used for pre-trained models are currently neither curated nor labelled by humans (which organisations claim would be costly), the process of reinforcement learning from human feedback used by companies developing generative AI introduces the same biases, albeit at a later stage in the development process.
Even when the text datasets are well labelled, they can contain societal bias that arises due to spurious correlations, which are statistical correlations between features and outcomes. In the case of text generative AI, such spurious correlations can be observed with word embeddings, which underlie text generative AI (Garg et al., 2018): e.g., 'man' being associated with 'programming' and 'woman' being associated with 'homemaker'. Furthermore, as these are mathematical objects, the contextual information about the words gets lost, and they have been observed to output "doctor" - "father" + "mother" as "nurse". Pre-trained language models such as GPT that rely on uncurated datasets are also susceptible to this issue (Tan & Celis, 2019), and merely increasing the size of the model does not address the problem (Sagawa et al., 2020).
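This analogy arithmetic can be reproduced with off-the-shelf embeddings. A minimal sketch using the gensim library and publicly available GloVe vectors is shown below; the exact nearest neighbours depend on the corpus and embedding chosen, so treat the output as illustrative.

```python
import gensim.downloader as api

# Downloads ~66 MB of pre-trained GloVe vectors on first use.
vectors = api.load("glove-wiki-gigaword-50")

# Which word relates to 'mother' as 'doctor' relates to 'father'?
# Computes vector("doctor") - vector("father") + vector("mother").
print(vectors.most_similar(positive=["doctor", "mother"],
                           negative=["father"], topn=3))
```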
1.2 Methods to address bias
No automated mechanism can fully detect and mitigate bias (Wachter et al., 2020). There are inherent limitations to technical approaches to addressing bias (Buyl & De Bie, 2024). These approaches are necessary, but not sufficient, for AI systems, which are socio-technical systems (Schwartz et al., 2022). The most appropriate approaches depend on the specific context for which the AI system is developed and used. Moreover, contextual and socio-cultural knowledge should complement these technical approaches.
Based on when the intervention is made in the AI lifecycle to mitigate bias, the technical methods and techniques to address bias can be classified into three types (d'Alessandro et al., 2017):
1. Pre-processing: These techniques modify the training data before it is used to train an AI model, to obscure the associations between sensitive variables and the output. Pre-processing can help identify historical, measurement and representational bias in data.
2. In-processing: These techniques change the way the AI training process is performed to mitigate bias, through changes in the objective function or with an additional optimisation constraint.
3. Post-processing: These techniques treat the AI model as opaque and attempt to mitigate bias after the completion of the training process. The assumption behind these techniques is that it is not possible to modify the training data or the training/learning process to address the bias. Thus, these techniques should be treated as a last-resort intervention.
Merely removing sensitive variables from the dataset is not an effective approach to mitigate bias, due to the existence of proxy variables (Dwork et al., 2012; Kamiran & Calders, 2012).
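A small synthetic experiment illustrates why: even after the sensitive column is dropped, a proxy feature lets a classifier recover it. All data below is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
gender = rng.integers(0, 2, n)                     # sensitive attribute
postcode = gender + rng.normal(scale=0.3, size=n)  # hypothetical proxy variable
other = rng.normal(size=n)

X = np.c_[postcode, other]                         # 'gender' column removed
clf = LogisticRegression().fit(X, gender)
print(clf.score(X, gender))  # ~0.95: the sensitive attribute is still encoded
```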
Pre-processing approaches are agnostic to the AI type, as they focus on the dataset. This is an important advantage. Furthermore, many of these approaches have been developed and tested over the past decade and are more mature than in-processing techniques. Pre-processing approaches are early-stage interventions and can assist with changing the design and development process. However, if these techniques are the only intervention used, they might give the illusion that all the bias has been resolved, which is not the case (Obermeyer et al., 2019); they are only the starting point.
For regulators, pre-processing techniques are useful only if they have access to the datasets that were used to train the model. Furthermore, the regulator needs to consider whether other in-processing and post-processing techniques were used by the developers and deployers of the AI system.
1.2.1 Pre-processing
1. Data provenance (Cheney et al., 2009; Gebru et al., 2018): Data provenance is an essential step before other methods to mitigate bias from data can be used. It attempts to answer where, how and why the dataset came to be, who created it, what it contains, how it will be used, and by whom. In the area of machine learning, the term 'datasheet' is more commonly used. Data provenance can, in the context of data protection, include the listing of personal data and non-personal data.
2. Causal analysis (Glymour & Herington, 2019; Salimi et al., 2019): Datasets used to train AI models often include relationships and dependencies between sensitive and non-sensitive variables. Thus, any attempt to mitigate bias in the dataset requires understanding the relationships between these variables. Otherwise, non-sensitive variables could act as proxies for the sensitive variables. Causal analysis helps with identifying these proxies, often by visualising the links between the variables in the dataset as a graph, as sketched below.
Causal analysis can be extended to "repair" the dataset by removing the dependencies based on pre-defined "fairness" criteria.[1] However, this approach relies on prior contextual knowledge about the AI model and its deployment, in addition to being computationally intensive for large datasets.
[1] The technical literature uses the term "fairness", and there are numerous definitions and metrics of "fairness" (Hutchinson & Mitchell, 2019). Many of these have been developed in the context of the USA, some based on the "four-fifths rule" from US Federal employment regulation, which are not valid in other contexts and countries (Watkins et al., 2022). Furthermore, these metrics are incompatible with each other (Kleinberg et al., 2016).
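As a minimal sketch of the graph view, the snippet below builds a hypothetical causal graph for a hiring dataset with the networkx library and lists every path from the sensitive variable to the outcome; each path flags a potential proxy. The variable names are invented for illustration.

```python
import networkx as nx

# Hypothetical causal graph: edges point from cause to effect.
G = nx.DiGraph([
    ("gender", "part_time_history"),
    ("gender", "hobby_keywords"),
    ("part_time_history", "hired"),
    ("hobby_keywords", "hired"),
    ("experience", "hired"),
])

# Every directed path from the sensitive variable to the outcome
# passes through a proxy that may need to be repaired or removed.
for path in nx.all_simple_paths(G, "gender", "hired"):
    print(" -> ".join(path))
```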
3. Transformation (Calmon et al., 2017; Feldman et al., 2015; Zemel et al., 2013): These approaches involve transforming the data into a less biased representation. The transformations could involve editing the labels such that they become independent of specific protected groupings, or transformations based on specific "fairness" objectives.
Transformations are not without limitations. First, transformations usually affect the performance of the AI model, and there is an inherent trade-off between bias mitigation and performance when using this approach. Second, transformations are limited to numerical data and cannot be used for other kinds of datasets. Third, this approach is susceptible to bias persisting due to the existence of proxy variables. For this reason, the use of this approach should be preceded by causal analysis to understand the links between the special category data and the proxy variables in the starting dataset. Even then, there is no guarantee that the transformations have eliminated the relationship between the special category data and proxy variables. Finally, transformations could make the AI model less interpretable (Lepri et al., 2018).
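As an illustration of such a transformation, the sketch below implements a simplified, quantile-based version of the "repair" of Feldman et al. (2015): each feature value is moved toward the value the pooled distribution takes at the same within-group quantile, so that group-conditional distributions converge. The function name and the repair_level blend are this sketch's own simplifications.

```python
import numpy as np

def repair_feature(x, groups, repair_level=1.0):
    """Move each group's distribution of feature x toward the pooled one.

    repair_level=1.0 fully equalises the distributions; 0.0 is a no-op.
    Simplified from Feldman et al. (2015)."""
    x = np.asarray(x, dtype=float)
    repaired = x.copy()
    for g in np.unique(groups):
        mask = groups == g
        # Within-group quantile of each point.
        ranks = x[mask].argsort().argsort()
        q = ranks / max(mask.sum() - 1, 1)
        # Value the pooled distribution takes at that quantile.
        target = np.quantile(x, q)
        repaired[mask] = (1 - repair_level) * x[mask] + repair_level * target
    return repaired
```

The trade-off described above is visible here directly: the more a feature is repaired, the less group-specific signal remains for the model to learn from.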
4. Massaging or relabeling (Kamiran & Calders, 2012): Relabeling is a specific type of transformation that strategically modifies the labels in the training data such that the distribution of positive instances for all classes is equal. For example, if a dataset contains data about men and women, the proportion of the dataset that is labelled '+' for women should be the same as that for men. If the proportion is less for women, then some of the data points for women that were close to being classified as '+' but were initially labelled '-' will be changed, and the reverse will be done for data points for men. This approach is not restricted to the training dataset and can also be used for validation and test datasets.
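Below is a minimal sketch of massaging in the usual formulation: a ranker's scores pick the borderline points, and the number of relabelings needed to equalise positive rates follows Kamiran & Calders (2012). The function and variable names are this sketch's own.

```python
import numpy as np

def massage_labels(y, groups, scores, protected):
    """Relabel borderline points so positive rates match across groups.

    y: 0/1 labels; groups: group per point; scores: a ranker's estimated
    probability of the positive class; protected: the disadvantaged group."""
    y = y.copy()
    prot = groups == protected
    # Number of swaps needed to equalise positive rates (Kamiran & Calders).
    gap = y[~prot].mean() - y[prot].mean()
    m = int(round(gap * prot.sum() * (~prot).sum() / len(y)))
    # Promote the protected negatives closest to the decision boundary...
    cand = np.where(prot & (y == 0))[0]
    y[cand[np.argsort(-scores[cand])][:m]] = 1
    # ...and demote the same number of borderline favoured positives.
    cand = np.where(~prot & (y == 1))[0]
    y[cand[np.argsort(scores[cand])][:m]] = 0
    return y
```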
5. Reweighing (Calders et al., 2009; Jiang & Nachum, 2020; Krasanakis et al., 2018): Instead of changing the labels in the dataset, this approach adds a specific 'weight' to each data point to adjust for the bias in the training dataset. The weights can be chosen based on three factors: (1) the special categories of personal data, along with the probability in the population of this sensitive attribute, (2) the probability of a specific outcome [+/-] and (3) the observed probability of this outcome for a sensitive attribute.
For instance, women constitute 50% of all humans, and if the label '+' is assigned to 60% of all data in the dataset, then 30% of the dataset should contain women with a '+' label. However, if it is observed that only 20% of the dataset has women with a '+' label, then a weight of 1.5 is assigned to women with a '+' label, 0.75 is assigned to men with a '+' label, and so on, to adjust for the bias.
Alternatively, a more dynamic approach can be taken by training an unweighted classifier to learn the weights and then retraining the classifier using those weights.[2] Reweighing is more suitable for small models where retraining is not too expensive in terms of cost and resources.
[2] This process of training an unweighted model first makes this approach of reweighing a mix of in-processing and pre-processing.
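The weight computation can be stated compactly: each (group, label) cell gets the weight expected probability / observed probability. A minimal sketch, which reproduces the 1.5 and 0.75 of the worked example above, could look like this:

```python
import numpy as np

def reweigh(groups, y):
    """w(g, l) = P(group=g) * P(label=l) / P(group=g, label=l),
    all estimated from observed frequencies (Calders et al., 2009)."""
    w = np.empty(len(y), dtype=float)
    for g in np.unique(groups):
        for l in np.unique(y):
            cell = (groups == g) & (y == l)
            expected = (groups == g).mean() * (y == l).mean()
            w[cell] = expected / cell.mean()  # assumes no empty cells
    return w

# Usage sketch: most classifiers accept the weights directly, e.g.
# LogisticRegression().fit(X, y, sample_weight=reweigh(groups, y)).
```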
6. Resampling (Kamiran & Calders, 2012): In contrast to the previous methods, the resampling method does not involve adding weights to the sample, nor does it involve changing labels in the training dataset. Instead, this approach focusses on how samples from the dataset are chosen for training, such that a balanced set of samples is used. Data from the minority class can be duplicated, or "oversampled", while data from the majority class can be skipped, or "under-sampled". The choice usually depends on the size of the entire dataset and the overall impact on the performance of the AI model. For instance, under-sampling requires datasets with sufficiently large amounts of data from the different classes.
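A minimal oversampling sketch follows; under-sampling works symmetrically by drawing fewer points from the larger groups. The function name is this sketch's own.

```python
import numpy as np

def rebalance(X, y, groups, rng=np.random.default_rng(0)):
    """Oversample every group to the size of the largest one by
    drawing with replacement, so training sees balanced groups."""
    n_max = max((groups == g).sum() for g in np.unique(groups))
    idx = np.concatenate([
        rng.choice(np.where(groups == g)[0], size=n_max, replace=True)
        for g in np.unique(groups)
    ])
    return X[idx], y[idx], groups[idx]
```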
7. Generating artificial training data (Sattigeri et al., 2019): When the quantity of available data is limited, especially for unstructured data such as images, a generative process can be used to develop the dataset. The use of generative adversarial networks (GANs) which include specific bias considerations can contribute to generating and using less biased datasets for training. This approach assumes that an appropriate fairness criterion is available, which is a strong assumption, and it requires significant computing power.
1.2.2 In-processing
1. Regularisation (Kamishima et al., 2012): Regularisation is used in machine learning to penalise undesired characteristics. This approach was primarily used to reduce over-fitting but has been extended to address bias by penalising classifiers with discriminatory behaviour. It is a data-driven approach that relies on balancing fairness (as defined by a chosen fairness metric) and a performance metric such as accuracy or the ratio between the true positive rate and the false positive rate for minority groups (Bechavod & Ligett, 2017).
While this approach is generic and flexible, it relies on the developer choosing the most suitable metric, which leaves room for gaming. In addition, there are also concerns that not all fairness measures are equally affected by regularisation parameters (Stefano et al., 2020). Furthermore, this approach could result in reduced accuracy and robustness.
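As a concrete sketch in the spirit of Kamishima et al. (2012), the plain-numpy logistic regression below adds a penalty on the squared gap between the groups' mean predicted scores; the lam parameter sets the fairness/performance balance. The penalty choice and names are this sketch's simplifications, not the paper's exact regulariser.

```python
import numpy as np

def train_fair_logreg(X, y, s, lam=1.0, lr=0.1, epochs=500):
    """Logistic regression whose loss adds lam * (parity gap)^2,
    where the gap is the difference in mean predicted probability
    between the two groups encoded in the 0/1 array s."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))            # predicted probabilities
        grad_nll = X.T @ (p - y) / len(y)             # log-loss gradient
        gap = p[s == 0].mean() - p[s == 1].mean()     # parity gap
        dp = p * (1 - p)                              # d p / d (Xw)
        grad_gap = (X[s == 0] * dp[s == 0, None]).mean(axis=0) \
                 - (X[s == 1] * dp[s == 1, None]).mean(axis=0)
        w -= lr * (grad_nll + 2 * lam * gap * grad_gap)
    return w
```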
2. Constrained optimisation (Agarwal et al., 2018; Zafar et al., 2017): Constrained optimisation, as the name suggests, constrains the optimisation function by incorporating a fairness metric during model training, either by adapting an existing learning paradigm or through wrapper methods. In essence, this approach changes the algorithm of the AI model. In addition to fairness metrics, other constraints that capture disparities in population frequencies can be included, resulting in trade-offs between the metrics.
The chosen fairness metric can result in vastly different models; hence, this approach is heavily reliant on the choice of the fairness metric, which makes it difficult to balance the constraints and can lead to unstable training.
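The reductions approach of Agarwal et al. (2018) is available in the Fairlearn library; a minimal sketch on synthetic data (all data invented for illustration) could look like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

rng = np.random.default_rng(0)
n = 1000
s = rng.integers(0, 2, n)                              # sensitive attribute
X = np.c_[rng.normal(size=n), s + rng.normal(size=n)]  # one feature proxies s
y = (X[:, 0] + 0.8 * s + rng.normal(size=n) > 0).astype(int)

# Wraps an ordinary estimator and enforces the chosen constraint in training.
mitigator = ExponentiatedGradient(LogisticRegression(),
                                  constraints=DemographicParity())
mitigator.fit(X, y, sensitive_features=s)
pred = mitigator.predict(X)
print(pred[s == 0].mean(), pred[s == 1].mean())  # selection rates now close
```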
3. Adversarial approach (Celis & Keswani, 2019; Zhang et al., 2018): While adversarial learning is primarily an approach to determine the robustness of machine learning models, it can also be used as a method to determine fairness. An adversary can attack the model to determine the protected attribute from the outputs. The adversary's feedback can then be used to penalise and update the model to prevent discriminatory outputs. The most common approach to incorporating this feedback is as an additional constraint in the optimisation process, that is, through constrained optimisation.
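A minimal sketch of this idea in PyTorch, loosely following Zhang et al. (2018): an adversary tries to recover the sensitive attribute from the predictor's output, and the predictor is trained to solve its task while defeating the adversary. The architecture, synthetic data and fixed trade-off weight are this sketch's simplifications.

```python
import torch
from torch import nn

torch.manual_seed(0)
n, d = 1000, 5                                   # hypothetical synthetic data
X = torch.randn(n, d)
s = (torch.rand(n) > 0.5).float()                # sensitive attribute
y = ((X[:, 0] + 0.8 * s + 0.3 * torch.randn(n)) > 0).float()

predictor = nn.Linear(d, 1)                      # main model
adversary = nn.Linear(1, 1)                      # recovers s from the output
opt_p = torch.optim.Adam(predictor.parameters(), lr=0.01)
opt_a = torch.optim.Adam(adversary.parameters(), lr=0.01)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    # 1) Adversary learns to predict s from the (detached) predictions.
    out = predictor(X).detach()
    loss_a = bce(adversary(out).squeeze(1), s)
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()

    # 2) Predictor solves its task while making the adversary fail.
    out = predictor(X)
    loss_p = bce(out.squeeze(1), y) - 1.0 * bce(adversary(out).squeeze(1), s)
    opt_p.zero_grad(); loss_p.backward(); opt_p.step()
```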
1.2.3 Post-processing
1. Calibration (Pleiss et al., 2017): Calibration is the process of ensuring that the proportion of positive predictions is the same for all subgroups (protected or otherwise) in the data. This approach does not directly address the biases but tackles them indirectly by ensuring that the probability of positive outcomes is equal across social groups.
However, calibration is limited in flexibility and in accommodating multiple fairness criteria; in fact, the latter has been shown to be impossible (Kleinberg et al., 2016). Although many approaches, such as randomisation during post-processing, have been suggested, this is an ongoing area of research without a clear consensus on the best approach.
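Group-specific (possibly randomised) decision thresholds are a widely available post-processing intervention of this family. Fairlearn's ThresholdOptimizer treats the trained model as opaque, exactly as described above; a minimal sketch on synthetic data (all data invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.postprocessing import ThresholdOptimizer

rng = np.random.default_rng(0)
n = 1000
s = rng.integers(0, 2, n)                              # sensitive attribute
X = np.c_[rng.normal(size=n), s + rng.normal(size=n)]
y = (X[:, 0] + 0.8 * s + rng.normal(size=n) > 0).astype(int)

base = LogisticRegression().fit(X, y)   # the trained model, treated as opaque
post = ThresholdOptimizer(estimator=base, constraints="demographic_parity",
                          prefit=True, predict_method="predict_proba")
post.fit(X, y, sensitive_features=s)
pred = post.predict(X, sensitive_features=s, random_state=0)
print(pred[s == 0].mean(), pred[s == 1].mean())  # near-equal positive rates
```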
2. Thre