SUPPORT POOL

OF EXPERTS PROGRAMME

AI - Complex Algorithms and effective Data Protection Supervision

Bias evaluation

by Dr. Kris SHRISHAK


As part of the SPE programme, the EDPB may commission contractors to provide reports and tools on specific topics.

The views expressed in the deliverables are those of their authors and they do not necessarily reflect the official position of the EDPB. The EDPB does not guarantee the accuracy of the information included in the deliverables. Neither the EDPB nor any person acting on the EDPB's behalf may be held responsible for any use that may be made of the information contained in the deliverables.

Some excerpts may be redacted or removed from the deliverables as their publication would undermine the protection of legitimate interests, including, inter alia, the privacy and integrity of an individual regarding the protection of personal data in accordance with Regulation (EU) 2018/1725 and/or the commercial interests of a natural or legal person.


TABLE OF CONTENTS

1 State of the art for bias evaluation

1.1 Sources of bias

1.1.1 Bias from data

1.1.2 Algorithm bias

1.1.3 Evaluation bias

1.1.4 Sources of bias in facial recognition technology

1.1.5 Sources of bias in generative AI

1.2 Methods to address bias

1.2.1 Pre-processing

1.2.2 In-processing

1.2.3 Post-processing

1.2.4 Methods for generative AI

2 Tools for bias evaluation

2.1 IBM AIF360

2.2 Fairlearn

2.3 Holistic AI

2.4 Aequitas

2.5 What-If Tool

2.6 Other tools considered

Conclusion

Bibliography

Document submitted in March 2024


1 STATE OF THE ART FOR BIAS EVALUATION

Artificial intelligence (AI) systems are socio-technical systems whose behaviour and outputs can harm people. Bias in AI systems can result from interconnected factors that may together amplify harms such as discrimination (European Union Agency for Fundamental Rights, 2022; Weerts et al., 2023). Mitigating bias in AI systems is therefore important, and identifying the sources of bias is the first step in any bias mitigation strategy.

1.1 Sources of bias

The AI pipeline involves many choices and practices that can contribute to biased AI systems. Biased data is just one of the sources of biased AI systems, and understanding its various forms can help to detect and to mitigate the bias. In one application, the lack of representative data might be the source of bias, e.g., medical AI where data from women with heart attacks is less represented in the dataset than data from men. In another, proxy variables that embed gender bias might be the problem, e.g., in résumé screening. Increasing the dataset size for women could help in the former case, but not in the latter.

In addition to bias from data, AI systems can also be biased due to the algorithm and the evaluation. These three sources of bias are discussed next.

1.1.1 Bias from data

1. Historical bias: When AI systems are trained on historical data, they often reflect the societal biases embedded in the dataset. Out-of-date datasets with sensitive attributes and related proxy variables contribute to historical bias. This can be attributed to a combination of factors: how and what data were collected, and the labelling of the data, which involves subjectivity and the bias of the labeller. An example of historical bias in AI systems has been shown with word embeddings (Garg et al., 2018), which are numerical representations of words and are used in developing text generation AI systems.

2. Representation bias: Representation bias is introduced when defining and sampling from the target population during the data collection process. Representation bias can take the form of availability bias and sampling bias.

a. Availability bias: Datasets used in developing AI systems should represent the chosen target population. However, datasets are sometimes chosen by virtue of their availability rather than their suitability to the task at hand. Available datasets often underrepresent women and people with disabilities. Furthermore, available datasets are often used out of context, for purposes different from their intended purpose (Paullada et al., 2021). This contributes to biased AI systems.

b. Sampling bias: It is usually not possible to collect data about the entire target population. Instead, a subset of data points related to the target population is collected, selected and used. This subset or sample should be representative of the target population for it to be relevant and of high quality. For instance, data collected from scraping Reddit or other social media sites are not randomized and are not representative of people who do not use these sites. Such data are not generalizable to the wider population beyond these sites. And yet, the data are used in AI models deployed in other contexts.

When defining the target population, the subgroups with sensitive characteristics should be considered. An AI system built using a dataset collected from a city will only have a small percentage of certain minority groups, say 5%. If the dataset is used as-is, then the outputs of this AI system will be biased against this minority group, because they only make up 5% of the dataset and the AI system has relatively less data to learn from about them.
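As an illustration, a minimal pandas sketch of how subgroup representation can be audited before training; the column names, group labels and the 10% floor are illustrative assumptions, not values from any real dataset:

```python
import pandas as pd

# Hypothetical dataset in which a minority group is 5% of the rows.
df = pd.DataFrame({
    "group": ["majority"] * 95 + ["minority"] * 5,
    "label": [1, 0] * 50,
})

# Share of each subgroup in the dataset.
shares = df["group"].value_counts(normalize=True)
print(shares)

# Flag subgroups below an illustrative 10% representation floor.
underrepresented = shares[shares < 0.10]
print("Underrepresented subgroups:", list(underrepresented.index))
```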

3. Measurement bias: Datasets can be the result of measurement bias. Often, the data that is collected is a proxy for the desired data. This proxy data is an oversimplification of reality. Sometimes the proxy variable itself is wrong. Furthermore, the method of measurement, and consequently the collection of the data, may vary across groups. This variation could be due to easier access to the data from certain groups over others.

4. Aggregation bias: False conclusions may be drawn about individuals or small groups when conclusions are based on the dataset drawn from the entire population. The most common form of this bias is Simpson's paradox (Blyth, 1972), where patterns observed in the data for small groups disappear when only the aggregate data over the entire population is considered. The most well-known example of this comes from the UC Berkeley admissions in 1973 (Bickel et al., 1975). Based on the aggregate data, it seemed that women applicants were rejected significantly more often than men. However, the analysis of the data at the department level revealed that the rejection rates were higher for men in most departments. The aggregate failed to reveal this because a higher proportion of women applied to departments with low overall acceptance rates than to departments with high acceptance rates.
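The reversal can be reproduced in a few lines of pandas. The sketch below uses synthetic admissions-style numbers (not the actual 1973 Berkeley figures) in which women are admitted at a higher rate in every department, yet appear worse off in the aggregate:

```python
import pandas as pd

# Synthetic admissions data: (department, gender, applicants, admitted).
rows = [
    ("A", "women", 20, 16), ("A", "men", 100, 75),   # high acceptance rate
    ("B", "women", 100, 15), ("B", "men", 20, 2),    # low acceptance rate
]
df = pd.DataFrame(rows, columns=["dept", "gender", "applicants", "admitted"])

# Per-department admission rates: women do better in both departments.
per_dept = df.assign(rate=df["admitted"] / df["applicants"])
print(per_dept[["dept", "gender", "rate"]])

# Aggregate rates: women appear worse off overall, because far more women
# applied to the department with the lower acceptance rate.
agg = df.groupby("gender")[["applicants", "admitted"]].sum()
print(agg["admitted"] / agg["applicants"])
```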

1.1.2 Algorithm bias

Although much of the discussion around bias focusses on the bias from data, other sources of bias that contribute to discriminatory decisions should not be overlooked. In fact, AI models can produce biased outputs not only due to the datasets but also due to the model itself (Hooker, 2021). Even when the datasets are not biased and are properly sampled, the algorithmic choices can contribute to biased decisions. This includes the choice of objective functions, regularisations, how long the model is trained, and even the choice of statistically biased estimators (Danks & London, 2017).

The various trade-offs made during the design and development process could result in discriminatory outputs. Such trade-offs can include model size and the choice of privacy protection mechanisms (Ferry et al., 2023; Fioretto et al., 2022; Kulynych et al., 2022). Even with the Diversity in Faces (DiF) dataset, which has broad coverage of facial images, an AI model trained with certain differential privacy techniques shows disproportionately degraded performance for darker-skinned faces (Bagdasaryan et al., 2019). Furthermore, techniques to compress AI models can disproportionally affect the performance of AI models for people with underrepresented sensitive attributes (Hooker et al., 2020).

1.1.3 Evaluation bias

The performance of AI systems is evaluated based on many metrics, from accuracy to "fairness". Such assessments are usually performed against a benchmark, or a test dataset. Evaluation bias arises at this stage because the benchmark itself could contribute to bias.


AI systems can perform extremely well against a specific test dataset, and this test performance may fail to translate into real-world performance due to "overfitting" to the test dataset. This is especially a problem if the test dataset carries over historical, representation or measurement bias.

For instance, if the test dataset was collected in the USA, it is unlikely to be representative of the population in Germany; or the dataset may have been collected in 2020 during COVID-19 but used in a medical setting in a non-COVID-19 year. This means that even if the bias in the training dataset is mitigated, bias might creep in at the evaluation stage.
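In practice, evaluation bias is harder to miss when test metrics are disaggregated by subgroup rather than reported as a single score. A minimal sketch using Fairlearn's MetricFrame (Fairlearn is discussed in section 2.2), with randomly generated labels and predictions standing in for a real test set:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score
from fairlearn.metrics import MetricFrame

# Illustrative labels and predictions; "group" marks a sensitive attribute.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_pred = rng.integers(0, 2, size=200)
group = rng.choice(["a", "b"], size=200)

# Disaggregate the metrics by subgroup instead of one overall score.
mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "recall": recall_score},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)
print(mf.overall)        # aggregate scores
print(mf.by_group)       # per-group scores, where evaluation bias shows up
print(mf.difference())   # largest between-group gap per metric
```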

1.1.4 Sources of bias in facial recognition technology

Historical, representation and evaluation bias are the main causes of bias in facial recognition technology (FRT) and, more broadly, image recognition. This is because the training and benchmark datasets are constructed from publicly available image datasets, often through web scraping, that are not representative of different groups and different geographies (Raji & Buolamwini, 2019).

Databases such as Open Images and ImageNet mostly contain images from the USA and the UK (Shankar et al., 2017). IJB-A and Adience have been shown to mostly contain images of people with light skin, underrepresenting people with dark skin (Buolamwini & Gebru, 2018). Furthermore, racial slurs and derogatory phrases get embedded during the labelling process of images (Birhane & Prabhu, 2021; Crawford & Paglen, 2021). And despite datasets being flagged for removal, some of these datasets are still being used (Peng, 2020). If these are used for training and/or testing FRT, then, by design, the resulting systems will be biased.

Even datasets that attempt to address the problem can fail in the process. IBM's "Diversity in Faces" dataset was introduced to address the lack of diversity in image datasets (Merler et al., 2019). However, it raised more concerns (Crawford & Paglen, 2021). First, the images were scraped from the website Flickr without the consent of the site users (Salon, 2019). Second, it uses skull shapes as an additional measure, which has historically been used to claim racial superiority of white people and, hence, embeds historical bias (Gould, 1996). Finally, the dataset was annotated by three Amazon Mechanical Turk workers who guessed the age and gender of the people in the scraped images.

1.1.5 Sources of bias in generative AI

Generative AI allows for the generation of content including text, images, audio and video. The sources of bias discussed in the previous sections (bias from data, algorithm bias and evaluation bias) get carried over to AI that generates content. In addition, generative AI systems are developed with large amounts of uncurated data scraped from the web. This adds an additional layer of risk, as the developers would lack adequate knowledge about the data and its statistical properties, making it harder to assess the sources of bias.

Furthermore, many of the generative AI models are developed without an intended purpose. A pre-trained model is built and then applications are developed on top of this pre-trained model by other organisations. Thus, the source of bias can be in the pre-trained model and in the context of the downstream application. When bias is embedded in the pre-trained model, the bias will propagate downstream to all the applications.

Generative AI datasets can reflect historical bias, representation bias and evaluation bias (Bender et al., 2021). Bias can also arise due to data labelling, especially when fine-tuning a pre-trained model for a specific application. Labels or annotations are often added to the data by underpaid workers, including Amazon Mechanical Turk workers. They may choose the wrong labels because they are distracted or, worse, because they embed their own bias by not being from the representative population where the AI system will be deployed. This is especially the case when more than one label could potentially apply to the data (Plank et al., 2014).

Although the dataset used for the pre-trained model is currently neither curated nor labelled by humans (which organisations claim to be costly), the process of reinforcement learning from human feedback used by companies developing generative AI introduces the same biases, albeit at a later stage in the development process.

Even when the text datasets are well-labelled, they can contain societal bias that arises due to spurious correlations, which are statistical correlations between features and outcomes. In the case of text generative AI, such spurious correlations can be observed with word embeddings, which underlie text generative AI (Garg et al., 2018): e.g., 'man' being associated with 'programming' and 'woman' being associated with 'homemaker'. Furthermore, as these are mathematical objects, the contextual information about the words gets lost, and they have been observed to output "doctor" - "father" + "mother" as "nurse". Pre-trained language models such as GPT that rely on uncurated datasets are also susceptible to this issue (Tan & Celis, 2019), and merely increasing the size of the model does not address the problem (Sagawa et al., 2020).
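This behaviour can be probed directly with publicly available pretrained embeddings. The sketch below uses gensim's downloader with the "glove-wiki-gigaword-100" vectors; the exact neighbours returned depend on the embedding used, so the outputs may differ from the specific examples cited above:

```python
import gensim.downloader as api

# Load small pretrained GloVe vectors (downloaded on first use).
kv = api.load("glove-wiki-gigaword-100")

# Analogy arithmetic of the kind described above:
# vector("doctor") - vector("father") + vector("mother") ≈ ?
print(kv.most_similar(positive=["doctor", "mother"], negative=["father"], topn=3))

# Association probes for occupation words.
print(kv.similarity("man", "programming"), kv.similarity("woman", "programming"))
print(kv.similarity("man", "homemaker"), kv.similarity("woman", "homemaker"))
```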

1.2 Methods to address bias

No automated mechanism can fully detect and mitigate bias (Wachter et al., 2020). There are inherent limitations to technical approaches to addressing bias (Buyl & De Bie, 2024). These approaches are necessary, but not sufficient, for AI systems, which are socio-technical systems (Schwartz et al., 2022). The most appropriate approaches depend on the specific context in which the AI system is developed and used. Moreover, contextual and socio-cultural knowledge should complement these technical approaches.

Based on when the intervention is made in the AI lifecycle to mitigate bias, the technical methods and techniques to address bias can be classified into three types (d'Alessandro et al., 2017):

1. Pre-processing: These techniques modify the training data, before it is used to train an AI model, to obscure the associations between sensitive variables and the output. Pre-processing can help identify historical, measurement and representation bias in data.

2. In-processing: These techniques change the way the AI training process is performed to mitigate bias, through changes in the objective function or with an additional optimisation constraint.

3. Post-processing: These techniques treat the AI model as opaque and attempt to mitigate bias after the completion of the training process. The assumption behind these techniques is that it is not possible to modify the training data or the training/learning process to address the bias. Thus, these techniques should be treated as a last-resort intervention.

Merely removing sensitive variables from the dataset is not an effective approach to mitigate bias, due to the existence of proxy variables (Dwork et al., 2012; Kamiran & Calders, 2012).
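A small synthetic example makes the point: even after the sensitive column is dropped, a correlated proxy (here a hypothetical "postcode" variable) still carries the group signal:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic data: "postcode" acts as a proxy that agrees with the
# sensitive attribute "group" 90% of the time (illustrative numbers).
group = rng.integers(0, 2, size=1000)
postcode = np.where(rng.random(1000) < 0.9, group, 1 - group)
df = pd.DataFrame({"group": group, "postcode": postcode})

# Dropping the sensitive column does not remove the signal:
# the proxy still predicts group membership almost perfectly.
print(df["group"].corr(df["postcode"]))  # high correlation remains
```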

Pre-processing approaches are agnostic to the AI type, as they focus on the dataset. This is an important advantage. Furthermore, many of the approaches have been developed and tested over the past decade and are more mature than in-processing techniques. Pre-processing approaches are early-stage interventions and can assist with changing the design and development process. However, if these techniques are the only intervention used, they might give the illusion that all the bias has been resolved, which is not the case (Obermeyer et al., 2019); they are only the starting point.

For regulators, pre-processing techniques are useful only if they have access to the datasets that were used to train the model. Furthermore, the regulator needs to consider whether other in-processing and post-processing techniques were used by the developer and deployers of the AI system.

1.2.1 Pre-processing

1. Data provenance (Cheney et al., 2009; Gebru et al., 2018): Data provenance is an essential step before other methods to mitigate bias from data can be used. It attempts to answer where, how and why the dataset came to be, who created it, what it contains, how it will be used, and by whom. In the area of machine learning, the term 'datasheet' is more commonly used. Data provenance can, in the context of data protection, include the listing of personal data and non-personal data.

2. Causal analysis (Glymour & Herington, 2019; Salimi et al., 2019): Datasets used to train AI models often include relationships and dependencies between sensitive and non-sensitive variables. Thus, any attempt to mitigate bias in the dataset requires understanding the relationships between these variables. Otherwise, non-sensitive variables could act as proxies for the sensitive variables. Causal analysis helps with identifying these proxies, often by visualising the links between the variables in the dataset as a graph.

Causal analysis can be extended to "repair" the dataset by removing the dependencies based on pre-defined "fairness" criteria.[1] However, this approach relies on prior contextual knowledge about the AI model and its deployment, in addition to being computationally intensive for large datasets.

[1] The technical literature uses the term "fairness" and there are numerous definitions and metrics of "fairness" (Hutchinson & Mitchell, 2019). Many of these have been developed in the context of the USA, some based on the "four-fifths rule" from US Federal employment regulation, and are not valid in other contexts and countries (Watkins et al., 2022). Furthermore, these metrics are incompatible with each other (Kleinberg et al., 2016).

3. Transformation (Calmon et al., 2017; Feldman et al., 2015; Zemel et al., 2013): These approaches involve transforming the data into a less biased representation. The transformations could involve editing the labels such that they become independent of specific protected groupings, or transforming the data based on specific "fairness" objectives.

Transformations are not without limitations. First, transformations usually affect the performance of the AI model, and there is an inherent trade-off between bias mitigation and performance when using this approach. Second, transformations are limited to numerical data and cannot be used for other kinds of datasets. Third, this approach is susceptible to bias persisting due to the existence of proxy variables. For this reason, the use of this approach should be preceded by causal analysis to understand the links between the special category data and the proxy variables in the starting dataset. Even then, there is no guarantee that the transformations have eliminated the relationship between the special category data and the proxy variables. Fourth, transformations could make the AI model less interpretable (Lepri et al., 2018).

4. Massaging or relabeling (Kamiran & Calders, 2012): Relabeling is a specific type of transformation that strategically modifies the labels in the training data such that the distribution of positive instances is equal across all groups. For example, if a dataset contains data about men and women, the proportion of the dataset that is labelled '+' for women should be the same as that for men. If the proportion is lower for women, then some of the data points for women that were close to being classified as '+' but were initially labelled '-' will be changed, and the reverse will be done for data points for men. This approach is not restricted to the training dataset and can also be used for validation and test datasets.
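A simplified sketch of this massaging idea follows. It assumes a "score" column that stands in for a ranker's estimate of how close each point is to the positive class, flips an equal number of borderline labels in each group, and uses illustrative column names and group proportions throughout:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 1000
group = rng.choice(["women", "men"], size=n)
# Built-in disparity: men receive '+' (1) more often than women.
label = (rng.random(n) < np.where(group == "men", 0.6, 0.4)).astype(int)
df = pd.DataFrame({
    "group": group,
    "label": label,
    "score": rng.random(n),  # stand-in for closeness to the '+' class
})

pos_rate = df.groupby("group")["label"].mean()
disadv, adv = pos_rate.idxmin(), pos_rate.idxmax()

# Flip just enough labels so both groups meet near the overall positive rate.
target = df["label"].mean()
n_flip = int(round((target - pos_rate[disadv]) * (df["group"] == disadv).sum()))

# Promote the highest-scoring negatives of the disadvantaged group...
neg = df[(df["group"] == disadv) & (df["label"] == 0)].nlargest(n_flip, "score")
df.loc[neg.index, "label"] = 1
# ...and demote the lowest-scoring positives of the advantaged group.
pos = df[(df["group"] == adv) & (df["label"] == 1)].nsmallest(n_flip, "score")
df.loc[pos.index, "label"] = 0

print(df.groupby("group")["label"].mean())  # positive rates are now close
```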

5. Reweighing (Calders et al., 2009; Jiang & Nachum, 2020; Krasanakis et al., 2018): Instead of changing the labels in the dataset, this approach adds a specific 'weight' to each data point to adjust for the bias in the training dataset. The weights can be chosen based on three factors: (1) the special categories of personal data, along with the probability of the sensitive attribute in the population; (2) the probability of a specific outcome [+/-]; and (3) the observed probability of this outcome for a sensitive attribute.

For instance, women constitute 50% of all humans, and if the label '+' is assigned to 60% of all data in the dataset, then 30% of the dataset should contain women with a '+' label. However, if it is observed that only 20% of the dataset has women with a '+' label, then a weight of 1.5 is assigned to women with a '+' label, 0.75 is assigned to men with a '+' label, and so on, to adjust for the bias. A sketch of this computation follows this item.

Alternatively, a more dynamic approach can be taken by training an unweighted classifier to learn the weights and then retraining the classifier using those weights.[2] Reweighing is more suitable for small models where retraining is not too expensive in terms of cost and resources.

[2] This process of training an unweighted model first makes this approach of reweighing a mix of in-processing and pre-processing.
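The weights in the example above are the ratio of the expected probability of each (group, label) combination, assuming independence of group and label, to its observed probability (the formulation of Kamiran & Calders, 2012). A minimal sketch that reproduces those numbers and attaches a weight to each row:

```python
import pandas as pd

# Joint distribution from the worked example: women are 50% of the data,
# '+' labels are 60% overall, but only 20% of rows are (women, '+')
# instead of the expected 50% * 60% = 30%.
df = pd.DataFrame({
    "group": ["women"] * 50 + ["men"] * 50,
    "label": ["+"] * 20 + ["-"] * 30 + ["+"] * 40 + ["-"] * 10,
})

p_group = df["group"].value_counts(normalize=True)
p_label = df["label"].value_counts(normalize=True)
p_joint = df.groupby(["group", "label"]).size() / len(df)

# Expected probability of each cell under independence, divided by the
# observed probability of that cell.
expected = pd.Series(
    [p_group[g] * p_label[l] for g, l in p_joint.index],
    index=p_joint.index,
)
weights = expected / p_joint
print(weights)  # (women, '+') -> 1.5, (men, '+') -> 0.75

# Per-row sample weights for use during training.
df["weight"] = [weights[(g, l)] for g, l in zip(df["group"], df["label"])]
```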

6. Resampling (Kamiran & Calders, 2012): In contrast to the previous methods, the resampling method involves neither adding weights to the samples nor changing labels in the training dataset. Instead, this approach focusses on how samples are drawn from the dataset such that a balanced set of samples is used for training. Data from the minority class can be duplicated, or "oversampled", while data from the majority class can be skipped, or "under-sampled". The choice usually depends on the size of the entire dataset and the overall impact on the performance of the AI model. For instance, under-sampling requires datasets with sufficiently large amounts of data from the different classes.
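A pandas sketch of both variants on an illustrative imbalanced dataset (group names and sizes are assumptions for the example):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Imbalanced synthetic dataset: the minority group is 10% of the data.
df = pd.DataFrame({
    "group": ["minority"] * 100 + ["majority"] * 900,
    "x": rng.random(1000),
})

minority = df[df["group"] == "minority"]
majority = df[df["group"] == "majority"]

# Oversampling: duplicate minority rows (sampling with replacement)
# until both groups are the same size.
oversampled = pd.concat([
    majority,
    minority.sample(n=len(majority), replace=True, random_state=0),
])

# Under-sampling: drop majority rows instead; this needs enough data overall.
undersampled = pd.concat([
    minority,
    majority.sample(n=len(minority), random_state=0),
])

print(oversampled["group"].value_counts())
print(undersampled["group"].value_counts())
```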

7. Generating artificial training data (Sattigeri et al., 2019): When the quantity of available data is limited, especially for unstructured data such as images, a generative process can be used to develop the dataset. The use of generative adversarial networks (GANs) that include specific bias considerations can contribute to generating and using less biased datasets for training. This approach assumes that an appropriate fairness criterion is available, which is a strong assumption, and it requires significant computing power.

1.2.2 In-processing

1. Regularisation (Kamishima et al., 2012): Regularisation is used in machine learning to penalise undesired characteristics. This approach was primarily used to reduce over-fitting but has been extended to address bias by penalising classifiers with discriminatory behaviour. It is a data-driven approach that relies on balancing fairness (as defined by a chosen fairness metric) and a performance metric such as accuracy or the ratio between the true positive rate and the false positive rate for minority groups (Bechavod & Ligett, 2017).

While this approach is generic and flexible, it relies on the developer choosing the most suitable metric, a choice that can be gamed. In addition, there are concerns that not all fairness measures are equally affected by regularisation parameters (Stefano et al., 2020). Furthermore, this approach could result in reduced accuracy and robustness.
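As a minimal sketch of the idea, the following trains a logistic-regression classifier by gradient descent on synthetic data, adding a squared demographic-parity gap to the usual log-loss; the penalty strength lam is an illustrative choice, not a recommended value:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data: one feature correlates with the sensitive attribute s.
n = 2000
s = rng.integers(0, 2, size=n)
X = np.column_stack([rng.normal(size=n), s + rng.normal(scale=0.5, size=n)])
y = (X[:, 0] + 0.5 * s + rng.normal(scale=0.5, size=n) > 0).astype(float)

w, b = np.zeros(X.shape[1]), 0.0
lam, lr = 2.0, 0.1  # fairness penalty strength and learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(500):
    p = sigmoid(X @ w + b)
    # Gradient of the log-loss with respect to the logits.
    g = (p - y) / n
    # Fairness regulariser: squared gap between the groups' mean scores.
    gap = p[s == 1].mean() - p[s == 0].mean()
    dpen = np.where(s == 1, 2 * gap / (s == 1).sum(), -2 * gap / (s == 0).sum())
    # Chain rule through the sigmoid for the penalty term.
    g_total = g + lam * dpen * p * (1 - p)
    w -= lr * (X.T @ g_total)
    b -= lr * g_total.sum()

p = sigmoid(X @ w + b)
print("mean score gap:", p[s == 1].mean() - p[s == 0].mean())
```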

2. Constrained optimisation (Agarwal et al., 2018; Zafar et al., 2017): Constrained optimisation, as the name suggests, constrains the optimisation function by incorporating a fairness metric during model training, either by adapting an existing learning paradigm or through wrapper methods. In essence, this approach changes the algorithm of the AI model. In addition to fairness metrics, other constraints that capture disparities in population frequencies can be included, resulting in trade-offs between the metrics.

The chosen fairness metric can result in vastly different models; hence, this approach is heavily reliant on the choice of the fairness metric, which makes it difficult to balance the constraints and can lead to unstable training.
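A sketch using Fairlearn's implementation of the wrapper ("reductions") method of Agarwal et al. (2018), on synthetic data with an illustrative sensitive attribute s:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

rng = np.random.default_rng(5)
n = 2000
s = rng.integers(0, 2, size=n)
X = np.column_stack([rng.normal(size=n), s + rng.normal(scale=0.5, size=n)])
y = ((X[:, 0] + 0.5 * s) > 0).astype(int)

# The base learner is retrained under a demographic-parity constraint.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(),
    constraints=DemographicParity(),
)
mitigator.fit(X, y, sensitive_features=s)
y_pred = mitigator.predict(X)

# Positive prediction rates per group after mitigation.
print(y_pred[s == 1].mean(), y_pred[s == 0].mean())
```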

3. Adversarial approach (Celis & Keswani, 2019; Zhang et al., 2018): While adversarial learning is primarily an approach to determine the robustness of machine learning models, it can also be used as a method to enforce fairness. An adversary can attack the model to determine the protected attribute from the outputs. The adversary's feedback can then be used to penalise and update the model to prevent discriminatory outputs. The most common approach to incorporating this feedback is as an additional constraint in the optimisation process, that is, through constrained optimisation.
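A simplified sketch in the spirit of Zhang et al. (2018), using PyTorch: an adversary learns to recover the sensitive attribute from the predictor's output, and the predictor is penalised for leaking it. The linear architecture and the weight lam are illustrative assumptions, not the published configuration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n = 2000
s = torch.randint(0, 2, (n, 1)).float()             # sensitive attribute
X = torch.cat([torch.randn(n, 1), s + 0.5 * torch.randn(n, 1)], dim=1)
y = ((X[:, :1] + 0.5 * s) > 0).float()

predictor = nn.Linear(2, 1)          # main task model (logit output)
adversary = nn.Linear(1, 1)          # tries to recover s from the logit
opt_p = torch.optim.Adam(predictor.parameters(), lr=0.01)
opt_a = torch.optim.Adam(adversary.parameters(), lr=0.01)
bce = nn.BCEWithLogitsLoss()
lam = 1.0                            # adversary weight (illustrative)

for step in range(2000):
    logits = predictor(X)
    # 1) Train the adversary to predict s from the predictor's output.
    opt_a.zero_grad()
    adv_loss = bce(adversary(logits.detach()), s)
    adv_loss.backward()
    opt_a.step()
    # 2) Train the predictor on the task while fooling the adversary:
    #    subtracting the adversary's loss penalises outputs that leak s.
    opt_p.zero_grad()
    task_loss = bce(logits, y)
    leak_loss = bce(adversary(logits), s)
    (task_loss - lam * leak_loss).backward()
    opt_p.step()

with torch.no_grad():
    p = torch.sigmoid(predictor(X))
    print("mean score gap:", (p[s == 1].mean() - p[s == 0].mean()).item())
```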

1.2.3 Post-processing

1. Calibration (Pleiss et al., 2017): Calibration is the process of ensuring that the proportion of positive predictions is the same for all subgroups (protected or otherwise) in the data. This approach does not directly address the biases, but tackles them indirectly by ensuring that the probability of positive outcomes is equal across social groups.

However, calibration is limited in flexibility and in accommodating multiple fairness criteria. In fact, the latter has been shown to be impossible (Kleinberg et al., 2016). Although many approaches, such as randomisation during post-processing, have been suggested, this is an ongoing area of research without a clear consensus on the best approach.
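Fairlearn's ThresholdOptimizer is one concrete post-processing implementation: it leaves the trained model untouched and selects group-specific decision thresholds so that positive prediction rates match across groups. A sketch on synthetic data with an illustrative sensitive attribute s:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.postprocessing import ThresholdOptimizer

rng = np.random.default_rng(6)
n = 2000
s = rng.integers(0, 2, size=n)
X = np.column_stack([rng.normal(size=n), s + rng.normal(scale=0.5, size=n)])
y = ((X[:, 0] + 0.5 * s) > 0).astype(int)

base = LogisticRegression().fit(X, y)

# Post-processing: the already-trained model is wrapped, not retrained.
post = ThresholdOptimizer(
    estimator=base,
    constraints="demographic_parity",
    prefit=True,
    predict_method="predict_proba",
)
post.fit(X, y, sensitive_features=s)
y_pred = post.predict(X, sensitive_features=s)

print(y_pred[s == 1].mean(), y_pred[s == 0].mean())  # near-equal positive rates
```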


2. Thre
