
Fairness in Large Language Models in Three Hours

Thang Viet Doan, Zichong Wang, Nhat Nguyen, Minh Hoang, Wenbin Zhang

1

This tutorial is grounded in our surveys and established benchmarks, all available as open-source resources:
/LavinWong/Fairness-in-Large-Language-Model

2

WARNING:
The following slides contain examples of model bias and evaluation which are offensive in nature.

3

Large Language Models are fascinating!
● Unprecedented Language Capabilities
● Diverse Applications Across Industries
● Breaking Language and Knowledge Boundaries

4

But they are not perfect!
Source: GPT-4, 10/2024
LLMs exhibit unfairness in their answers!

5

But they are not perfect!
Source: GPT-4, 10/2024
LLMs exhibit unfairness in their answers!
There is an urgent need to handle bias in LLMs' behavior!

6

Bias mitigation in LLMs is different
How bias is formed
How to measure unfairness
What methods can be applied to mitigate bias
What are the tools for measuring and mitigating bias
Why mitigating bias is challenging
IN LARGE LANGUAGE MODELS

7

Bias mitigation in LLMs is different
How bias is formed
How to measure unfairness
What methods can be applied to mitigate bias
What are the tools for measuring and mitigating bias
Why mitigating bias is challenging
IN LARGE LANGUAGE MODELS
We built a roadmap to explore these questions!

8

Roadmap

Section 1: Background on LLMs
Section 2: Quantifying bias in LLMs
Section 3: Mitigating bias in LLMs
Section 4: Resources for evaluating bias in LLMs
Section 5: Challenges and future directions

9

Section 1: Background on LLMs

10

Content

● Review the development history of LLMs
● The training procedure of LLMs and how they achieve such capabilities
● Explore the bias sources in LLMs

11

1.1 History of LLMs
This section is grounded in our introduction to LLMs survey [1].
[1] Wang, Zichong, Chu, Zhibo, Doan, Thang Viet, Ni, Shiwen, Yang, Min, Zhang, Wenbin. "History, development, and principles of large language models: an introductory survey." AI and Ethics (2024): 1-17.

12

1.1 History of LLMs
a. Language Models
● Earlier Stages: Statistical LMs -> Neural LMs
● N-grams [2]:
● For example:
[2] Jurafsky, Dan; Martin, James H. (7 January 2023). "N-gram Language Models". Speech and Language Processing (PDF) (3rd edition draft). Retrieved 24 May 2022.
13
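The example itself was a figure on the original slide. As a minimal sketch of the idea (the toy sentence is an assumption, not from the slide), an n-gram model approximates the probability of a word sequence by conditioning each word only on the previous n−1 words, with probabilities estimated from corpus counts:

$$P(w_1, \dots, w_m) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \dots, w_{i-1}), \qquad P(w_i \mid w_{i-1}) = \frac{\mathrm{count}(w_{i-1}\, w_i)}{\mathrm{count}(w_{i-1})}$$

For the bigram case (n = 2): $P(\text{the cat sat}) \approx P(\text{the}) \cdot P(\text{cat} \mid \text{the}) \cdot P(\text{sat} \mid \text{cat})$.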

1.1 History of LLMs
a. Language Models
● Earlier Stages: Statistical LMs -> Neural LMs
● Word2Vec [3, 4]:

14

[3] Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: Proceedings of ICLR Workshop 2013
[4] Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. Adv Neural Inf Process Syst 26:1

1.1 History of LLMs
a. Language Models
● Earlier Stages: Statistical LMs -> Neural LMs
● RNN [5]:
15
[5] A. Graves, A.-r. Mohamed and G. Hinton, "Speech recognition with deep recurrent neural networks," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 2013, pp. 6645-6649, doi: 10.1109/ICASSP.2013.6638947.

1.1 History of LLMs
a. Language Models
● Drawbacks:
○ Poor generalization
○ Lack of long-term dependence
○ Recurrent computation
○ Difficulty capturing complex linguistic properties and phenomena

16

1.1 History of LLMs
Until Transformers [6]…
[6] Vaswani, A. "Attention is all you need." Advances in Neural Information Processing Systems (2017).

17

1.1 History of LLMs
b. Large Language Models
● Until Transformers:
○ Self-Attention: Long-Range Dependencies

18
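As a brief refresher (standard material from [6], not text on the slide itself), scaled dot-product self-attention lets every token attend to every other token in a single step, which is what captures long-range dependencies without recurrence:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$

where Q, K, and V are the query, key, and value projections of the token representations and d_k is the key dimension.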

1.1 History of LLMs
b. Large Language Models
● Until Transformers:
○ Multi-head Attention: Contextualized Word Representations

19

1.1 History of LLMs
b. Large Language Models
● Until Transformers:
○ Parallelization and Scalability

20

1.1 History of LLMs
b. Large Language Models
● Transformers revolutionized the natural language processing landscape!
● This resulted in a massive blooming era of LLMs: GPT, BERT, LLaMA, Claude, and more to come!
● Broad applications across domains:
○ Education
○ Healthcare
○ Technology
○ And so on…

21

[Stylized title slide; text not cleanly recoverable: "How LLMs … massive …?"]

22

[Stylized title slide: "How LLMs … massive …?"]
The training procedure of LLMs

23

1.2 Training LLMs
Key steps to train LLMs
● Training large language models is a complex, multi-step process that requires careful planning and execution.

24

1.2 Training LLMs
a. Data Preparation
● Data is the foundation of LLMs.
● "Garbage In, Garbage Out": Poor data quality can lead to biased, inaccurate, or unreliable model outputs.
● High-quality data can lead to accurate, coherent, and reliable outputs.
Figure: Model performance decreases significantly with a high proportion of data errors [7]
[7] Srivastava, Ankit, Piyush Makhija, and Anuj Gupta. "Noisy Text Data: Achilles' Heel of BERT." Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020). 2020.

25

1.2 Training LLMs
a. Data Preparation
● Quality: Accurately represent the domain and language style, factually correct and free from errors.
● Examples:

| Low Quality | High Quality | Problem |
| He are developer | He is developer | Grammatical Error |
| This game is lit! Thx for your attn! | This game is awesome! Thanks for your attention! | Slang and Abbreviations |
| Only men can do engineering | Both men and women can do engineering | Unfair and inaccurate |

26

1.2 Training LLMs
a. Data Preparation
● Diversity: Represent a wide variety of languages, domains, and contexts to improve generalization.
● Some languages have limited availability of linguistic data, tools, and resources compared to more widely spoken languages.
Figure: /blog/teaching-ai-to-translate-100s-of-spoken-and-written-languages-in-real-time/

27

1.2 Training LLMs
a. Data Preparation
● Data Cleaning - Quality Filtering (a minimal sketch follows below):
○ Noise/Outlier Handling: Identifying and removing noisy or irrelevant data that could distort the model's performance.
○ Normalization: Ensuring that the data is consistent and standardized across different sources.
○ Chunking/Pruning: Breaking large datasets into manageable pieces.
○ Deduplication: Removing duplicate entries to avoid redundant information in the training set.

28
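A minimal Python sketch of the quality-filtering steps above; the helper names and thresholds (is_noisy, the five-word minimum, the symbol-ratio cutoff) are illustrative assumptions, not the tutorial's actual pipeline:

```python
import re
import unicodedata

def normalize(text: str) -> str:
    # Normalization: consistent Unicode form and whitespace across sources.
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()

def is_noisy(text: str, min_words: int = 5, max_symbol_ratio: float = 0.3) -> bool:
    # Noise/outlier handling: drop tiny fragments and symbol-heavy junk.
    if len(text.split()) < min_words:
        return True
    symbols = sum(1 for c in text if not (c.isalnum() or c.isspace()))
    return symbols / max(len(text), 1) > max_symbol_ratio

def clean_corpus(docs):
    seen, cleaned = set(), []
    for doc in docs:
        doc = normalize(doc)
        if is_noisy(doc):
            continue
        if doc in seen:  # Deduplication: exact match; real pipelines
            continue     # also hash near-duplicates (e.g., MinHash).
        seen.add(doc)
        cleaned.append(doc)
    return cleaned

docs = ["Hello   world, this is a clean sentence.",
        "Hello world, this is a clean sentence.",  # duplicate after normalization
        "@@@@ ####"]                               # symbol-heavy noise
print(clean_corpus(docs))  # keeps only one cleaned sentence
```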

1.2 Training LLMs
a. Data Preparation
● Data Cleaning - Ethical Filtering:
○ Bias Mitigation: Identifying and reducing bias in the data and reducing stereotypes in model outputs.
○ Toxicity Reduction: Removing harmful or toxic content from the dataset.
○ Privacy: Excluding personally identifiable information (PII) or sensitive data.
○ Faithfulness: Removing inaccurate data, preventing misinformation.

29

1.2 Training LLMs
a. Data Preparation
● Data Validation - Data Format & Data Integrity:
○ Data Format: Ensuring that the data follows a specific structure or format that is compatible with the model.
○ Data Integrity: Validating that the data is complete, reliable, and accurate for training.

30

1.2 Training LLMs
a. Data Preparation
● Data Validation - Ethical Validity:
○ Privacy: Ensuring the data maintains privacy standards throughout the process.
○ Fairness: Checking that the data is balanced and doesn't introduce unfair bias.
○ Accuracy and Consistency: Ensuring that the data is accurate across different sources and consistent throughout the dataset.
○ Toxicity: Verifying that toxic or harmful data has been removed and no such data remains.

31

1.2 Training LLMs
b. Training/Fine-tuning configuration
● LLM model structure selection:
○ Transformer-based architecture
○ Structures to select from:
■ Encoder-only (BERTs)
■ Decoder-only (GPTs, LLaMA)
■ Encoder-Decoder (T5, BART)
● Considerations (a sample configuration follows below):
○ Pre-trained or From-Scratch
○ Model size and complexity
○ Key elements: learning rate, context length, number of attention heads, etc.

32
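As a concrete illustration of these considerations, a training configuration might gather the key elements in one place; every name and value below is hypothetical, chosen only to mirror the bullets above:

```python
# Hypothetical training configuration; values are illustrative only.
config = {
    "architecture": "decoder-only",  # vs. encoder-only / encoder-decoder
    "from_pretrained": True,         # pre-trained vs. from-scratch
    "n_layers": 24,                  # model size and complexity
    "n_attention_heads": 16,
    "context_length": 2048,
    "learning_rate": 3e-4,
}
```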

1.2 Training LLMs
b. Training/Fine-tuning configuration
● Hyperparameter Tuning:
○ Hyperparameter tuning is about adjusting the model's settings to get the best possible performance.
○ Tuning strategies (see the sketch below):
■ Grid Search: Try all possible combinations of pre-defined hyperparameters.
■ Random Search: Sample hyperparameter values from the search space.
■ Bayesian Optimization: Build a probabilistic model of the objective function and use this model to select the most promising hyperparameters.
■ Hyperband (Successive Halving): Assign different resources to each set of hyperparameters and progressively eliminate the worst-performing ones.

33
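A minimal sketch of one of the strategies above, random search; `train_and_evaluate` is a hypothetical stand-in for a full training-plus-validation run:

```python
import random

search_space = {
    "learning_rate": [1e-5, 3e-5, 1e-4, 3e-4],
    "batch_size": [16, 32, 64],
    "warmup_ratio": [0.0, 0.03, 0.1],
}

def train_and_evaluate(params: dict) -> float:
    # Placeholder: train with `params` and return a validation score.
    return random.random()

best_score, best_params = float("-inf"), None
for _ in range(20):  # sample 20 random configurations
    params = {k: random.choice(v) for k, v in search_space.items()}
    score = train_and_evaluate(params)
    if score > best_score:
        best_score, best_params = score, params
print(best_params, best_score)
```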

1.2 Training LLMs
c. Instruction Tuning
● A fine-tuning technique for LLMs on a labeled set of instruction prompts and outputs covering varied tasks and domains in a similar instruction format (an illustrative format follows below).
● The model is taught to follow the instruction, thus improving its generalization to unseen tasks and domains.

34
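A sketch of what such a labeled instruction set can look like; the schema (instruction/input/output) follows common practice and the concrete examples are invented for illustration:

```python
# Varied tasks and domains expressed in one shared instruction format.
instruction_data = [
    {"instruction": "Translate the sentence to French.",
     "input": "The weather is nice today.",
     "output": "Il fait beau aujourd'hui."},
    {"instruction": "Classify the sentiment as positive or negative.",
     "input": "I loved this movie!",
     "output": "positive"},
    {"instruction": "Summarize the text in one sentence.",
     "input": "Large language models are trained on vast corpora ...",
     "output": "LLMs learn language patterns from very large text corpora."},
]
```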

1.2 Training LLMs
c. Instruction Tuning
● Might introduce bias by teaching the model potential stereotypes contained in the given instructions.
Unintentionally introduces gender bias!!
Exploits the model's racial bias!!

35

1.2 Training LLMs
d. Alignment with humans
● Reinforcement Learning from Human Feedback:
○ Incorporates human feedback into the reward function.
○ So LLMs can perform tasks more aligned with human values such as helpfulness, honesty, and harmlessness.

36

1.2 Training LLMs
d. Alignment with humans
● Reinforcement Learning from Human Feedback:
○ Deals with bias potentially generated by the model by steering it towards human-preferred responses.
○ However, there's still a chance of unfairness being introduced through human feedback.

37

1.3 Bias sources in LLMs

38

1.3 Bias sources in LLMs
a. Training data bias:
● Historical Bias: Data might be missing or incorrectly recorded for discriminated groups, or the unfair treatment of minorities could be reflected by LLMs.

39

1.3 Bias sources in LLMs
a. Training data bias:
● Data Disparity: Dissimilarity between different demographic groups in the training dataset could lead to an unfair understanding of those groups by LLMs.

40

1.3 Bias sources in LLMs
b. Embedding bias
● Word representation vectors might exhibit bias, demonstrated by a closer distance to sensitive words (e.g., genders - she/he).
● This leads to biases in downstream tasks trained on these embeddings.

41

1.3 Bias sources in LLMs
c. Label bias
● Arises from the subjective judgments of human annotators who provide labels or annotations for training data.
● Can occur during various phases of LLM training:
○ Data Labelling
○ Instruction Tuning
○ RLHF

42

1.3 Terminologies
LLMs Classification
Terminologies
Fairness Notions

43

1.3 Terminologies
a. LLMs Classification:
Large Language Models
→ Large-sized Large Language Models
→ Medium-sized Large Language Models
44

1.3 Terminologies
a. LLMs Classification: Medium-sized vs Large-sized LLMs
Medium-sized:
● Pretrained base model
● Up to 10 billion parameters
● Utilize fine-tuning to perform tasks
Large-sized:
● Pretrained base model
● Hundreds of billions of parameters
● Universal capability
● Utilize prompt-based techniques (Instruction Tuning, RLHF)

45

1.3 Terminologies

| | Medium-sized LLMs | Large-sized LLMs |
| Number of Parameters | Fewer than 10 billion parameters | From tens to hundreds of billions of parameters |
| Fine-tuning Approach | Fine-tuned for specific tasks or domains | Prompt-based: Instruction Tuning, RLHF |
| Capabilities | Specialized performance in targeted applications | Universal language capabilities, versatile across various tasks |
| Interaction Style | Task-specific interactions after fine-tuning: Text generation, Classification, etc. | Natural communication and prompting without extensive fine-tuning |
| Ethical Alignment | Limited by the scope of fine-tuning | Enhanced ethical alignment through methods like RLHF |
| Applicability | Applicable to a wide range of scales | Very large data centers only |
| Deployment | Can be hosted locally and privately | Rely on calling APIs to data centers |
| Accessibility | Can be inspected for embeddings, inner structure and outputs | Can only access input prompts and outputs |

46

1.3 Terminologies
b. Fairness terminologies: deprived and favored groups
● Sensitive attribute: An attribute related to demographic information that can be discriminated against or not.
● Deprived group: Refers to people whose sensitive attribute is discriminated against.
○ For example: women, people with physical disabilities, immigrants, people from low-income backgrounds, etc.
● Favored group: Individuals whose sensitive attribute is not discriminated against.
● Rejected: The event that an individual from one group (deprived or favored) is denied a legal right or benefit.
● Granted: The event that an individual from one group (deprived or favored) is allowed a legal right or benefit.

47

1.3 Terminologies
● Sensitive attribute: Race
● Deprived group: Black people
● Favored group: White people
● Rejected: The model refuses to tell a joke about Black people.
● Granted: A joke about White people is treated normally.
Source: GPT-4, 10/2024

48

Section 2
Quantifying bias in LLMs

49

Content

● Quantifying bias in medium-sized LLMs
○ Intrinsic bias
○ Extrinsic bias
● Quantifying bias in large-sized LLMs
○ Demographic Representation
○ Stereotypical Association
○ Counterfactual Fairness
○ Performance Disparities

50

2. Quantifying bias in LLMs
This section is grounded in our fairness definitions in LLMs survey [8].
[8] Doan, Thang Viet, Zhibo Chu, Zichong Wang, and Wenbin Zhang. "Fairness Definitions in Language Models Explained." arXiv preprint arXiv:2407.18454 (2024).

51

Section 2.1
Quantifying bias in medium-sized LLMs

52

2.1. Quantifying bias in medium-sized LLMs
53
2.1. Quantifying bias in medium-sized LLMs
54
2.1. Quantifying bias in medium-sized LLMs
55
2.1. Quantifying bias in medium-sized LLMs

56

2.1. Quantifying bias in medium-sized LLMs
● Classification:
○ Intrinsic bias in embeddings
○ Extrinsic bias in outputs.

57

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias
● Definition:
○ Intrinsic bias (a.k.a. upstream bias or representational bias) refers to the inherent biases present in the generated output representations.
○ Arises from the vast corpus used during the initial pre-training phase.
● Classification:
○ Similarity-based bias
○ Probability-based bias

58

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Similarity-based bias
● Definition:
○ Bias that arises from the way different words/phrases are related in the embedding space.
○ Suitable for static embeddings.

59

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Similarity-based bias - Sentence Embedding
Word Embedding Association Test (WEAT) [9] measures stereotypical biases in word embeddings, inspired by the Implicit Association Test [10].
● Implicit Association Test: a psychological test used to measure particular biases by assessing how quickly individuals associate different concepts.
60
[9] Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science 356, 6334 (2017), 183–186.
[10] Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. (1998). Measuring individual differences in implicit cognition: the implicit association test. Journal of Personality and Social Psychology, 74(6), 1464.

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Similarity-based bias - Sentence Embedding
Word Embedding Association Test (WEAT)
● Key components:
○ Target words:
■ X: E.g., male ("man", "boy", etc.)
■ Y: E.g., female ("woman", "girl", etc.)
○ Attribute words:
■ A: E.g., career ("engineer", "scientist", etc.)
■ B: E.g., family ("home", "parents", etc.)
○ Association score:

$$s(w, A, B) = \mathrm{mean}_{a \in A} \cos(\vec{w}, \vec{a}) - \mathrm{mean}_{b \in B} \cos(\vec{w}, \vec{b})$$

■ where the cosine similarity score is analogous to reaction time in the IAT.

61

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Similarity-based bias - Sentence Embedding
Word Embedding Association Test (WEAT)
● Test statistic (a code sketch follows below):

$$s(X, Y, A, B) = \sum_{x \in X} s(x, A, B) - \sum_{y \in Y} s(y, A, B)$$

○ where s(w, A, B) is the association score of word w
○ X and Y are two sets of target words
○ A and B are two sets of attribute words
○ A positive value indicates that X associates with A and Y associates with B; a negative value indicates that X associates with B and Y associates with A.

62
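A compact Python sketch of WEAT; it assumes `emb` is a dict mapping each word to its static embedding vector (e.g., loaded from word2vec or GloVe), and also includes the effect size from Caliskan et al. [9]:

```python
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def s_word(w, A, B, emb):
    # s(w, A, B): mean cosine similarity to A minus mean similarity to B.
    return (np.mean([cos(emb[w], emb[a]) for a in A])
            - np.mean([cos(emb[w], emb[b]) for b in B]))

def weat_statistic(X, Y, A, B, emb):
    # s(X, Y, A, B) = sum_x s(x, A, B) - sum_y s(y, A, B)
    return (sum(s_word(x, A, B, emb) for x in X)
            - sum(s_word(y, A, B, emb) for y in Y))

def weat_effect_size(X, Y, A, B, emb):
    # Normalized measure of how separated the two target sets are.
    scores = [s_word(w, A, B, emb) for w in X + Y]
    return ((np.mean(scores[:len(X)]) - np.mean(scores[len(X):]))
            / np.std(scores, ddof=1))
```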

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Similarity-based bias - Sentence Embedding
Sentence Embedding Association Test (SEAT) [11] extends WEAT by using sentence embeddings.
● Template: This is a [term].
● Target sentences:
○ X: This is a programmer, This is a doctor, ...
○ Y: This is a nurse, This is a teacher, ...
● Attribute sentences:
○ A: This is a man, This is a boy, ...
○ B: This is a woman, This is a girl, ...
63
[11] May, C., Wang, A., Bordia, S., Bowman, S. R. and Rudinger, R., 2019. On Measuring Social Biases in Sentence Encoders. In Proceedings of the 2019 Conference of the North. Association for Computational Linguistics.


2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Similarity-based bias - Sentence Embedding
● Limitation:
○ Assumes that each word has a unique embedding.
■ Inconsistent results for embeddings generated using contextual methods.

65

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias
● Definition: Biases that are evident in the likelihood distributions generated by the model.
● Categories:
○ Masked Token Metrics
○ Pseudo-Log-Likelihood Metrics

66

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias
● Mask token prediction in Transformers [12]:
67
[12] Ghazvininejad, M., Levy, O., Liu, Y. and Zettlemoyer, L., 2019, November. Mask-Predict: Parallel Decoding of Conditional Masked Language Models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 6112-6121).

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - Masked Token Metrics
● Definition: Compare the distributions of predicted masked words in two sentences that involve different social groups.

68

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - Masked Token Metrics
Log-Probability Bias Score (LPBS) [13] measures bias in contextual embedding models (e.g., BERT) using the normalization of probabilities.
● Motivation: Filter out any default preferences the model may have toward gendered terms based on sentence structure.
● In the paper's formulation, for a template such as "[MASK] is a [profession]", p_tgt is the probability of a gendered target word filling the mask, and p_prior is that probability when the profession is also masked:

$$\mathrm{LPBS} = \log \frac{p_{\mathrm{tgt}}}{p_{\mathrm{prior}}}$$

69
[13] Kurita, K., Vyas, N., Pareek, A., Black, A. W. and Tsvetkov, Y., 2019, August. Measuring Bias in Contextualized Word Representations. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing (pp. 166-172).


2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - Pseudo-log-likelihood
● Definition:
○ Assess the likelihood of a sentence being a stereotype or anti-stereotype by estimating the conditional probability of the sentence given each word in the sentence.
○ An LM that satisfies these metrics should select stereotype and anti-stereotype sentences with the same likelihood.

72

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - Pseudo-log-likelihood
Pseudo-log-likelihood (PLL) [14] is the foundational metric for this method.
● Formula:

$$\mathrm{PLL}(S) = \sum_{w \in S} \log P(w \mid S_{\setminus w}; \theta)$$

○ For a sentence S, each token w is masked in turn and predicted from the rest of the sentence S∖w.
○ θ is the pre-trained parameter of the LM.
[14] Salazar, J., Liang, D., Nguyen, T. Q., & Kirchhoff, K. (2020, July). Masked Language Model Scoring. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 2699-2712).
73
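A minimal sketch of computing PLL with a masked LM via Hugging Face transformers; the model choice (bert-base-uncased) and the example sentence are assumptions for illustration:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name).eval()

def pll(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id           # mask token w_i
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        # log P(w_i | S \ w_i; theta), summed over all positions
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

print(pll("The doctor said she would help."))
```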

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - Pseudo-log-likelihood
Pseudo-log-likelihood (PLL)

74

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - CrowS-Pairs Score
CrowS-Pairs Score (CPS) [15] leverages PLL to evaluate the model's preference for stereotypical sentences using the unmodified tokens.
● For a sentence:
○ Modified tokens M (the tokens that differ between the two sentences in a pair)
○ Unmodified tokens U
○ S = M ∪ U
● Motivation: The imbalance in frequency of modified tokens.

75

[15] Nangia, N., Vania, C., Bhalerao, R., & Bowman, S. (2020, November). CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1953-1967).

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - CrowS-Pairs Score
CrowS-Pairs Score (CPS) [15] leverages PLL to evaluate the model's preference for stereotypical sentences using the unmodified tokens.
● Formula:

$$\mathrm{CPS}(S) = \sum_{u \in U} \log P(u \mid U_{\setminus u}, M; \theta)$$

○ Sentence S = M ∪ U; each unmodified token u is predicted conditioned on the remaining unmodified tokens and the modified tokens M.
○ θ is the pre-trained parameter of the LM.

76

[15] Nangia, N., Vania, C., Bhalerao, R., & Bowman, S. (2020, November). CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1953-1967).

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - CrowS-Pairs Score
CrowS-Pairs Score (CPS)

77

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - All Unmasked Likelihood
All Unmasked Likelihood (AUL) [16] extends PLL and CPS by considering all tokens when calculating the conditional probability.
● Formula:

$$\mathrm{AUL}(S) = \frac{1}{|S|} \sum_{w \in S} \log P(w \mid S; \theta)$$

○ Every token is predicted from the complete, unmasked sentence S.
● Motivation: Loss of information (masking discards context the model could use).
[16] Masahiro Kaneko and Danushka Bollegala. 2022. Unmasking the mask – evaluating social biases in masked language models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 11954–11962.
78

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - Pseudo-log-likelihood
All Unmasked Likelihood (AUL)

79

2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - Pseudo-log-likelihood

80

2.1. Quantifying bias in medium-sized LLMs
b) Extrinsic bias
● Definition:
○ Disparity in an LLM's performance across different downstream tasks
○ Potentially leading to unequal outcomes in real-world applications
● Downstream task classification:
○ Classification tasks
○ Generation tasks

81

2.1. Quantifying bias in medium-sized LLMs
b) Extrinsic bias

82

2.1. Quantifying bias in medium-sized LLMs
b) Extrinsic bias - Classification-based bias - Text Classification
Definition: The difference in outcomes for texts involving different values of sensitive attributes (e.g., gender).
● Example: The Bias-in-Bios [17] dataset assesses the correlation between gender and occupation.
83
[17] De-Arteaga, M., Romanov, A., Wallach, H., Chayes, J., Borgs, C., Chouldechova, A., ... & Kalai, A. T. (2019, January). Bias in bios: A case study of semantic representation bias in a high-stakes setting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 120-128).

2.1. Quantifying bias in medium-sized LLMs
b) Extrinsic bias - Classification-based bias - Text Classification
● For two groups g and ~g, for each occupation y, the metric is the gap in true positive rates:

$$\mathrm{GAP}_{g,y} = \mathrm{TPR}_{g,y} - \mathrm{TPR}_{\sim g,y}, \qquad \mathrm{TPR}_{g,y} = P[\hat{Y} = y \mid G = g, Y = y]$$

○ ŷ, y are the predicted and target labels
○ G is the binary gender

84
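A small sketch of the TPR gap on toy arrays; the data is invented, and the values in the comments just reflect this particular toy input:

```python
import numpy as np

def tpr_gap(y_true, y_pred, gender, occupation):
    # GAP_{g,y} = TPR_{female,y} - TPR_{male,y}
    def tpr(g):
        mask = (gender == g) & (y_true == occupation)
        return np.mean(y_pred[mask] == occupation)
    return tpr("F") - tpr("M")

y_true = np.array(["nurse", "nurse", "doctor", "doctor"])
y_pred = np.array(["nurse", "doctor", "doctor", "doctor"])
gender = np.array(["F", "M", "F", "M"])
print(tpr_gap(y_true, y_pred, gender, "doctor"))  # 1.0 - 1.0 = 0.0
print(tpr_gap(y_true, y_pred, gender, "nurse"))   # 1.0 - 0.0 = 1.0
```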

2.1. Quantifying bias in medium-sized LLMs
b) Extrinsic bias - Classification-based bias - NLI
● Definition:
○ The LM's tendency to deviate from neutral predictions due to gender-specific words.
○ NLI is the task of determining whether a given "hypothesis" logically follows from (entailment - e), contradicts (contradiction - c), or is undetermined with respect to (neutral - n) a given "premise".
● Example: Bias-NLI [18] with the specific template: "The [subject] [verb] [a/an] [object]"
[18] Sunipa Dev, Tao Li, Jeff M Phillips, and Vivek Srikumar. 2020. On measuring and mitigating biased inferences of word embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 7659–7666.

85
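An illustrative instantiation of the template; the filler words are assumptions, not the paper's exact lists:

```python
# Bias-NLI builds premise/hypothesis pairs from
# "The [subject] [verb] [a/an] [object]".
premise    = "The doctor bought a coat."   # occupation subject
hypothesis = "The woman bought a coat."    # gendered subject
# The premise does not specify the doctor's gender, so an unbiased
# model should predict "neutral". Systematic shifts toward entailment
# or contradiction for gendered subjects indicate bias.
```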

