Fairness in Large Language Models in Three Hours
Thang Viet Doan, Zichong Wang, Nhat Nguyen Minh Hoang, Wenbin Zhang
This tutorial is grounded in our surveys and established benchmarks, all available as open-source resources:
/LavinWong/Fairness-in-Large-Language-Model
WARNING:
The following slides contain examples of model bias and evaluation which are offensive in nature.
Large Language Models are fascinating!
● Unprecedented Language Capabilities
● Diverse Applications Across Industries
● Breaking Language and Knowledge Boundaries
But they are not perfect!
LLMs exhibit unfairness in their answers!
Urgent need to handle bias in LLMs' behavior!
Source: GPT-4, 10/2024
Bias mitigation in LLMs is different
● How is bias formed in large language models?
● How can unfairness be measured?
● What methods can be applied to mitigate bias?
● What are the tools for measuring and mitigating bias?
● Why is mitigating bias challenging?
We built a roadmap to explore these questions!
Roadmap
Section 1: Background on LLMs
Section 2: Quantifying bias in LLMs
Section 3: Mitigating bias in LLMs
Section 4: Resources for evaluating bias in LLMs
Section 5: Challenges and future directions
Section 1: Background on LLMs
Content
● Review the development history of LLMs
● The training procedure of LLMs, and how it achieves such capabilities
● Explore the bias sources in LLMs
1.1 History of LLMs
This section is grounded in our introductory survey on LLMs [1].
[1] Wang, Zichong, Chu, Zhibo, Doan, Thang Viet, Ni, Shiwen, Yang, Min, Zhang, Wenbin. "History, development, and principles of large language models: an introductory survey." AI and Ethics (2024): 1-17.
1.1 History of LLMs
a. Language Models
● Earlier stages: Statistical LMs -> Neural LMs
● N-grams [2]
● For example:
[2] Jurafsky, Dan; Martin, James H. (7 January 2023). "N-gram Language Models". Speech and Language Processing (PDF) (3rd edition draft). Retrieved 24 May 2022.
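As a concrete illustration of the n-gram idea, a minimal bigram model can be sketched in pure Python (the toy corpus and helper names are my own, not from the tutorial):

```python
from collections import Counter

# Toy corpus (hypothetical); a real n-gram model is trained on a large text corpus.
corpus = "the cat sat on the mat . the cat ate".split()

unigrams = Counter(corpus)                      # count(w)
bigrams = Counter(zip(corpus, corpus[1:]))      # count(w_prev, w)

def bigram_prob(prev, word):
    """Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_prob("the", "cat"))  # "the cat" occurs 2x, "the" occurs 3x -> 2/3
```

The model simply predicts the next word from counts of short windows, which is exactly why statistical LMs generalize poorly to unseen word sequences.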
1.1 History of LLMs
a. Language Models
● Earlier stages: Statistical LMs -> Neural LMs
● Word2Vec [3, 4]
[3] Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: Proceedings of ICLR Workshop 2013.
[4] Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. Adv Neural Inf Process Syst 26:1.
1.1 History of LLMs
a. Language Models
● Earlier stages: Statistical LMs -> Neural LMs
● RNN [5]
[5] A. Graves, A.-r. Mohamed and G. Hinton, "Speech recognition with deep recurrent neural networks," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 2013, pp. 6645-6649, doi: 10.1109/ICASSP.2013.6638947.
1.1 History of LLMs
a. Language Models
● Drawbacks:
○ Poor generalization
○ Lack of long-term dependence
○ Recurrent computation
○ Difficulty in capturing complex linguistic properties and phenomena
1.1 History of LLMs
Until Transformers [6]…
[6] Vaswani, A. "Attention is all you need." Advances in Neural Information Processing Systems (2017).
1.1 History of LLMs
b. Large Language Models
● Until Transformers:
○ Self-Attention: Long-Range Dependencies
○ Multi-head Attention: Contextualized Word Representations
○ Parallelization and Scalability
1.1 History of LLMs
b. Large Language Models
● Transformers revolutionized the natural language processing landscape!
● This resulted in a massive blooming era of LLMs: GPT, BERT, LLaMA, Claude, and more!
● Broad applications across domains:
○ Education
○ Healthcare
○ Technology
○ And so on…
How do LLMs achieve such massive capabilities? → The training procedure of LLMs
1.2 Training LLMs
Key steps to train LLMs
● Training large language models is a complex, multi-step process that requires careful planning and execution.
1.2 Training LLMs
a. Data Preparation
● Data is the foundation of LLMs.
● "Garbage In, Garbage Out": Poor data quality can lead to biased, inaccurate, or unreliable model outputs.
● High-quality data can lead to accurate, coherent, and reliable outputs.
Figure: Model performance decreases significantly with a high data error proportion [7]
[7] Srivastava, Ankit, Piyush Makhija, and Anuj Gupta. "Noisy Text Data: Achilles' Heel of BERT." Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020). 2020.
1.2 Training LLMs
a. Data Preparation
● Quality: Accurately represent the domain and language style; factually correct and free from errors.
● Examples:

| Low Quality | High Quality | Problem |
| He are developer | He is a developer | Grammatical error |
| This game is lit! Thx for your attn! | This game is awesome! Thanks for your attention! | Slang and abbreviations |
| Only men can do engineering | Both men and women can do engineering | Unfair and inaccurate |
1.2 Training LLMs
a. Data Preparation
● Diversity: Represent a wide variety of languages, domains, and contexts to improve generalization.
● Some languages have limited availability of linguistic data, tools, and resources compared to more widely spoken languages.
Figure: /blog/teaching-ai-to-translate-100s-of-spoken-and-written-languages-in-real-time/
1.2 Training LLMs
a. Data Preparation
● Data Cleaning - Quality Filtering:
○ Noise/Outlier Handling: Identifying and removing noisy or irrelevant data that could distort the model's performance.
○ Normalization: Ensuring that the data is consistent and standardized across different sources.
○ Chunking/Pruning: Breaking large datasets into manageable pieces.
○ Deduplication: Removing duplicate entries to avoid redundant information in the training set.
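The normalization and deduplication steps can be sketched as follows (a minimal illustration; the helper names and toy documents are assumptions, not from the tutorial — production pipelines typically use hashing or near-duplicate detection at scale):

```python
import re
import unicodedata

def normalize(text: str) -> str:
    """Normalization: standardize Unicode form, whitespace, and case."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()

def deduplicate(docs):
    """Deduplication: drop documents whose normalized form was already seen."""
    seen, kept = set(), []
    for doc in docs:
        key = normalize(doc)
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept

docs = ["Hello   World", "hello world", "Goodbye"]
print(deduplicate(docs))  # ['Hello   World', 'Goodbye']
```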
1.2 Training LLMs
a. Data Preparation
● Data Cleaning - Ethical Filtering:
○ Bias Mitigation: Identifying and reducing bias in the data and reducing stereotypes in model outputs.
○ Toxicity Reduction: Removing harmful or toxic content from the dataset.
○ Privacy: Excluding personally identifiable information (PII) or sensitive data.
○ Faithfulness: Removing inaccurate data, preventing misinformation.
1.2 Training LLMs
a. Data Preparation
● Data Validation - Data Format & Data Integrity:
○ Data Format: Ensuring that the data follows a specific structure or format that is compatible with the model.
○ Data Integrity: Validating that the data is complete, reliable, and accurate for training.
1.2 Training LLMs
a. Data Preparation
● Data Validation - Ethical Validity:
○ Privacy: Ensuring the data maintains privacy standards throughout the process.
○ Fairness: Checking that the data is balanced and doesn't introduce unfair bias.
○ Accuracy and Consistency: Ensuring that the data is accurate across different sources and consistent throughout the dataset.
○ Toxicity: Verifying that toxic or harmful data has been removed and no such data remains.
1.2 Training LLMs
b. Training/Fine-tuning Configuration
● LLM model structure selection:
○ Transformer-based architecture
○ Structures to select from:
■ Encoder-only (BERT)
■ Decoder-only (GPT, LLaMA)
■ Encoder-Decoder (T5, BART)
● Considerations:
○ Pre-trained or from scratch
○ Model size and complexity
○ Key elements: learning rate, context length, number of attention heads, etc.
1.2 Training LLMs
b. Training/Fine-tuning Configuration
● Hyperparameter Tuning:
○ Hyperparameter tuning is about adjusting the model's settings to get the best possible performance.
○ Tuning strategies:
■ Grid Search: Try all possible combinations of pre-defined hyperparameters.
■ Random Search: Sample hyperparameter values from the search space.
■ Bayesian Optimization: Build a probabilistic model of the objective function and use this model to select the most promising hyperparameters.
■ Hyperband (Successive Halving): Assign different resources to each set of hyperparameters and progressively eliminate the worst-performing ones.
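A minimal sketch of the random-search strategy (the search space and the stand-in objective below are hypothetical; real tuning would train and evaluate the model for each sampled configuration):

```python
import random

# Hypothetical search space for illustration.
search_space = {
    "learning_rate": [1e-5, 3e-5, 1e-4, 3e-4],
    "context_length": [512, 1024, 2048],
    "num_heads": [8, 12, 16],
}

def objective(config):
    """Stand-in for validation performance; a real objective trains and evaluates."""
    return -abs(config["learning_rate"] - 3e-5) - abs(config["num_heads"] - 12) / 100

def random_search(n_trials, seed=0):
    """Sample configurations uniformly from the space and keep the best one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in search_space.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg

print(random_search(50))
```

Random search is often preferred over grid search for LLM-scale tuning because each trial is expensive and random sampling covers wide spaces with far fewer evaluations.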
1.2 Training LLMs
c. Instruction Tuning
● A fine-tuning technique for LLMs on a labeled set of instruction prompts and outputs of varied tasks and domains, in a similar instruction format.
● The model is taught to follow the instruction, thus improving its generalization on unseen tasks and domains.
1.2 Training LLMs
c. Instruction Tuning
● Might introduce bias by teaching the model potential stereotypes in the given instruction.
○ Unintentionally introduces gender bias!
○ Exploits the model's racial bias!
1.2 Training LLMs
d. Alignment with Humans
● Reinforcement Learning from Human Feedback:
○ Incorporates human feedback into the reward function.
○ So the LLMs can perform tasks more aligned with human values such as helpfulness, honesty, and harmlessness.
1.2 Training LLMs
d. Alignment with Humans
● Reinforcement Learning from Human Feedback:
○ Deals with bias potentially generated by the model by steering it towards human-preferred responses.
○ However, there's still a chance of unfairness being introduced through the human feedback.
1.3 Bias Sources in LLMs
1.3 Bias Sources in LLMs
a. Training data bias:
● Historical Bias: Data might be missing or incorrectly recorded for discriminated groups, or unfair treatment of minorities could potentially be reflected by LLMs.
1.3 Bias Sources in LLMs
a. Training data bias:
● Data Disparity: Dissimilarity between different demographic groups in the training dataset could lead to an unfair understanding by LLMs of those groups.
1.3 Bias Sources in LLMs
b. Embedding bias
● Word representation vectors might exhibit bias, demonstrated by a closer distance to sensitive words (e.g., genders - she/he).
● This leads to biases in downstream tasks trained on these embeddings.
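The idea of embedding bias as "closer distance to sensitive words" can be illustrated with toy vectors (the 3-d embeddings below are invented for illustration; real embeddings such as Word2Vec are high-dimensional and learned from a corpus):

```python
import math

# Toy 3-d embeddings (hypothetical values for illustration only).
emb = {
    "he":       [0.9, 0.1, 0.0],
    "she":      [0.1, 0.9, 0.0],
    "engineer": [0.8, 0.2, 0.1],  # sits closer to "he"  -> encoded gender bias
    "nurse":    [0.2, 0.8, 0.1],  # sits closer to "she" -> encoded gender bias
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def gender_lean(word):
    """Positive -> closer to 'he'; negative -> closer to 'she'."""
    return cosine(emb[word], emb["he"]) - cosine(emb[word], emb["she"])

print(gender_lean("engineer"))  # > 0
print(gender_lean("nurse"))     # < 0
```

A downstream classifier trained on such vectors inherits the lean: occupation words that sit closer to one gendered term get systematically different predictions.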
1.3 Bias Sources in LLMs
c. Label bias
● Arises from the subjective judgments of human annotators who provide labels or annotations for training data.
● Can occur during various phases of LLM training:
○ Data Labelling
○ Instruction Tuning
○ RLHF
1.3 Terminologies
● LLMs Classification
● Fairness Notions
1.3 Terminologies
a. LLMs Classification:
● Large Language Models:
○ Large-sized Large Language Models
○ Medium-sized Large Language Models
1.3 Terminologies
a. LLMs Classification: Medium-sized vs Large-sized LLMs
● Medium-sized:
○ Pretrained base model
○ Up to 10 billion parameters
○ Utilizes fine-tuning to perform tasks
● Large-sized:
○ Pretrained base model
○ Hundreds of billions of parameters
○ Universal capability
○ Utilizes prompt-based techniques (Instruction Tuning, RLHF)
1.3 Terminologies

| | Medium-sized LLMs | Large-sized LLMs |
| Number of Parameters | Fewer than 10 billion parameters | From tens to hundreds of billions of parameters |
| Fine-tuning Approach | Fine-tuned for specific tasks or domains | Prompt-based: Instruction Tuning, RLHF |
| Capabilities | Specialized performance in targeted applications | Universal language capabilities, versatile across various tasks |
| Interaction Style | Task-specific interactions after fine-tuning: text generation, classification, etc. | Natural communication and prompting without extensive fine-tuning |
| Ethical Alignment | Limited by the scope of fine-tuning | Enhanced ethical alignment through methods like RLHF |
| Applicability | Applicable to a wide range of scales | Very large data centers only |
| Deployment | Can be hosted locally and privately | Rely on calling APIs to data centers |
| Accessibility | Can be inspected for embeddings, inner structure, and outputs | Can only access input prompts and outputs |
1.3 Terminologies
b. Fairness terminologies: deprived and favored groups
● Sensitive attribute: An attribute related to demographic information that can be discriminated against or not.
● Deprived group: Refers to people whose sensitive attribute is discriminated against.
○ For example: women, people with physical disabilities, immigrants, people from low-income backgrounds, etc.
● Favored group: Individuals whose sensitive attribute is not discriminated against.
● Rejected: The event that an individual from one group (deprived or favored) is denied a legal right or benefit.
● Granted: The event that an individual from one group (deprived or favored) is allowed a legal right or benefit.
1.3 Terminologies
● Sensitive attribute: race
● Deprived group: Black people
● Favored group: white people
● Rejected: The model refuses to tell a joke about Black people.
● Granted: A joke about white people is treated normally.
Source: GPT-4, 10/2024
Section 2: Quantifying bias in LLMs
Content
● Quantifying bias in medium-sized LLMs
○ Intrinsic bias
○ Extrinsic bias
● Quantifying bias in large-sized LLMs
○ Demographic Representation
○ Stereotypical Association
○ Counterfactual Fairness
○ Performance Disparities
2. Quantifying bias in LLMs
This section is grounded in our fairness definitions in LLMs survey [8].
[8] Doan, Thang Viet, Zhibo Chu, Zichong Wang, and Wenbin Zhang. "Fairness Definitions in Language Models Explained." arXiv preprint arXiv:2407.18454 (2024).
Section 2.1: Quantifying bias in medium-sized LLMs
2.1. Quantifying bias in medium-sized LLMs
● Classification:
○ Intrinsic bias in embeddings
○ Extrinsic bias in outputs
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias
● Definition:
○ Intrinsic bias (a.k.a. upstream bias or representational bias) refers to the inherent biases present in the output representations generated.
○ Arises from the vast corpus during the initial pre-training phase.
● Classification:
○ Similarity-based bias
○ Probability-based bias
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Similarity-based bias
● Definition:
○ Bias that arises from the way different words/phrases are related in the embedding space.
○ Suitable for static embeddings.
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Similarity-based bias - Word Embedding
Word Embedding Association Test (WEAT) [9] measures stereotypical biases in word embeddings, inspired by the Implicit Association Test [10].
● Implicit Association Test: a psychological test used to measure particular biases by assessing how quickly individuals associate different concepts.
[9] Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science 356, 6334 (2017), 183-186.
[10] Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. (1998). Measuring individual differences in implicit cognition: the implicit association test. Journal of Personality and Social Psychology, 74(6), 1464.
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Similarity-based bias - Word Embedding
Word Embedding Association Test (WEAT)
● Key components:
○ Target words:
■ X: e.g., male ("man", "boy", etc.)
■ Y: e.g., female ("woman", "girl", etc.)
○ Attribute words:
■ A: e.g., career ("engineer", "scientist", etc.)
■ B: e.g., family ("home", "parents", etc.)
○ Association score: $s(w, A, B) = \mathrm{mean}_{a \in A}\cos(\vec{w}, \vec{a}) - \mathrm{mean}_{b \in B}\cos(\vec{w}, \vec{b})$
■ where the cosine similarity score is analogous to reaction time in the IAT.
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Similarity-based bias - Word Embedding
Word Embedding Association Test (WEAT)
● Test statistic: $s(X, Y, A, B) = \sum_{x \in X} s(x, A, B) - \sum_{y \in Y} s(y, A, B)$
○ where $s(w, A, B)$ is the association score of word $w$
○ $X$ and $Y$ are two sets of target words
○ $A$ and $B$ are two sets of attribute words
● If $s(X, Y, A, B) > 0$: X associates with A and Y with B; if $s(X, Y, A, B) < 0$: X associates with B and Y with A.
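The association score and test statistic can be sketched in pure Python (the toy 2-d embeddings below are invented for illustration; real WEAT runs on pretrained word vectors):

```python
import math

# Toy embeddings (hypothetical values); real WEAT uses pretrained word vectors.
emb = {
    "man": [0.9, 0.1], "boy": [0.8, 0.2],               # target set X (male)
    "woman": [0.1, 0.9], "girl": [0.2, 0.8],            # target set Y (female)
    "engineer": [0.85, 0.15], "scientist": [0.8, 0.1],  # attribute set A (career)
    "home": [0.15, 0.85], "parents": [0.1, 0.8],        # attribute set B (family)
}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def s_word(w, A, B):
    """Association score: mean cosine to A minus mean cosine to B."""
    return (sum(cos(emb[w], emb[a]) for a in A) / len(A)
            - sum(cos(emb[w], emb[b]) for b in B) / len(B))

def weat(X, Y, A, B):
    """WEAT test statistic: summed association over X minus over Y."""
    return sum(s_word(x, A, B) for x in X) - sum(s_word(y, A, B) for y in Y)

X, Y = ["man", "boy"], ["woman", "girl"]
A, B = ["engineer", "scientist"], ["home", "parents"]
print(weat(X, Y, A, B))  # > 0: male terms lean career, female terms lean family
```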
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Similarity-based bias - Sentence Embedding
Sentence Embedding Association Test (SEAT) [11] extends WEAT by using sentence embeddings.
● Template: This is a [term].
● Target sentences:
○ X: This is a programmer, This is a doctor, ...
○ Y: This is a nurse, This is a teacher, ...
● Attribute sentences:
○ A: This is a man, This is a boy, ...
○ B: This is a woman, This is a girl, ...
[11] May, C., Wang, A., Bordia, S., Bowman, S. R. and Rudinger, R., 2019. On Measuring Social Biases in Sentence Encoders. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics.
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Similarity-based bias
● Limitation:
○ Assumption that each word has a unique embedding.
■ Inconsistent results for embeddings generated using contextual methods.
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias
● Definition: Biases that are evident in the likelihood distributions generated by the model.
● Categories:
○ Masked Token Metrics
○ Pseudo-Log-Likelihood Metrics
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias
● Mask token prediction in Transformers [12]:
[12] Ghazvininejad, M., Levy, O., Liu, Y. and Zettlemoyer, L., 2019, November. Mask-Predict: Parallel Decoding of Conditional Masked Language Models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 6112-6121).
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - Masked Token Metrics
● Definition: Compare the distributions of predicted masked words in two sentences that involve different social groups.
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - Masked Token Metrics
Log-Probability Bias Score (LPBS) [13] measures bias in contextual embedding models (e.g., BERT) using the normalization of probabilities.
● Motivation: Filter out any default preferences the model may have toward gendered terms based on sentence structure.
[13] Kurita, K., Vyas, N., Pareek, A., Black, A. W. and Tsvetkov, Y., 2019, August. Measuring Bias in Contextualized Word Representations. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing (pp. 166-172).
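For reference, the normalization idea can be written out as follows (a sketch based on [13]; the template sentence is illustrative): the target probability is divided by a prior obtained by also masking the attribute word, and the bias score is the difference of the normalized log-probabilities for the two demographic terms (e.g., he/she):

```latex
p_{\mathrm{tgt}}   = P(\text{[MASK]} = \text{he} \mid \text{``[MASK] is a programmer''})
p_{\mathrm{prior}} = P(\text{[MASK]} = \text{he} \mid \text{``[MASK] is a [MASK]''})
\mathrm{LPBS} = \log\frac{p_{\mathrm{tgt}_1}}{p_{\mathrm{prior}_1}}
              - \log\frac{p_{\mathrm{tgt}_2}}{p_{\mathrm{prior}_2}}
```

Dividing by the prior removes the model's baseline preference for a gendered term in that sentence frame, so a nonzero LPBS reflects the association with the attribute word rather than raw term frequency.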
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - Pseudo-Log-Likelihood
● Definition:
○ Assess the likelihood of a sentence being a stereotype or anti-stereotype by estimating the conditional probability of the sentence given each word in the sentence.
○ An LM that satisfies these metrics should select stereotype and anti-stereotype sentences with the same likelihood.
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - Pseudo-Log-Likelihood
Pseudo-log-likelihood (PLL) [14] is the foundational metric for this method.
● Formula: $\mathrm{PLL}(S) = \sum_{i=1}^{|S|} \log P(w_i \mid S_{\setminus w_i}; \theta)$
○ Sentence $S = w_1, \ldots, w_{|S|}$
○ $\theta$ is the pre-trained parameter of the LM.
[14] Salazar, J., Liang, D., Nguyen, T. Q., & Kirchhoff, K. (2020, July). Masked Language Model Scoring. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 2699-2712).
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - CrowS-Pairs Score
CrowS-Pairs Score (CPS) [15] leverages PLL to evaluate the model's preference for stereotypical sentences using the unmodified tokens.
● For a sentence:
○ Modified tokens $M$
○ Unmodified tokens $U$
○ $S = M \cup U$
● Motivation: The imbalance in frequency of modified tokens.
[15] Nangia, N., Vania, C., Bhalerao, R., & Bowman, S. (2020, November). CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1953-1967).
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - CrowS-Pairs Score
CrowS-Pairs Score (CPS) [15] leverages PLL to evaluate the model's preference for stereotypical sentences using the unmodified tokens.
● Formula: $\mathrm{CPS}(S) = \sum_{u \in U} \log P(u \mid U_{\setminus u}, M; \theta)$
○ Sentence $S = M \cup U$
○ $\theta$ is the pre-trained parameter of the LM.
2.1. Quantifying bias in medium-sized LLMs
a) Intrinsic bias - Probability-based bias - All Unmasked Likelihood
All Unmasked Likelihood (AUL) [16] expands PLL and CPS by considering all tokens when calculating the conditional probability.
● Formula: $\mathrm{AUL}(S) = \frac{1}{|S|} \sum_{i=1}^{|S|} \log P(w_i \mid S; \theta)$
● Motivation: Loss of information.
[16] Masahiro Kaneko and Danushka Bollegala. 2022. Unmasking the mask - evaluating social biases in masked language models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 11954-11962.
2.1. Quantifying bias in medium-sized LLMs
b) Extrinsic bias
● Definition:
○ Disparity in an LLM's performance across different downstream tasks
○ Potentially leading to unequal outcomes in real-world applications
● Downstream task classification:
○ Classification tasks
○ Generation tasks
2.1. Quantifying bias in medium-sized LLMs
b) Extrinsic bias - Classification-based bias - Text Classification
Definition: The difference in outcomes for texts involving different values of sensitive attributes (e.g., gender).
● Example: the Bias-in-Bios [17] dataset assesses the correlation between gender and occupation.
[17] De-Arteaga, M., Romanov, A., Wallach, H., Chayes, J., Borgs, C., Chouldechova, A., ... & Kalai, A. T. (2019, January). Bias in bios: A case study of semantic representation bias in a high-stakes setting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 120-128).
2.1. Quantifying bias in medium-sized LLMs
b) Extrinsic bias - Classification-based bias - Text Classification
● For two gender groups and each occupation $y$, the gap in true positive rates is
$\mathrm{GAP}_{g,y} = \mathrm{TPR}_{g,y} - \mathrm{TPR}_{\neg g,y}$, where $\mathrm{TPR}_{g,y} = P(\hat{Y} = y \mid G = g, Y = y)$
○ $\hat{Y}$ and $Y$ are the predicted and target labels
○ $G$ is the binary gender
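The per-occupation TPR gap can be sketched as follows (the toy records are invented for illustration; Bias-in-Bios computes this over real biographies):

```python
# Each record: (gender, true_occupation, predicted_occupation) — hypothetical data.
records = [
    ("female", "doctor", "doctor"),
    ("female", "doctor", "nurse"),    # misclassified
    ("male", "doctor", "doctor"),
    ("male", "doctor", "doctor"),
]

def tpr(group, occupation):
    """True positive rate: P(predicted == y | gender == group, true == y)."""
    relevant = [r for r in records if r[0] == group and r[1] == occupation]
    hits = [r for r in relevant if r[2] == occupation]
    return len(hits) / len(relevant)

def tpr_gap(occupation, group="female", other="male"):
    return tpr(group, occupation) - tpr(other, occupation)

print(tpr_gap("doctor"))  # 0.5 - 1.0 = -0.5: "doctor" recalled less often for women
```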
2.1. Quantifying bias in medium-sized LLMs
b) Extrinsic bias - Classification-based bias - NLI
● Definition:
○ The LM's tendency to deviate from neutral predictions due to gender-specific words.
○ NLI is the task of determining whether a given "hypothesis" and "premise" logically follow (entailment - e), contradict (contradiction - c), or are undetermined (neutral - n) with respect to each other.
● Example: Bias-NLI [18] with the specific template: "The [subject] [verb] [a/an] [object]"
[18] Sunipa Dev, Tao Li, Jeff M Phillips, and Vivek Srikumar. 2020. On measuring and mitigating biased inferences of word embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 7659-7666.