How to measure the impact of GenAI on software coding and unit testing?
CONTENTS
Introduction
How will GenAI impact software programming?
Why is measurement important?
Challenges in measuring GenAI impact
Establishing a measurement protocol
Result: Key insights from real-size experimentations with measurement
Conclusion
About the authors
Measurement protocol: What are the expected impacts of GenAI on software coding, and unit testing?
As Generative AI (GenAI) continues to touch almost every aspect of our daily lives, its impact on software design, coding, and unit testing is both inevitable and exciting. But what are those expected impacts? How can you measure them? How can you define a proven measurement protocol? And what do real-size experimentations reveal?
GenAI can assist software engineers when they use design and coding to transform user stories into software. It can be harnessed to create design outputs such as user interface mockups, entity models, and interfaces. This can lead to a significant productivity improvement, without compromising quality. But the full benefits can only be felt if a measurement process is implemented by experts.
How will GenAI impact software programming?
First of all, what impact will GenAI have on software programming, according to businesses and organizations? Our latest Capgemini Research Institute[1] report shows that 61% of organizations see enabling more innovative work, such as developing new software features and services, as the leading benefit of GenAI. Close behind are improving software quality (49%) and increasing productivity (40%). Organizations are utilizing these productivity gains on innovative work such as developing new software features (50%) and upskilling (47%). Very few aim to reduce headcount (4%).[2]
GenAI is poised to redefine conventional programming practices by shifting the focus from coding to prompt engineering and code proofreading. This is expressed perfectly by Andrej Karpathy, an OpenAI computer scientist, who recently said: “the hottest new programming language is English.”[3]
What does this mean in practice? As an example, software engineers can use plain language to describe the intended functionality of a software feature, then review, update, and validate the generated output. There are many other examples, such as auto-completion of code, generating code for unit testing, (retro)documentation, and code migration from one language to another. Of course, GenAI is already valued by developers because it supports them during coding. It can either suggest clean code directly or evaluate existing code to improve software quality if it identifies issues.
Software quality problems can often be traced back to the early testing phase, where the unit test cases and/or related test datasets fail to include the full spectrum of possible user inputs and scenarios. GenAI can assist developers in writing more complete unit test cases, in which user stories provide prompt engineering context for maximum relevance. It can generate a massive amount of synthetic information that closely resembles real-world data to ensure high unit test coverage.
Although adoption of GenAI for software engineering is still in its early stages, with 9 in 10 organizations yet to scale, 27% of organizations are running GenAI real-size experimentations, and 11% have started leveraging GenAI in their software function. GenAI is expected to play a key role in augmenting the software workforce with better experience, tools and platforms, and governance, assisting in more than 25% of software design, development, and testing work by 2026.[4]
[1] Capgemini Research Institute, “Turbocharging software”, June 2024
[2] Capgemini Research Institute, “Turbocharging software”, June 2024
[3] /karpathy/status/1617979122625712128?lang=en-GB
[4] Capgemini Research Institute, “Turbocharging software”, June 2024
Why is measurement important?
In the fast-paced and ever-evolving landscape of modern technology, making informed decisions is crucial to success. However, extracting meaningful insights can be a daunting task in a world inundated with data. Therefore, establishing a measurement framework is essential. It serves as a navigational aid in a vast sea of information, guiding teams from raw data to actionable decisions.
Measuring the performance of GenAI ensures that it meets the desired objectives, whether that’s improving efficiency, enhancing accuracy, or reducing costs. It also helps identify areas for improvement, guiding further development and optimization. And it provides accountability, demonstrating the value and return on investment (ROI) to stakeholders.
At the same time, measurement allows us to quantify attributes, which enables us to compare, analyze, and understand things more effectively. It also allows for progress tracking and performance evaluation, and provides data-driven insights that inform decision-making processes.
Challenges in measuring GenAI impact
“What gets measured, gets managed” might be an old saying but it rings true in the new paradigms in which GenAI is unfolding. It’s all well and good to implement, but measurement is crucial. However, measuring productivity is inherently complex due to the multifaceted nature of development work as part of the software development lifecycle (SDLC), the dynamic and evolving environment in which it occurs, and its inherent subjectivity and intangibility. Effective measurement requires a holistic approach that considers qualitative and quantitative factors, including context-specific considerations.
Measuring software quality is a challenge as it encompasses multiple dimensions, including functionality, performance, reliability, usability, maintainability, security, and scalability. Assessing quality requires considering these diverse aspects, each with its own set of metrics and criteria. Another challenge is that the different stakeholders have different priorities, whether they’re customers, businesses, architects, developers, testers, or in operations.
Feedback from the software engineers who will be using GenAI on a daily basis also needs to be considered. This is an important topic, as GenAI has an impact on the development environment and the way engineers work.
Nearly nine in ten (86%) of large organizations, with annual revenue greater than $50 billion, have adopted (piloted/scaled) GenAI as compared to 23% of their smaller counterparts, with an annual revenue between $1-5 billion.[5]
[5] Capgemini Research Institute, “Turbocharging software”, June 2024
Establishing a measurement protocol
Now let’s focus on how to define and implement a practical measurement protocol to get a clear view on the impact of GenAI in coding and unit testing as part of bespoke application development.
Almost half of organizations in our survey (48%) have no defined metrics to gauge the success of GenAI use in software engineering. We also found that there seems to be no standard way of measuring productivity.[6]
Our survey reveals an important fact about commonly used metrics. While they’re suitable for regular productivity measures, such as time to deploy or to resolve issues, they do not fully capture the benefits of GenAI, especially on non-conventional measures of productivity, such as employee satisfaction. These are better captured by metrics frameworks like DORA and SPACE.[7] However, DORA and SPACE are yet to gain traction, as they are costly and time-consuming to implement. This finding indicates that a set of metrics including KPIs for velocity, quality, security, and developer experience can prove useful.[8]
[6] Capgemini Research Institute, “Turbocharging software”, June 2024
[7] DORA – through such metrics as lead time for changes, deployment frequency, mean time to recovery and change failure rate – measures how well an organization balances speed and stability. The SPACE set of metrics tries to comprehensively assess team dynamics and developer experience. It balances the assessment of technical output with the wellbeing of developers, which traditional metrics fail to do.
[8] Capgemini Research Institute, “Turbocharging software”, June 2024
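The four DORA metrics named in the notes above (lead time for changes, deployment frequency, mean time to recovery, and change failure rate) can be derived from deployment records. The following is a minimal sketch, not a prescribed implementation: the record layout (`committed`, `deployed`, `failed`, `restored`), the function name `dora_metrics`, and all sample values are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean, median


def dora_metrics(deployments, period_days):
    """Compute the four DORA metrics from a list of deployment records.

    Each record is a dict with hypothetical keys: 'committed' and 'deployed'
    (datetime), 'failed' (bool), and 'restored' (datetime, only for failures).
    """
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    # Lead time for changes: hours from commit to deployment.
    lead_hours = [
        (d["deployed"] - d["committed"]).total_seconds() / 3600
        for d in deployments
    ]
    failures = [d for d in deployments if d["failed"]]
    # Time to recovery: hours from a failed deployment to restored service.
    recovery_hours = [
        (d["restored"] - d["deployed"]).total_seconds() / 3600
        for d in failures
        if d.get("restored") is not None
    ]
    return {
        "deployments_per_week": len(deployments) / (period_days / 7),
        "median_lead_time_hours": median(lead_hours),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery_hours": mean(recovery_hours) if recovery_hours else None,
    }


# Hypothetical sample data covering a two-week period.
t0 = datetime(2024, 6, 3, 9, 0)
sample = [
    {"committed": t0, "deployed": t0 + timedelta(hours=24), "failed": False},
    {"committed": t0 + timedelta(days=2), "deployed": t0 + timedelta(days=4),
     "failed": True, "restored": t0 + timedelta(days=4, hours=3)},
    {"committed": t0 + timedelta(days=7), "deployed": t0 + timedelta(days=7, hours=12),
     "failed": False},
    {"committed": t0 + timedelta(days=9), "deployed": t0 + timedelta(days=10, hours=12),
     "failed": False},
]
metrics = dora_metrics(sample, period_days=14)
print(metrics)
```

Running the same function on records from a baseline period and from a GenAI-assisted period would give two comparable sets of values.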
Most organizations show improvement from use of GenAI when measured using less popular, but more holistic, productivity metrics frameworks, such as SPACE and DevOps Research and Assessment (DORA):
[Figure: Metrics predominantly used vs. metrics impacted through use of Generative AI. Bar chart on a 0% to 80% axis with two series, “Key metrics used in your organization” and “Metrics that have shown a positive impact due to GenAI usage”. Metrics shown: sprint and release burndown/team velocity; release cycle/deployment time; change failure rate; pull request resolution time; code commit frequency; DORA (DevOps Research and Assessment)...; number of user story points completed; code churn; SPACE metrics.]
The measurement protocol offers a well-defined process that creates comprehensible, comparable and reliable results.
Components – What does the measurement protocol consist of?
• Teams organization: Organize teams for significant and actionable results, using different patterns such as parallel teams, shadow teams, or multi-pyramid teams.
• Measurement approach: Establish the timeline and the process for conducting the measurement, including the preparation, the baseline, and the execution.
• Measurement metrics: Identify the metrics for measuring the impact of GenAI on software engineering, such as coding velocity, code quality, code security, and developer experience.
• Measurement tooling: Agree the tools for collecting and analyzing the metrics, such as SonarQube, CAST, Jira, or developer surveys.
• Measurement reporting: Select the templates and formats for presenting and communicating the measurement results, both at detailed and executive levels.
• Prerequisites and success factors: Put in place the conditions and factors that need to be met and considered for a solid and consistent measurement, such as team stability, duration, backlog, technology, tooling, legal, and cybersecurity.
• Normalization process: Define how to manage instability and variability during the experimentation execution, such as changes in team size, capacity, or complexity, and how to adjust the metrics accordingly.
• Qualitative feedback: Create a mechanism to harness information on the holistic experiences of developers in the form of surveys or verbatim reports. Be aware that negative experiences often provide the best learnings.
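The normalization step in the list above can be illustrated with a small calculation. This is only a sketch under stated assumptions: the source does not give its normalization formulas, so dividing completed story points by available person-days is an assumed approach, and the `Sprint` class, function names, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sprint:
    story_points: float  # story points completed in the sprint
    person_days: float   # capacity actually available in the sprint


def normalized_velocity(sprints):
    """Story points delivered per person-day over a list of sprints."""
    total_points = sum(s.story_points for s in sprints)
    total_days = sum(s.person_days for s in sprints)
    if total_days <= 0:
        raise ValueError("total capacity must be positive")
    return total_points / total_days


def relative_change(baseline, experiment):
    """Relative change of the experiment value against the baseline value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (experiment - baseline) / baseline


# Hypothetical data: capacity dropped during the GenAI-assisted sprints.
baseline_sprints = [Sprint(40, 50), Sprint(40, 50)]
genai_sprints = [Sprint(36, 40), Sprint(45, 50)]

baseline_velocity = normalized_velocity(baseline_sprints)
genai_velocity = normalized_velocity(genai_sprints)
change = relative_change(baseline_velocity, genai_velocity)
print(f"{baseline_velocity:.2f} -> {genai_velocity:.2f} points per person-day ({change:+.1%})")
```

In this made-up example the raw totals (80 versus 81 story points) suggest almost no difference, while the capacity-adjusted values (0.80 versus 0.90 points per person-day) do, which is the kind of distortion a normalization process is meant to remove.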
Metrics – What data will be evaluated during the real-size experimentations?
Besides establishing and enabling the components, it’s essential to define the metrics that will be evaluated during the real-size experimentations. These are the fundamental data points to be measured and analyzed.
• Velocity: This should be measured on different levels, as it’s the most important metric.
  ◦ Coding velocity: This is the key indicator of a team’s productivity (coding and unit testing) and is typically measured in terms of implemented story points.
  ◦ Coding velocity per developer capability/seniority: Calculated as the time taken to complete “X” story points with and without GenAI assistance as per developer capability (good, average, below average).
  ◦ Coding velocity per complexity: Calculated as user story points required with and without GenAI assistance as per story complexity (simple, medium, and complex).
• Unit testing coverage: Essential metric for assessing the quality and reliability of software. To keep it simple, we focus on instruction coverage (C0), as this is measured by most of the tools.
• Code efficiency: Measures the potential performance and scalability bottlenecks in software. To keep it simple, we focus on static code analysis and not runtime analysis (for example, with profilers). This is not an industry standard, but a metric our clients find invaluable.
• Code security: Determines the risk of vulnerability issues and probability of breaches for an application.
• Code smells: Refers to indicators of poor or problematic code that may require attention or refactoring.
• Code duplication: Highlights the presence of identical or similar code fragments in different parts of a codebase. With GenAI, code duplicates are more likely to be created.
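The velocity breakdowns described above (per complexity, and likewise per seniority) amount to grouping completed stories and comparing effort per story point with and without GenAI assistance. The sketch below assumes a hypothetical record layout (`complexity`, `genai`, `points`, `hours`) and invented sample values; it is one possible way to do the calculation, not the protocol’s own formula.

```python
from collections import defaultdict


def hours_per_point(stories, group_key="complexity"):
    """Return {(group, genai_flag): hours spent per story point}."""
    totals = defaultdict(lambda: [0.0, 0.0])  # [hours, points]
    for story in stories:
        key = (story[group_key], story["genai"])
        totals[key][0] += story["hours"]
        totals[key][1] += story["points"]
    return {key: hours / points for key, (hours, points) in totals.items() if points > 0}


def time_saved(rates, group):
    """Relative reduction in hours per story point when GenAI is used."""
    without_ai = rates[(group, False)]
    with_ai = rates[(group, True)]
    return (without_ai - with_ai) / without_ai


# Hypothetical stories; group_key could also be a 'seniority' field.
stories = [
    {"complexity": "simple", "genai": False, "points": 3, "hours": 12},
    {"complexity": "simple", "genai": True, "points": 3, "hours": 9},
    {"complexity": "complex", "genai": False, "points": 8, "hours": 40},
    {"complexity": "complex", "genai": True, "points": 8, "hours": 36},
]
rates = hours_per_point(stories)
for group in ("simple", "complex"):
    print(group, f"{time_saved(rates, group):.0%} less time per story point")
```

Passing `group_key="seniority"` with a matching field in each record would give the per-capability view in the same way.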
Team organization – What works best?
Executing the measurement protocol requires an engagement with a proper user story backlog. But how many teams should be involved in the measurement process? In our view, it’s always better to have multiple teams to minimize the human factor.
• Single Team: Sequential execution of a backlog of the same user story size/complexity with and without GenAI assistance by one team.
• Multiple Teams: Parallel execution of the same backlog with and without GenAI assistance by at least two teams.
Team setup – What is the optimal mix?
The seniority or capabilities of a team are important for normalization. Therefore, it’s mandatory to know what kind of team mix is working on the defined backlog.
• Senior pyramid of team members: Highly skilled and capable team. No need for coaching, mentoring, or detailed code reviews. In an ideal world, this is the gold standard.
• Well-balanced pyramid of team members: Good mix of seniors and juniors. Coaching, mentoring, and reviewing are undertaken by senior team members in parallel to daily work.
• Junior pyramid of team members: A majority of juniors. This demands a focus on coaching, mentoring, and reviewing, as there are just a few senior team members.
[Figure: Single Team. Existing project team supported by pilot core team; sequential execution of requirements of the same size and complexity with and without AI. Diagram labels: T0, Pilot, Baseline, Report; Existing Team (w/o AI); Same Team (with Copilot); Same Team (with addt'l tool); 6 weeks (3 sprints); Legal & Licenses Clearance.]
[Figure: Multiple Teams Approach. Parallel execution of the same requirements with and without AI augmentation. Diagram labels: T0, Pilot, Baseline, Continuous Reporting; Team 1 (Balanced, w/o AI); Team 1 (Balanced, with Copilot); Team 1 (Balanced, with addt'l tool); Team 2 (Freshers, with Copilot); Team 2 (with addt'l tool); Legal & Licenses Clearance.]
Process – What are the key considerations?
Once all the components have been defined, a process is needed to ensure high quality results and to reduce side effects due to estimation inaccuracy and the human factor.
• Create the teams organization and the experimentation scope, including the use cases, the backlog, and the technology stack.
• Define the timeline and the measurement approach, including the duration, the phases, and the process.
• Validate the prerequisites and the success factors, such as the team stability, the legal clearance, the cybersecurity approval, and the tooling setup.
• Conduct a baseline audit to understand the current situation of the software engineering metrics, such as the coding velocity, the code quality, the code security, and the developer experience, without GenAI assistance.
• Execute sprints and releases with GenAI assistance, using the selected GenAI tools and following the best practices and guidelines.
• Collect the metrics and the feedback during and after the real-size experimentation execution, using the measurement tools and the surveys.
• Check and normalize the metrics and the feedback, using the normalization process and the formulas.
• Consolidate and report the measurement results, using the templates and the formats, and highlighting the key insights and findings.
[Figure: Timeline with the phases Preparation, Measurement, and Execution. Diagram labels: define teams organization and pilot scope; define timeline; validate prerequisites (tool procurement, legal and security approvals); prepare team execution (tool onboarding, demos, toolset); conduct a baseline (productivity, quality, and security metrics and developer feedback); execute releases and sprints; collect metrics and feedback; calculate productivity, quality, and security metrics; run code quality metrics tools (e.g. CAST, SonarQube); calculate team velocity (e.g. from Agile board, Jira); check and normalize metrics and feedback; consolidate a report for each release (intra-release, cross-release, across teams); sample report.]
Result: Key insights from real-size experimentations with measurement
At Capgemini we ran several real-size experimentations, both internally and externally together with clients. The results are positive and encouraging, as summarized here.
• Developers love it: The qualitative feedback from team members has been consistently positive. They love the new tools and the different way of working. As stated in our report, generative AI tools can help junior professionals learn faster and come up to speed quickly, while they allow senior professionals to focus on grooming juniors by ensuring their learning and retention, solving complex issues, and collaborating with business.
• Velocity goes up: Each experiment has its own context and specifics, but overall, we see a 10-30% improvement in coding and unit testing as part of the software development lifecycle.
• Benefits for all: Figures clearly show that junior pyramid teams benefit most, but senior pyramid teams also benefit measurably.
• Code quality is key: We see no degradation in code quality as measured by static code analysis and manual code reviews.
• Documentation available: As the ‘boring’ documentation task was taken over by the co-pilot, the relevant metrics improved.
• Unit test in place: The generation of unit tests improved overall test code coverage, which is an important metric for functional correctness.
[9] Capgemini Research Institute, “Turbocharging software”, June 2024
Conclusion
Measurement:
The journey from data to decisions in the realm of GenAI hinges on the robustness of measurement frameworks. These frameworks serve as the backbone of effective evaluation and benchmarking, enabling us to quantify the impact of AI tools
ac