
How to measure the impact of GenAI on software coding and unit testing?

CONTENTS

Introduction

How will GenAI impact software programming?

Why is measurement important?

Challenges in measuring GenAI impact

Establishing a measurement protocol

Result: Key insights from real size experimentations with measurement

Conclusion

About the authors

Measurement protocol: What are the expected impacts of GenAI on software coding, and unit testing?

As Generative AI (GenAI) continues to touch almost every aspect of our daily lives, its impact on software design, coding, and unit testing is both inevitable and exciting. But what are those expected impacts? How can you measure them? How can you define a proven measurement protocol? And what do real size experimentations reveal?

GenAI can assist software engineers when they use design and coding to transform user stories into software. It can be harnessed to create design outputs such as user interface mockups, entity models, and interfaces. This can lead to a significant productivity improvement, without compromising quality. But the full benefits can only be felt if a measurement process is implemented by experts.


How will GenAI impact software programming?

First of all, what impact will GenAI have on software programming, according to businesses and organizations? Our latest Capgemini Research Institute [1] report shows that 61% of organizations see enabling more innovative work, such as developing new software features and services, as the leading benefit of GenAI. Close behind are improving software quality (49%) and increasing productivity (40%). Organizations are utilizing these productivity gains on innovative work such as developing new software features (50%) and upskilling (47%). Very few aim to reduce headcount (4%). [2]

GenAI is poised to redefine conventional programming practices by shifting the focus from coding to prompt engineering and code proofreading. This is expressed perfectly by Andrej Karpathy, an OpenAI computer scientist, who recently said: “the hottest new programming language is English.” [3]

What does this mean in practice? As an example, software engineers can use plain language to describe the intended functionality of a software feature, then review, update, and validate the generated output. There are many other examples, such as auto-completion of code, generating code for unit testing, (retro)documentation, and code migration from one language to another. Of course, GenAI is already valued by developers because it supports them during coding. It can either suggest clean code directly or evaluate existing code to improve software quality if it identifies issues.

Software quality problems can often be traced back to the early testing phase, where the unit test cases and/or related test data sets fail to include the full spectrum of possible user inputs and scenarios. GenAI can assist developers in writing more complete unit test cases, in which user stories provide prompt engineering context for maximum relevance. It can generate a massive amount of synthetic information that closely resembles real-world data to ensure high unit test coverage.

Although adoption of GenAI for software engineering is still in its early stages with 9 in 10 organizations yet to scale, 27% of organizations are running GenAI real size experimentations, and 11% have started leveraging GenAI in their software function. GenAI is expected to play a key role in augmenting software workforce with better experience, tools and platforms, and governance, assisting in more than 25% of software design, development, and testing work by 2026. [4]

[1] Capgemini Research Institute, “Turbocharging software”, June 2024

[2] Capgemini Research Institute, “Turbocharging software”, June 2024

[3] /karpathy/status/1617979122625712128?lang=en-GB

[4] Capgemini Research Institute, “Turbocharging software”, June 2024


Why is measurement important?

In the fast-paced and ever-evolving landscape of modern technology, making informed decisions is crucial to success. However, extracting meaningful insights can be a daunting task in a world inundated with data. Therefore, establishing a measurement framework is essential. It serves as a navigational aid in a vast sea of information, guiding teams from raw data to actionable decisions.

Measuring the performance of GenAI ensures that it meets the desired objectives, whether that’s improving efficiency, enhancing accuracy, or reducing costs. It also helps identify areas for improvement, guiding further development and optimization. And it provides accountability, demonstrating the value and return on investment (ROI) to stakeholders.

At the same time, measurement allows us to quantify attributes, which enables us to compare, analyze, and understand things more effectively. It also allows for progress tracking and performance evaluation, and provides data-driven insights that inform decision-making processes.


Challenges in measuring GenAI impact

“What gets measured, gets managed” might be an old saying but it rings true in the new paradigms in which GenAI is unfolding. It’s all well and good to implement, but measurement is crucial. However, measuring productivity is inherently complex due to the multifaceted nature of development work as part of the software development lifecycle (SDLC), the dynamic and evolving environment in which it occurs, and its inherent subjectivity and intangibility. Effective measurement requires a holistic approach that considers qualitative and quantitative factors, including context-specific considerations.

Measuring software quality is a challenge as it encompasses multiple dimensions, including functionality, performance, reliability, usability, maintainability, security, and scalability. Assessing quality requires considering these diverse aspects, each with its own set of metrics and criteria. Another challenge is that the different stakeholders have different priorities, whether they’re customers, businesses, architects, developers, testers, or in operations.

Feedback from the software engineers who will be using GenAI on a daily basis also needs to be considered. This is an important topic as GenAI has an impact on the development environment and the way they work.

Nearly nine in ten (86%) of large organizations, with annual revenue greater than $50 billion, have adopted (piloted/scaled) GenAI as compared to 23% of their smaller counterparts, with an annual revenue between $1-5 billion. [5]

[5] Capgemini Research Institute, “Turbocharging software”, June 2024


Establishing a measurement protocol

Now let’s focus on how to define and implement a practical measurement protocol to get a clear view on the impact of GenAI in coding, and unit testing as part of bespoke application development.

Almost half of organizations in our survey (48%) have no defined metrics to gauge the success of GenAI use in software engineering. We also found that there seems to be no standard way of measuring productivity. [6]

Our survey reveals an important fact about commonly used metrics. While they’re suitable for regular productivity measures, such as time to deploy or to resolve issues, they do not fully capture the benefits of GenAI, especially on non-conventional measures of productivity, such as employee satisfaction. These are better captured by metrics frameworks like DORA and SPACE. [7] However, DORA and SPACE are yet to gain traction, as they are costly and time-consuming to implement. This finding indicates that a set of metrics including KPIs for velocity, quality, security, and developer experience can prove useful. [8]

[6] Capgemini Research Institute, “Turbocharging software”, June 2024

[7] DORA, through such metrics as lead time for changes, deployment frequency, mean time to recovery and change failure rate, measures how well an organization balances speed and stability. The SPACE set of metrics tries to comprehensively assess team dynamics and developer experience. It balances the assessment of technical output with the wellbeing of developers, which traditional metrics fail to do.

[8] Capgemini Research Institute, “Turbocharging software”, June 2024
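To make the four DORA metrics named in note [7] more concrete, the following is a minimal sketch of how they could be derived from a list of deployment records. The `Deployment` record layout, the field names, and the choice of hours and weeks as units are assumptions made for illustration; they are not taken from the report or from any specific tool.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record layout)."""
    committed_at: datetime                  # first commit of the change
    deployed_at: datetime                   # change reached production
    failed: bool = False                    # deployment caused a failure
    restored_at: Optional[datetime] = None  # service restored (failed deployments only)


def _hours(start: datetime, end: datetime) -> float:
    """Elapsed time between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600.0


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Compute the four DORA metrics for one observation period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    n = len(deployments)
    failures = [d for d in deployments if d.failed]
    # Only failed deployments with a known restore time count towards recovery time.
    recoveries = [_hours(d.deployed_at, d.restored_at)
                  for d in failures if d.restored_at is not None]
    return {
        "deployment_frequency_per_week": n * 7 / period_days,
        "lead_time_for_changes_hours":
            sum(_hours(d.committed_at, d.deployed_at) for d in deployments) / n,
        "change_failure_rate": len(failures) / n,
        # Reported as 0.0 when no recovery was recorded in the period.
        "mean_time_to_recovery_hours":
            sum(recoveries) / len(recoveries) if recoveries else 0.0,
    }
```

Computing the same dictionary once for a baseline period and once for a GenAI-assisted period gives two directly comparable snapshots.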


Most organizations show improvement from use of GenAI when measured using less popular, but more holistic, productivity metrics frameworks, such as SPACE and DevOps Research and Assessment (DORA):

[Figure: Metrics predominantly used vs. metrics impacted through use of Generative AI. Bar chart on a 0% to 80% scale with two series, “Key metrics used in your organization” and “Metrics that have shown a positive impact due to GenAI usage”, for: sprint and release burndown / team velocity; release cycle / deployment time; change failure rate; pull request resolution time; code commit frequency; DORA (DevOps Research and Assessment) ...; number of user story points completed; code churn; SPACE metrics.]

The measurement protocol offers a well-defined process that creates comprehensible, comparable and reliable results.


Components – What does the measurement protocol consist of?

• Teams organization: Organize teams for significant and actionable results, using different patterns such as parallel teams, shadow teams, or multi-pyramid teams.

• Measurement approach: Establish the timeline and the process for conducting the measurement, including the preparation, the baseline, and the execution.

• Measurement metrics: Identify the metrics for measuring the impact of GenAI on software engineering, such as coding velocity, code quality, code security, and developer experience.

• Measurement tooling: Agree the tools for collecting and analyzing the metrics, such as SonarQube, CAST, Jira, or developer surveys.

• Measurement reporting: Select the templates and formats for presenting and communicating the measurement results, at both detailed and executive levels.

• Prerequisites and success factors: Put in place the conditions and factors that need to be met and considered for a solid and consistent measurement, such as team stability, duration, backlog, technology, tooling, legal, and cybersecurity.

• Normalization process: Define how to manage instability and variability during the experimentation execution, such as changes in team size, capacity, or complexity, and how to adjust the metrics accordingly.

• Qualitative feedback: Create a mechanism to harness information on the holistic experiences of developers in the form of surveys or verbatim reports. Be aware that negative experiences often provide the best learnings.
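The normalization component in the list above can be illustrated with a small sketch. The report does not give its normalization formulas, so the approach below (dividing completed story points by the person-days actually available) is an assumption chosen for illustration, as are the `Sprint` fields and function names.

```python
from dataclasses import dataclass


@dataclass
class Sprint:
    """Raw sprint figures (hypothetical record layout)."""
    name: str
    story_points_done: float
    developers: int
    working_days: int
    absence_days: float = 0.0  # total person-days lost to leave, illness, etc.


def capacity_person_days(sprint: Sprint) -> float:
    """Person-days that were actually available in the sprint."""
    return sprint.developers * sprint.working_days - sprint.absence_days


def normalized_velocity(sprint: Sprint) -> float:
    """Story points per available person-day (assumed normalization)."""
    capacity = capacity_person_days(sprint)
    if capacity <= 0:
        raise ValueError(f"sprint {sprint.name!r} has no available capacity")
    return sprint.story_points_done / capacity


# Illustrative numbers only: the GenAI sprint ran with a smaller, partly absent team.
baseline = Sprint("baseline", story_points_done=40, developers=5, working_days=10)
with_genai = Sprint("with GenAI", story_points_done=42, developers=4,
                    working_days=10, absence_days=5)
```

With these made-up figures the raw story point totals differ by only two points, while the normalized values differ far more, which is the kind of distortion from changing team size or capacity that a normalization step is meant to expose.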

Metrics – What data will be evaluated during the real size experimentations?

Besides establishing and enabling the components, it’s essential to define the metrics that will be evaluated during the real size experimentations. These are the fundamental data points to be measured and analyzed.

• Velocity: This should be measured on different levels, as it’s the most important metric.

  ◦ Coding velocity: This is the key indicator of a team’s productivity (coding and unit testing) and is typically measured in terms of implemented story points.

  ◦ Coding velocity per developer capability/seniority: Calculated as the time taken to complete “X” story points with and without GenAI assistance as per developer capability (good, average, below average).

  ◦ Coding velocity per complexity: Calculated as user story points required with and without GenAI assistance as per story complexity (simple, medium, and complex).

• Unit testing coverage: Essential metric for assessing the quality and reliability of software. To keep it simple we focus on instruction coverage (C0) as this is measured by most of the tools.

• Code efficiency: Measures the potential performance and scalability bottlenecks in software. To keep it simple we focus on static code analysis and not runtime analysis (for example, with profilers). This is not an industry standard, but a metric our clients find invaluable.

• Code security: Determines the risk of vulnerability issues and probability of breaches for an application.

• Code smells: Refers to indicators of poor or problematic code that may require attention or refactoring.

• Code duplication: Highlights the presence of identical or similar code fragments in different parts of a codebase. GenAI makes code duplicates more likely.
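As a rough illustration of the velocity breakdown described above, the sketch below groups user stories by complexity and compares effort per story point with and without GenAI assistance. The tuple layout, the use of effort hours as the time measure, and the function names are assumptions for illustration only; the same grouping could be applied to developer seniority instead of complexity.

```python
from collections import defaultdict

# Each story: (complexity, used_genai, story_points, effort_hours). Hypothetical layout.
Story = tuple[str, bool, float, float]


def hours_per_point(stories: list[Story]) -> dict[tuple[str, bool], float]:
    """Effort hours per story point for each (complexity, used_genai) group."""
    totals: dict[tuple[str, bool], list[float]] = defaultdict(lambda: [0.0, 0.0])
    for complexity, used_genai, points, hours in stories:
        group = totals[(complexity, used_genai)]
        group[0] += points
        group[1] += hours
    return {key: hours / points
            for key, (points, hours) in totals.items() if points > 0}


def improvement_by_complexity(stories: list[Story]) -> dict[str, float]:
    """Percentage reduction in hours per point with GenAI, per complexity class."""
    rates = hours_per_point(stories)
    result: dict[str, float] = {}
    for complexity in sorted({c for c, _ in rates}):
        without = rates.get((complexity, False))
        with_ai = rates.get((complexity, True))
        if without and with_ai:  # skip classes not measured in both modes
            result[complexity] = (without - with_ai) / without * 100
    return result


# Illustrative numbers only.
sample = [
    ("simple", False, 3, 12), ("simple", False, 2, 8), ("simple", True, 5, 15),
    ("complex", False, 8, 64), ("complex", True, 8, 56),
]
```

Calling `improvement_by_complexity(sample)` returns one percentage per complexity class that was measured both with and without GenAI; classes measured in only one mode are left out instead of being guessed.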


Team organization – What works best?

Executing the measurement protocol requires an engagement with a proper user story backlog. But how many teams should be involved in the measurement process? In our view, it’s always better to have multiple teams to minimize the human factor.

• Single Team: Sequential execution of backlog of same user story size/complexity with and without GenAI assistance by one team.

• Multiple Teams: Parallel execution of same backlog with and without GenAI assistance by at least two teams.

Team setup – What is the optimal mix?

The seniority or capabilities of a team are important for normalization. Therefore, it’s mandatory to know what kind of team mix is working on the defined backlog.

• Senior pyramid of team members: Highly skilled and capable team. No need for coaching, mentoring, or detailed code reviews. In an ideal world, this is the gold standard.

• Well-balanced pyramid of team members: Good mix of seniors and juniors. Coaching, mentoring, and reviewing are undertaken by senior team members in parallel to daily work.

• Junior pyramid of team members: A majority of juniors. This demands a focus on coaching, mentoring, and reviewing, as there are just a few senior team members.

[Figure: Single Team approach. Existing project team supported by pilot core team. Sequential execution of requirements of same size/complexity with and without AI. Labels: T0; Legal & Licenses Clearance; Pilot Baseline; Existing Team (w/o AI); Same Team (with Copilot); Same Team (with additional tool); 6 weeks (3 sprints); Report.]

[Figure: Multiple Teams approach. Parallel execution of same requirements with and without AI augmentation. Labels: T0; Legal & Licenses Clearance; Pilot Baseline; Continuous Reporting; Team 1 (balanced, w/o AI); Team 1 (balanced, with Copilot); Team 1 (balanced, with additional tool); Team 2 (freshers, with Copilot); Team 2 (with additional tool).]


Process – What are the key considerations?

Once all the components have been defined, a process is needed to ensure high quality results and to reduce side effects due to estimation inaccuracy and the human factor.

• Create the teams organization and the experimentation scope, including the use cases, the backlog, and the technology stack.

• Define the timeline and the measurement approach, including the duration, the phases, and the process.

• Validate the prerequisites and the success factors, such as the team stability, the legal clearance, the cybersecurity approval, and the tooling setup.

• Conduct a baseline audit to understand the current situation of the software engineering metrics, such as the coding velocity, the code quality, the code security, and the developer experience, without GenAI assistance.

• Execute sprints and releases with GenAI assistance, using the selected GenAI tools and following the best practices and guidelines.

• Collect the metrics and the feedback during and after the real size experimentation execution, using the measurement tools and the surveys.

• Check and normalize the metrics and the feedback, using the normalization process and the formulas.

• Consolidate and report the measurement results, using the templates and the formats, and highlighting the key insights and findings.
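The final steps of this process, comparing the GenAI-assisted results against the baseline and consolidating them into a report, might be sketched as follows. The metric names, the direction assigned to each metric, and the report layout are illustrative assumptions; real values would come from whichever measurement tools and surveys the team has agreed on.

```python
from typing import Optional

# Direction of improvement per metric (metric names are illustrative assumptions).
HIGHER_IS_BETTER = {
    "velocity_sp_per_person_day": True,
    "unit_test_coverage_pct": True,
    "code_smells": False,
    "duplicated_lines_pct": False,
    "security_findings": False,
}


def compare_to_baseline(baseline: dict[str, float],
                        with_genai: dict[str, float]) -> dict[str, dict]:
    """Build a per-metric comparison of the GenAI phase against the baseline."""
    report: dict[str, dict] = {}
    for metric, higher_is_better in HIGHER_IS_BETTER.items():
        if metric not in baseline or metric not in with_genai:
            continue  # only report metrics measured in both phases
        before, after = baseline[metric], with_genai[metric]
        # Relative change is undefined when the baseline value is zero.
        change_pct: Optional[float] = (
            (after - before) / before * 100 if before else None
        )
        if after == before:
            verdict = "unchanged"
        elif (after > before) == higher_is_better:
            verdict = "improved"
        else:
            verdict = "degraded"
        report[metric] = {
            "before": before,
            "after": after,
            "change_pct": change_pct,
            "verdict": verdict,
        }
    return report
```

Keeping the direction of improvement explicit per metric avoids reading a rise in code smells or duplication as a gain, which matters when velocity and quality metrics are reported side by side.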

Timeline

[Figure: Timeline with three phases (Preparation, Measurement, Execution), followed by a sample report.]

Preparation:

• Define teams organization and pilot scope

• Define timeline

• Validate pre-requisites (tool procurement, legal and security approvals)

• Prepare team execution (tool onboarding, demos, toolset)

Measurement and execution:

• Conduct a baseline (productivity, quality, security metrics and developer feedback)

• Calculate productivity, quality & security metrics

• Run code quality metrics tools (e.g. CAST, SonarQube)

• Calculate team velocity (e.g. from Agile board, Jira)

• Execute releases and sprints

• Collect metrics and feedbacks

• Check and normalize metrics and feedbacks

• Consolidate a report for each release (intra-release, cross-release, across teams)



Result: Key insights from real size experimentations with measurement

At Capgemini we ran several real size experimentations, both internally and externally together with clients. The results are positive and encouraging, as summarized here.

• Developers love it: The qualitative feedback from team members has been consistently positive. They love the new tools and the different way of working. As stated in our report, generative AI tools can help junior professionals learn faster and come up to speed quickly, while they allow senior professionals to focus on grooming juniors by ensuring their learning and retention, solving complex issues, and collaborating with business.

• Velocity goes up: Each experiment has its own context and specifics, but overall, we see a 10-30% improvement in coding and unit testing as part of the software development lifecycle.

• Benefits for all: Figures clearly show that junior pyramid teams benefit most, but senior pyramid teams also benefit measurably.

• Code quality is key: We see no degradation in code quality as measured by static code analysis and manual code reviews.

• Documentation available: As the ‘boring’ documentation task was taken over by the co-pilots, the relevant metrics improved.

• Unit test in place: The generation of unit tests improved overall test code coverage, which is an important metric for functional correctness.

[9] Capgemini Research Institute, “Turbocharging software”, June 2024


Conclusion

Measurement:

The journey from data to decisions in the realm of GenAI hinges on the robustness of measurement frameworks. These frameworks serve as the backbone of effective evaluation and benchmarking, enabling us to quantify the impact of AI tools

ac
