版權(quán)說(shuō)明:本文檔由用戶(hù)提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡(jiǎn)介
Copyright?2019McGraw-HillEducation.Allrightsreserved.NoreproductionordistributionwithoutthepriorwrittenconsentofMcGraw-HillEducation.
CHAPTER2LABS-KEY
(Level1Header)Lab2-1Createarequestfordataextraction
Q1.GiventhatyouarenewandtryingtogetagrasponSláinte’soperations,listthreequestionsrelatedtosalesthatwouldhelpyoubeginyouranalysis.Forexample,howmanyproductsweresoldineachstate?
Open-ended–nokeyprovided.
Possibleanswers:
Whatisthehighestsellingproduct?
Howdoquantitiessoldperproductdifferacrossstates?
Whatisaveragequantityofeachproductsoldperday/perstate/permonth?
Whatisthetotalquantityofeachproductsoldperday/perstate/permonth?
Q2.Nowhypothesizetheanswerstoeachofthequestions.Remember,youranswersdon’thavetobecorrectatthispoint.Theywillhelpyouunderstandwhattypeofdatayouarelookingfor.Forexample:500inMissouri,6,000inPennsylvania,4,000inNewYork,etc.
Open-ended–nokeyprovided.
Q3.Finally,foreachquestion,identifythespecifictablesandattributesthatareneededtoansweryourquestions.Forexample,toanswerthequestionaboutstatesales,youwouldneedthe[State]attributewhichismostlikelylocatedinthe[Customer]mastertableaswellasa[QuantitySold]attributeina[Sales]table.Ifyouhadaccesstostoreordistributioncenterlocationdata,youmayalsolookfora[State]fieldthereaswell.
Open-ended–nokeyprovided.
(Level2Header)Part2:Generatearequestfordata
Nowthatyou’veidentifiedthedatayouneedforyouranalysis,completeaDataRequestForm.
OpentheDataRequestForm
Enteryourcontactinformation.
Inthedescriptionfield,identifythetablesthatyou’dliketoanalyze,alongwiththetimeperiods(e.g.pastmonth,pastyear,etc.)
Open-ended–nokeyprovided.
Selectafrequency.Inthiscasethisisa“One-offrequest”.
Enterarequestdate(today)andarequireddate(oneweekfromtoday)
Chooseaformat(spreadsheet).
FinallycompletetheTobeusedinbox(internalanalysis).
TAKEASCREENSHOT(2-1)ofyourcompletedform.
(Level2Header)Part3:Performananalysisofthedata
Q4.Takeamomentandidentifyanyattributesthatyouaremissingfromyouroriginalrequestthatwouldbenecessarytoansweryouroriginalquestionof“Howmanyproductsweresoldineachstate?”.
Open-ended–nokeyprovided.
Q5.Evaluateyouroriginalquestionsandresponses.Canyoustillanswertheoriginalquestion?
Open-ended–nokeyprovided.
Q6.IsthereanotherquestionyoucouldanswerfromthedataRachelprovided?
Possibleanswers:
Howmanysalesordershaseachemployeecreated?
HowmanysaleswerecreatedinthemonthofOctober?
Howmuchmoneywasgeneratedthroughsalesfortheentireperiod?
HowmuchmoneywasgeneratedthroughsalesforthemonthofOctober?
ENDOFLAB
(Level1Header)Lab2-2UsePivotTablestode-normalizeandanalyzethedata
(Level2Header)Part1:IdentifytheQuestions
Q1.GivenSláinte’srequest,identifythedataattributesandtablesneededtoanswerthequestion.
Sales_Subset:Product_Code,Sales_Order_Quantity_Sold
WouldalsobehelpfultohaveFGI_Product:Product_Description
(Level2Header)Part2:Masterthedata:PreparedataforanalysisinExcel
Q2.Whenwoulditbeagoodideatouseasingletable?
Anytimeallofthedatayouneedareinasingletable,thereisnoneedtoextractmorethanonetable.
Alternative2:UsetheExcelInternalDataModel
TAKEASCREENSHOT(2-2a)oftheManageRelationshipswindowwithbothrelationshipscreated.
Q3.Howcomfortableareyouwithidentifyingprimarykey-foreignkeyrelationships?
KEY:open-endedquestion,nokeyprovided
Alternative3:MergingthedataintoasingletableusingExcelQueryEditor
MaximizetheQueryEditorwindow,andTAKEASCREENSHOT(2-2b).
KEYScreenshot:
Q4.HaveyouusedtheQueryEditorinExcelbefore?Double-clickthe[Sales_Subset]queryandclickthroughthetabsontheribbon.Whichoptionsdoyouthinkwillbeusefulinthefuture?
KEY:Open-endedquestion,nokeyprovided.
Alternative4:UseSQLqueriesinAccess
TAKEASCREENSHOT(2-2c).
KEY:Screenshot
(Level2Header)Part3:PerformananalysisusingPivotTablesandQueries
TAKEASCREENSHOT(2-2d)
KEYSREENSHOT:
TAKEASCREENSHOT(2-2e)
Keyscreenshot:
TAKEASCREENSHOT(2-2f)
Keyscreenshot:
SaveyourqueryasTotal_Sales_By_Productandcloseyourdatabase.
(Level2Header)Part4:Addressandrefineyourresults
Q5.IftheownerofSláintewishestoidentifywhichproductsoldthemost,howwouldyoumakethisreportmoreuseful?
Severalpossibleanswers.Someoptionsinclude:sortingthedataorfilteringthedatatoviewonlytheproductassociatedwithhighesttotal_sales.
Q6.Ifyouwantedtoprovidemoredetail,whatotherattributeswouldbeusefultoaddasadditionalrowsorcolumnstoyourreport,orwhatotherreportswouldyoucreate?
Manypossibleanswers.AgoodoptionwouldbetoincludeDatedatafromtheSales_Subsettabletodoanalysisonwhichproductsellsmorebasedonmonthsorseasons.
(Level2Header)Part5:Communicateyourfindings
Let'smakethiseasyforotherstounderstandusingvisualizationandexplanations.
Q7.WriteabriefparagraphabouthowyouwouldinterprettheresultsofyouranalysisinplainEnglish?Forexample,whichdatapointsstandout?
Open-endedquestion,nosolutionprovided.
Q8.InChapter4we’lldiscusssomevisualizationtechniques.Describeawayyoucouldpresentthisdataasachartorgraph.
Open-endedquestion,nosolutionprovided.PossibleanswersincludePivotChartvisualizedasabarchart,usingfilters,slicers,ortimelinestomakethedatamoreinteractive.
Endoflab
(Level1Header)Lab2-3ResolvecommondataproblemsinExcelandAccess
Q1.WhatdoyouexpectwillbemajordataqualityissueswithLendingClub’sdata?
Open-endedquestion,nokeyprovided.StudentsshouldrelyonwhattheylearnedinChapter2regardingdataqualityissuestomakeassumptionsonwhatcouldcauseproblemsinthisfile.
(Level2Header)Part2:MastertheData
Q2.Giventhislistofattributes,whatconcernsdoyouhavewiththedata’sabilitytopredictanswerstothequestionsyouidentifiedinChapter1?
Open-endedquestion,nokeyprovided.
Q3.Isthereanythinginthedatathatyouthinkwillmakeanalysisdifficult?Forexample,arethereanyspecialsymbols,non-standarddata,ornumbersthatlookoutofplace?
Open-endedquestion,nokeyprovided.Thesheersizeofthedatamaystrikesomestudentsasbeingdifficulttoanalyze,aswellastheamountofblank/nullvalues.
Q4.Whatwouldyoudotocleanthedatainthisfile?
Open-endedquestion,nokeyprovided.Thenextsectionofthelab,“Let’sidentifysomeissueswiththedata…”introducesseveraloftheitemsthatneedtobecleaned(ortransformed).
Let’sidentifysomeissueswiththedata.
Therearemanyattributeswithoutanydata,andthatmaynotbenecessary.
The[int_rate]valuesarewrittenin##.##%,butanalysiswillrequire#.####
The[term]valuesincludetheword“months”,whichshouldberemovedfornumericalanalysis.
The[emp_length]valuesinclude“n/a”,“<”,“+”,“year”,and“years”,whichshouldberemovedfornumericalanalysis
Dates,including[issue_d],canbemoreusefulifweexpandthemtoshowtheday,month,andyearasseparateattributes.Datescauseissuesingeneralbecausedifferentsystemsusedifferentdateformats(e.g.1/9/2009,Jan-2009,9/1/2009forEuropeandates,etc.),sotypicallysomeconversionisnecessary.
First,removetheunwanteddata:
Saveyourfileas“Loans2007-2011.xlsx”totakeadvantageofsomeofExcel’sfeatures.
Deletethefirstrowthatsays“Notesofferedbyprospectus…”
Deletethelastfourrowsthatinclude“Totalamountfunded…”
Deletecolumnsthathavenovalues,including[id],[member_id],[url]
Repeatforanyotherblankcolumnsorunwantedattributes.
Thecolumnswiththeheadersrevol_bal_joint,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,andsec_app_mths_since_last_major_derogcanalsobedeleted.
Next,fixyournumbers:
Selectthe[int_rate]column.
IntheHometab,gototheNumbersectionandchangethenumbertypefromPercentagetoGeneralusingthedrop-downmenu.
Repeatforanyotherattributeswithpercentages.
TAKEASCREENSHOT(2-3a)ofyourpartiallycleaneddatafile.
KeyScreenshot:
Then,removeanywordsfromnumericalvalues:
Selectthe[term]column.
UseFind&Replace(Ctrl+HorHome>Editing>Find&Select>Find&Replace)tofindthewords“months”and“month”andreplacethemwithanull/blankvalue“”.Important:Besuretoincludethespacebeforethewordsandgofromthelongestvariationofthewordtotheshortest.Inthiscaseifyoureplaced“month”first,youwouldendupwithalotofvaluesthatstillhadtheletter“s”From“months”.
Nowselectthe[emp_length]columnandfindandreplacethefollowingvalues:
Originalvalue
Newvalue
naorn/a
0
<1year
0
1year
1
2years
2
3years
3
4years
4
5years
5
6years
6
7years
7
8years
8
9years
9
10+years
10
,(comma)
(blank)
ThiscanbedoneeitherwithFindandReplaceorwithaFalseVLookup.Then/acellshavenonprintablecharactersinthem,sothe=CLEANfunctionwillbeusefulforensuringthen/avaluesarefoundintheircells.
TAKEASCREENSHOT(2-3b)ofyourpartiallycleaneddatafile,showingthe[term]column.
Q5.Whydoyouthinkitisusefultoreformatandextractpartsofthedatesbeforeyouconductyouranalysis?Whatdoyouthinkwouldhappenifyoudidn’t?
Open-endedquestion,nokeyprovided.Possibleanswersincludethatifyouintendtodoanalysisonanyofthevaluesthathaven’tfirstbeencleaned,youranalysismaynotworkormaynotrunonthecompletedataset.
Q6.Didyourunintoanymajorissueswhenyouattemptedtocleanthedata?Howwouldyouresolvethose?
Open-endedquestion,nokeyprovided.
ENDOFLAB
(Level1Header)Lab2-4GeneratesummarystatisticsinExcel
Becauseeveryquestioninthislabisopen-ended,thereisnokeyprovided.
ENDOFLAB
(Level1Header)Lab2-5–CollegeScorecardExtractandDataPreparation
(Level2Header)Part2:MastertheData
Takeascreenshot(1)
ScreenshotKey:
Q1.Bylookingthroughthedatainthetextfile,whatdoyouthinkthedelimiteris?
Comma
Takeascreenshot(2)
ScreenshotKey:
Toensurethatyoucapturedallofthedatathroughtheextractionfromthe.txtfile,weneedtovalidateit.Validatethefollowingchecksums:
Youshouldhave7,704records(rows).
Comparetheattributenames(columnheaders)totheattributeslistedinthedatadictionary.Areyoumissingany,ordoyouhaveanyextras?
TheaverageSATscoreshouldbe1,059.07(thisisleavingNULLvaluesasNULL).
Q2.Inthechecksums,youvalidatedthattheaverageSATscoreforalloftherecordsis1,059.07.Whenweworkwiththedatamorerigorously,severaltestswillrequireustotransformNULLvalues.IfyouweretotransformtheNULLSATvaluesinto0,whatwouldhappentotheaverage(woulditstaythesame,decrease,orincrease)?
Theaveragewoulddecrease
Howwouldthatchangetotheaverageimpactthewayyouwouldinterpretthedata?
ItwouldinaccuratelyrepresentaverylowSATaverageacrossallschools(CorrectAnswer)
Doyouthinkit’sagoodideatoreplaceNULLvalueswith0sinthiscase?
No
ToavoidtheissueswithNULL,blanks,and0s,wewillremovealloftherecordsthatcontainNULLvaluesineitherSAT_AVGorC150_4.Doso.
Performa=COUNT()toverifytheamountofrecordsthatremainafterremovingallrecordsassociatedwithNULLvaluesinSAT_AVGorC150_4.1,271recordsshouldremain.
Takeascreenshot(3)
KeyScreenshot:
Yourdataisnowreadyforthetestplan.Thislabwillcontinueinchapter3.
ENDOFLAB
(Level1Header)Lab2-6ComprehensiveCase:Dillard’sStoreData:HowtoCreateanE-RDiagram
QuestionsforParts1-3areallopen-ended,nokeyprovided.
(Level2Header)Part4:AddressandRefineResults
Q3. WhatistheprimarykeyfortheTRANSACTtable?WhatistheprimarykeyfortheSKUtable?
ITEM_ID–CorrectAnswerforSKUtable
TRANSACTION_ID–CorrectAnswerforTRANSACTtable
Q4. HowdoweconnecttheSKUdatabasetotheTRANSACTtable?Howdowejointablesfromtwodifferentrelatedtables?
Tablesarejoinedbyrelatingtheforeignandprimarykeys.TheTRANSACTtablehasaforeignkeyfromtheSKUtable,sotherelationshipbetweenthetwocharactersisthejoiningofTRANSACT.ITEM_IDandSKU.ITEM_ID.
ENDOFLAB
(Level1Header)Lab2-7ComprehensiveCase:Dillard’sStoreData:HowtoPreviewDataFromTablesinaQuery
(Level2Header)Part1:IdentifytheQuestions
Q1. Howwouldaviewoftheentiredatabaseorcertaintablesoutofthatdatabaseallowustogetafeelforthedata?
Open-endedquestion,possibleanswersincludethatitisnecessarytoviewhowadatabaseisstructuredinordertoknowwhatdataisavailabletoanalyze.Viewingtheactualdatastoredintablescanhelpexplainwhatcertainattributesrepresent,especiallyifyoudon’thaveadatadictionaryorifthedatadictionaryisn’tverydescriptive.
Q2. WhattypesofdatawouldyouguessthatDillard’s,aretailstore,gatherthatmightbeuseful?HowcouldDillard’ssuppliersusethisdatatopredictfuturepurchases?
Open-endedquestion,nokeyprovided.Possibleanswersinclude:salesdata(salesorders,salesorderdates,itemssold),customerdata(whateachcustomerpurchases,wheretheylive),inventory(retailprice,cost,category),etc.
TAKEASCREENSHOTOFYOURRESULTS(2)
KEYScreenshot
Q3. Whatdoyouthink‘P’and‘R’representintheTRAN_TYPEtable?Howmighttransactionsdifferiftheyarerepresentedby‘P’or‘R’.
Answerswillvary,butPrepresentsPurchaseandRrepresentsReturn.
Q4. Whatbenefitcanyougainfromselectingonlythetopfewrowsofyourdata,particularlyfromalargedataset?
Answerswillvary,butsomepossiblesolutionsincludegettingaquickglanceatthedatawithouthavingtowaitforthequerytorunifit’salargedataset.
(Level1Header)Lab2-8ComprehensiveCase:Dillard’sStoreData:ConnectingExceltoaSQLDatabase
Q1. WhatcanyoudoinExcelthatismuchmoredifficulttodoinotherdatamanagementprograms?
Open-endedquestion,nokeyprovided.Possibleanswersmightbebasedaroundstudents’generalfamiliaritywithExcel–it’seasierforthemtoworkwiththanPython,R,orSQL,forexample,becauseofitsfriendlierinterface.
Q2. BecausemostaccountantsarefamiliarwithExcel,namethreedatamanagementfunctionsyoucandoeasierinExcelthananyotherprogram?Howdoesthatfamiliarityhelpyouwithyouranalysis?
Open-endedquestion,nokeyprovided.PossibleanswersincludePivotTables,Tables,andvisualizingdata(it’sarguablewhetherthesefunctionsaretrulyeasierinExcel,butmostofourstudentswillprobablythinkit’seasierinExcelatthispointduetohavinglessexposuretoothertools.Theiranswerstothisquestionisverydependentontheirpreviousexperience).
TakeascreenshotofthePivotTable.
Q3. ReferenceyourPivotTableandfindwhichstatehasthehighestnumberofDillard’sstores.Whichstateshavethefewest?Howmanystoresarethereacrossthecountry?
Texashasthehighestnumberofstores,NewYorkandWyominghavethelowest.Thereare313storesacrossthecountry.
Q4. CountingthenumberofstoresperstateisoneexampleofhowthedatathathasbeenloadedfromSQLServerintoExcelcanbecomeusefulinformationthroughaPivotTable.WhatareotherwaysthatyoucouldorganizetheSTOREdatainaPivotTabletocomeupwithmeaningfulinformation?
Open-endedquestion,possibleanswersincludedrillingdownintoDivisionandCity.Tomakethedatafarmoreinteresting,joininginothertablescouldprovidemeaningfulanalysis(salesperstore,salesperstate,etc.)
Q5.JoinsaremadebasedontheirPrimaryKey–ForeignKeyrelationship.LookingattheERDiagramorthedataset,whichtwocolumnsformtherelationshipbetweentheTRANSACTandSTOREtables?
Transact.Store=Store.Store
Q6.Lookingatthefirstseveralrowsofdata,comparetheamountsinORIG_PRICE,SALE_PRICE,TRAN_AMT.Whatdoyouthinktran_amtrepresents?
Thetotaltransactionamount,takingintoaccountdiscounts.
Q7.Whatarethemeansforeachoftheattributes?
ORIG_PRICE:53.99
SALE_PRICE:35.46
TRAN_AMT:27.84
Q8.ThemeanfromTRAN_AMTislowerthanthemeansforbothORIG_PRICEandSALE_PRICE,whydoyouthinkthatis?(Hint:itisnotanerror).
TheTRAN_AMTnotonlytakesintoaccountdiscounts,butalsoisnegativewhenthetransactionisareturn.
(Level2Header)Part5:AddressandRefineResults
Q9.HowdoesdoingaquerywithinExcelallowquickerandmoreefficientaccessandanalysisofthedata?
Open-endedquestion,nokeyprovided.Possibleresponsesincludenothavingtoexportthequeryresultsfromthedatabase.
Q10.Is15daysofdata
溫馨提示
- 1. 本站所有資源如無(wú)特殊說(shuō)明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶(hù)所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶(hù)上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶(hù)上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶(hù)因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。
最新文檔
- 《金版學(xué)案》2022屆高考化學(xué)一輪復(fù)習(xí)習(xí)題-第2章-第4節(jié)-銅及其化合物-金屬材料-
- 安徽省渦陽(yáng)縣王元中學(xué)2024-2025學(xué)年第一學(xué)期七年級(jí)期末考試語(yǔ)文試卷(含答案)
- 2022《創(chuàng)新設(shè)計(jì)》高考?xì)v史大一輪復(fù)習(xí)專(zhuān)題探究提升訓(xùn)練8
- 【創(chuàng)新設(shè)計(jì)】2021高考化學(xué)總復(fù)習(xí)(江西版)作業(yè)本:第11章-課時(shí)5-基本營(yíng)養(yǎng)物質(zhì)-有機(jī)高分子化合物
- 酒店餐飲部工作總結(jié)
- 三年級(jí)數(shù)學(xué)(上)計(jì)算題專(zhuān)項(xiàng)練習(xí)附答案
- 《東方漢院項(xiàng)目介紹》課件
- 【2021屆備考】2021屆全國(guó)名校數(shù)學(xué)試題分類(lèi)解析匯編(12月第四期)K單元概率
- 《zlk查房資料》課件
- 社區(qū)心理衛(wèi)生1(心身疾病)
- 期末試卷(試題)-2024-2025學(xué)年滬教版三年級(jí)上冊(cè)數(shù)學(xué)
- 句子成分及句子基本結(jié)構(gòu)(共32張PPT)
- 水上拋石護(hù)坡施工方案
- 燃?xì)忮仩t房和直燃機(jī)房防爆問(wèn)題
- 物料提升機(jī)基礎(chǔ)方案
- 840dsl常用參數(shù)
- 員工入職體檢表
- 連續(xù)油管鉆井技術(shù)課件
- 企業(yè)員工培訓(xùn)的現(xiàn)狀及對(duì)策的研究
- 以童謠文化建設(shè)推進(jìn)學(xué)校辦學(xué)特色實(shí)踐探究
- 中學(xué)數(shù)學(xué)課堂教學(xué)評(píng)價(jià)表
評(píng)論
0/150
提交評(píng)論