Efficient Management of Inconsistent and Uncertain Data
Renée J. Miller
University of Toronto

Contributors
  • Ariel Fuxman, PhD thesis
      • Microsoft Search Labs
      • Jim Gray SIGMOD 2008 Dissertation Award
  • Periklis Andritsos, PhD
  • Jiang Du, MS
  • Elham Fazli, MS
  • Diego Fuxman, Undergrad

Dirty Databases
  • The presence of dirty data is a major problem in enterprises
  • Traditional solution: data cleaning
  "No. I don't see any problem with the data"

Limitations of Data Cleaning
  • Semi-automatic process
  • Requires highly-qualified domain experts
  • Time consuming
  • May not be possible to wait until the database is clean
  • Operational systems answer queries assuming clean data

Our Work
  • Identify classes of queries for which we can obtain meaningful answers from potentially dirty databases
  • Show how to do it efficiently and reusing existing database technology

Why is this Business Intelligence?
  • Business intelligence (BI) refers to technologies, applications and practices for the collection, integration, analysis, and presentation of information.
  • The goal of BI is to support better decision making, based on information.
  • DBMS should provide meaningful query answers even over data that is dirty

Outline
  • Introduction
  • Semantics for dirty databases
  • Contributions
  • Conclusions

A Data Integration Example
  • Integrating customer data…
  [Figure: Sales, Shipping, Customer Support, Web Forms, and Demographic Data feeding an Integrated Customer Database]

Matching and Merging
  [Figure: Web and Sales tables]
  • Matching and merging are two fundamental tasks in data integration

True Disagreement Between Sources
  [Figure: Web and Sales tables]
  • What's Peter's salary?

Inconsistent Integrated Databases
  • In the absence of complete resolution rules…
  [Figure: the Web and Sales tables SATISFY the custid KEY; the Inconsistent Integrated Database VIOLATES the custid KEY]

Querying Inconsistent Databases
  • Example: Offering a Platinum credit card…
  • Query: "Get customers who make more than 100K"
  [Figure: integrated tuples labeled by source: sales, web, sales/web, sales, web]
  • Answer: Peter, Paul, Mary
  • Are we sure that we want to offer a card to Peter?

Querying Inconsistent Databases
  • Aggressive: Get customers who possibly make more than 100K
      • Peter, Paul, Mary
  • Conservative: Get customers who certainly make more than 100K
      • Paul, Mary

Formal Semantics
  • Related to semantics for querying incomplete data [Imielinski Lipski 84, Abiteboul Duschka 98]
      • Possible world: "complete" databases
  • Consistent answers
      • Proposed by Arenas, Bertossi, and Chomicki in 1999
      • Corresponds to conservative semantics
      • Possible world: "consistent" databases

Consistent Answers
  • Key: custid
  [Figure: the inconsistent database (tuples labeled sales, web, sales/web, sales, web) and its repairs]
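
The repair-based semantics on these slides can be sketched in a few lines of Python. This is a brute-force illustration only, not the approach the deck proposes: it enumerates every repair, which is exactly what becomes infeasible on large databases. The function names are hypothetical, and the sample rows are an assumption that mirrors the customer example shown in the deck's Uncertain Data slide (income in thousands, custid as the key).

```python
from itertools import product

# Assumed sample data: (custid, income in thousands); custid should be a key.
ROWS = [
    ("Peter", 40), ("Peter", 200),
    ("Paul", 400),
    ("Mary", 110), ("Mary", 130),
]


def repairs(rows):
    """Yield every repair: keep exactly one tuple per key value."""
    groups = {}
    for custid, income in rows:
        groups.setdefault(custid, []).append((custid, income))
    for choice in product(*groups.values()):
        yield list(choice)


def query(db, threshold=100):
    """'Get customers who make more than 100K' on a consistent database."""
    return {custid for custid, income in db if income > threshold}


def consistent_answers(rows):
    """Conservative semantics: answers returned on every repair."""
    return set.intersection(*(query(r) for r in repairs(rows)))


def possible_answers(rows):
    """Aggressive semantics: answers returned on at least one repair."""
    return set.union(*(query(r) for r in repairs(rows)))


if __name__ == "__main__":
    print(sorted(consistent_answers(ROWS)))
    print(sorted(possible_answers(ROWS)))
```

With these rows there are four repairs (two choices for Peter, one for Paul, two for Mary). Peter drops out of the conservative answer because one repair keeps his 40K tuple, while both of Mary's conflicting tuples exceed the threshold.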
Consistent Answers
  • CONSISTENT ANSWERS: answers obtained no matter which repair we choose
  • Query = "Get customers who make more than 100K"
  [Figure: query q evaluated on each of the four repairs]
  • CONSISTENT ANSWER = {Paul, Mary}

When We Started…
  • Semantics well understood
  • Problem: potentially HUGE number of repairs!
  • Negative results [Chomicki et al. 02, Arenas et al. 01, Cali et al. 04]
  • Few tractability results [Arenas et al. 99, Arenas et al. 01]
  • Logic programming approaches [Bravo and Bertossi 03, Eiter et al. 03]
      • Expressive queries and constraints
      • Computationally expensive
      • Applicable only to small databases with small number of inconsistencies

Our Proposal: ConQuer
  [Figure: ConQuer's Rewriting Algorithm takes the SQL query q and the Keys and produces a Rewritten SQL query Q*; a commercial database engine evaluates Q* over the inconsistent database and returns the consistent answer to q]

Class of Rewritable Queries
  • ConQuer handles a broad class of SPJ queries with
      • Set semantics
      • Bag semantics, grouping, and aggregation
  • No restrictions on
      • Number of relations
      • Number of joins
      • Conditions or built-in predicates
      • Key-to-key joins
  • The class is "maximal"

Why not all SPJ queries?
  • Some SPJ queries cannot be rewritten into SQL
      • Consistent query answering is coNP-complete even for some SPJ queries and key constraints
  • Maximality of ConQuer's class
      • Minimal relaxations lead to intractability
  • Restrictions only on
      • Nonkey-to-nonkey joins
      • Self joins
      • Nonkey-to-key joins that form a cycle

Example: A Rewritable Query (TPC-H Query 10)

    SELECT c_custkey, c_name,
           sum(l_extendedprice * (1 - l_discount)) AS revenue,
           c_acctbal, n_name, c_address, c_phone, c_comment
    FROM customer, orders, lineitem, nation
    WHERE c_custkey = o_custkey
      AND l_orderkey = o_orderkey
      AND o_orderdate >= '1993-10-01'
      AND o_orderdate < date('1993-10-01') + 3 MONTHS
      AND l_returnflag = 'R'
      AND c_nationkey = n_nationkey
    GROUP BY c_custkey, c_name, c_acctbal, c_phone,
             n_name, c_address, c_comment
    ORDER BY revenue DESC

Rewritings Can Get Quite Complex
  [Figure: rewriting of TPC-H Query 10]
  • Can this rewriting be executed efficiently?
  • 1.7 overhead (20 GB database, 5% inconsistency)

Experimental Evaluation
  • Goals
      • Quantify the overhead of the rewritings
      • Assess the scalability of the approach
      • Determine sensitivity of the rewritten queries to level of inconsistency of the instance
  • Queries and databases
      • Representative decision support queries (TPC-H benchmark)
      • TPC-H databases, altered to introduce inconsistencies
  • Database parameters
      • Database size
      • Percentage of the database that is inconsistent
      • Conflicts per key value (in inconsistent portion)

Scalability
  [Figure: plot against Size (GB); 5% inconsistent tuples, 2 conflicts per inconsistent key value]
  • Worst case: 5.8 overhead, selectivity 98.56%
  • Best case: 1.2 overhead, selectivity 0.001%

Contributions: Theory
  • Formal characterization of a broad class of queries
      • For which computing consistent answers is tractable under key constraints
      • That can be rewritten into first-order/SQL
  • Query rewriting algorithms for a class of Select-Project-Join queries
      • With set semantics
      • With bag semantics, grouping, and aggregation
  • Maximality of the class of queries

Contributions: Practice
  • Implementation of ConQuer
      • Designed to compute consistent answers efficiently
      • Multiple rewriting strategies
  • Experimental validation of efficiency and scalability
      • Representative queries from TPC-H
      • Large databases

Uncertain Data
  PROVENANCE INFORMATION (e.g., source reputation): Web 0.3, Sales 0.7

  Web
    custid   …   income
    Peter    …   40K
    Paul     …   400K
    Mary     …   110K

  Sales
    custid   …   income
    Peter    …   200K
    Paul     …   400K
    Mary     …   130K

  Integrated Database
    custid   …   income
    Peter    …   40K     0.3
    Peter    …   200K    0.7
    Paul     …   400K    1
    Mary     …   110K    0.3
    Mary     …   130K    0.7

Publications and Demo
  • These and other contributions appear in
      • ICDT 05 / JCSS 06
      • SIGMOD 05
      • ICDE 06
      • PODS 06 / TODS 06
      • VLDB 06
  • Demo given at VLDB 05
      • /project/conquer/demo2/
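
To make the query-rewriting idea behind ConQuer more concrete, here is a minimal sketch for the single-relation query "Get customers who make more than 100K" under the key custid. The rewritten SQL below is hand-written for this one query and is not output of ConQuer's rewriting algorithm; the table name `customer`, the helper `consistent_high_earners`, and the sample rows are assumptions for illustration. It runs on SQLite through Python's standard `sqlite3` module, standing in for the commercial engine mentioned in the deck.

```python
import sqlite3

# Assumed sample data: (custid, income in thousands); custid should be a key.
ROWS = [
    ("Peter", 40), ("Peter", 200),
    ("Paul", 400),
    ("Mary", 110), ("Mary", 130),
]

# Hand-written rewriting (an assumption, not ConQuer output): keep a custid
# only if no tuple sharing that key value fails the selection condition.
REWRITTEN_QUERY = """
    SELECT DISTINCT c.custid
    FROM customer c
    WHERE c.income > :t
      AND NOT EXISTS (
          SELECT 1
          FROM customer c2
          WHERE c2.custid = c.custid
            AND c2.income <= :t)
    ORDER BY c.custid
"""


def consistent_high_earners(rows, threshold=100):
    """Run the rewritten query directly on the inconsistent table."""
    conn = sqlite3.connect(":memory:")
    try:
        # No PRIMARY KEY is declared, so conflicting tuples can coexist.
        conn.execute("CREATE TABLE customer (custid TEXT, income INTEGER)")
        conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
        cursor = conn.execute(REWRITTEN_QUERY, {"t": threshold})
        return [row[0] for row in cursor]
    finally:
        conn.close()


if __name__ == "__main__":
    print(consistent_high_earners(ROWS))
```

The point of the design is that a single SQL statement is evaluated over the inconsistent table, so no repair is ever materialized. On the sample rows it returns Mary and Paul, matching the consistent answer given on the slides.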
