Outline
- Introduction
- Background
- Distributed Database Design
- Database Integration
  - Schema Matching
  - Schema Mapping
- Semantic Data Control
- Distributed Query Processing
- Multimedia Query Processing
- Distributed Transaction Management
- Data Replication
- Parallel Database Systems
- Distributed Object DBMS
- Peer-to-Peer Data Management
- Web Data Management
- Current Issues

Problem Definition
- Given existing databases with their Local Conceptual Schemas (LCSs), how to integrate the LCSs into a Global Conceptual Schema (GCS)
  - GCS is also called mediated schema
- Bottom-up design process

Integration Alternatives
- Physical integration
  - Source databases are integrated and the integrated database is materialized
  - Data warehouses
- Logical integration
  - Global conceptual schema is virtual and not materialized
  - Enterprise Information Integration (EII)

Data Warehouse Approach
[figure]

Bottom-up Design
- GCS (also called mediated schema) is defined first
  - Map LCSs to this schema
  - As in data warehouses
- GCS is defined as an integration of parts of LCSs
  - Generate GCS and map LCSs to this GCS

GCS/LCS Relationship
- Local-as-view
  - The GCS definition is assumed to exist, and each LCS is treated as a view definition over it
- Global-as-view
  - The GCS is defined as a set of views over the LCSs

Database Integration Process
[figure]

Recall Access Architecture
[figure]

Database Integration Issues
- Schema translation
  - Component database schemas are translated to a common intermediate canonical representation
- Schema generation
  - Intermediate schemas are used to create a global conceptual schema

Schema Translation
- What is the canonical data model?
  - Relational
  - Entity-relationship
    - DIKE
  - Object-oriented
    - ARTEMIS
  - Graph-oriented
    - DIPE, TranScm, COMA, Cupid
    - Preferable with emergence of XML
    - No common graph formalism
- Mapping algorithms
  - These are well-known

Schema Generation
- Schema matching
  - Finding the correspondences between multiple schemas
- Schema integration
  - Creation of the GCS (or mediated schema) using the correspondences
- Schema mapping
  - How to map data from local databases to the GCS
- Important: sometimes the GCS is defined first, and schema matching and schema mapping are done against this target GCS

Running Example
- Relational
  - EMP(ENO, ENAME, TITLE)
  - PROJ(PNO, PNAME, BUDGET, LOC, CNAME)
  - ASG(ENO, PNO, RESP, DUR)
  - PAY(TITLE, SAL)
- E-R Model
  [figure]

Schema Matching
- Schema heterogeneity
  - Structural heterogeneity
    - Type conflicts
    - Dependency conflicts
    - Key conflicts
    - Behavioral conflicts
  - Semantic heterogeneity
    - More important and harder to deal with
    - Synonyms, homonyms, hypernyms
    - Different ontology
    - Imprecise wording

Schema Matching (cont'd)
- Other complications
  - Insufficient schema and instance information
  - Unavailability of schema documentation
  - Subjectivity of matching
- Issues that affect schema matching
  - Schema versus instance matching
  - Element versus structure level matching
  - Matching cardinality

Schema Matching Approaches
[figure]

Linguistic Schema Matching
- Use element names and other textual information (textual descriptions, annotations)
- May use external sources (e.g., thesauri)
- ⟨SC1.element-1 ≈ SC2.element-2, p, s⟩
  - Element-1 in schema SC1 is similar to element-2 in schema SC2 if predicate p holds with a similarity value of s
- Schema level
  - Deal with names of schema elements
  - Handle cases such as synonyms, homonyms, hypernyms, data type similarities
- Instance level
  - Focus on information retrieval techniques (e.g., word frequencies, key terms)
  - "Deduce" similarities from these

Linguistic Matchers
- Use a set of linguistic (terminological) rules
- Basic rules can be hand-crafted or may be discovered from outside sources (e.g., WordNet)
- Predicate p and similarity value s
  - Hand-crafted rules: specified
  - Discovered rules: may be computed or specified by an expert after discovery
- Examples
  - ⟨uppercase names ≈ lower case names, true, 1.0⟩
  - ⟨uppercase names ≈ capitalized names, true, 1.0⟩
  - ⟨capitalized names ≈ lower case names, true, 1.0⟩
  - ⟨DB1.ASG ≈ DB2.WORKS_IN, true, 0.8⟩

Automatic Discovery of Name Similarities
- Affixes
  - Common prefixes and suffixes between two element name strings
- N-grams
  - Comparing how many substrings of length n are common between the two name strings
- Edit distance
  - Number of character modifications (insertions, deletions, substitutions) that need to be performed to convert one string into the other
- Soundex code
  - Phonetic similarity between names based on their soundex codes
- Also look at data types
  - Data type similarity may suggest a stronger relationship than the similarity computed using these methods, or may help differentiate between multiple strings with the same value

N-gram Example
- 3-grams of string "Responsibility" are the following: Res, esp, spo, pon, ons, nsi, sib, ibi, bil, ili, lit, ity
- 3-grams of string "Resp" are: Res, esp
- 3-gram similarity: 2/12 = 0.17

Edit Distance Example
- Again consider "Responsibility" and "Resp"
- To convert "Responsibility" to "Resp"
  - Delete characters "o", "n", "s", "i", "b", "i", "l", "i", "t", "y"
- To convert "Resp" to "Responsibility"
  - Add characters "o", "n", "s", "i", "b", "i", "l", "i", "t", "y"
- The number of edit operations required is 10
- Similarity is 1 − (10/14) = 0.29

Constraint-based Matchers
- Data always have constraints; use them
  - Data type information
  - Value ranges
- Examples
  - RESP and RESPONSIBILITY: n-gram similarity = 0.17, edit distance similarity = 0.29 (low)
    - If they come from the same domain, this may increase their similarity value
  - ENO in relational, WORKER.NUMBER and PROJECT.NUMBER in E-R
    - ENO and WORKER.NUMBER may have type INTEGER while PROJECT.NUMBER may have STRING

Constraint-based Structural Matching
- If two schema elements are structurally similar, then there is a higher likelihood that they represent the same concept
- Structural similarity:
  - Same properties (attributes)
  - "Neighborhood" similarity
    - Using graph representation
    - The set of nodes that can be reached within a particular path length from a node are the neighbors of that node
    - If two concepts (nodes) have similar sets of neighbors, they are likely to represent the same concept

Learning-based Schema Matching
- Use machine learning techniques to determine schema matches
- Classification problem: classify concepts from various schemas into classes according to their similarity. Those that fall into the same class represent similar concepts
- Similarity is defined according to features of data instances
- Classification is "learned" from a training set

Learning-based Schema Matching (cont'd)
[figure]

Combined Schema Matching Approaches
- Use multiple matchers
  - Each matcher focuses on one area (name, etc.)
- Meta-matcher integrates these into one prediction
- Integration may be simple (take the average of similarity values) or more complex (see Fagin's work)

Schema Integration
- Use the correspondences to create a GCS
- Mainly a manual process, although rules can help

Binary Integration Methods
[figure]

N-ary Integration Methods
[figure]

Schema Mapping
- Mapping data from each local database (source) to GCS (target) while preserving semantic consistency as defined in both source and target
- Data warehouses: actual translation
- Data integration systems: discover mappings that ca
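
As a rough illustration of the linguistic matcher rules listed earlier, the sketch below produces correspondences of the form ⟨SC1.element-1 ≈ SC2.element-2, p, s⟩. The `Correspondence` type, the `SYNONYMS` table, and the `linguistic_match` function are hypothetical names introduced here; only the case-variant rule (similarity 1.0) and the ASG / WORKS_IN rule (similarity 0.8) come from the examples in the slides.

```python
from typing import NamedTuple, Optional


class Correspondence(NamedTuple):
    """One rule instance: left is similar to right if predicate holds, with value similarity."""
    left: str
    right: str
    predicate: bool
    similarity: float


# Hypothetical hand-crafted synonym table. The single entry mirrors the
# example rule for DB1.ASG and DB2.WORKS_IN with similarity 0.8.
SYNONYMS = {frozenset({"ASG", "WORKS_IN"}): 0.8}


def linguistic_match(schema1: str, elem1: str,
                     schema2: str, elem2: str) -> Optional[Correspondence]:
    """Apply two name rules: case variants match with 1.0, listed synonyms with their value."""
    a, b = elem1.upper(), elem2.upper()
    if a == b:
        # Uppercase, lower case, and capitalized spellings are treated as the same name.
        similarity = 1.0
    else:
        similarity = SYNONYMS.get(frozenset({a, b}))
    if similarity is None:
        return None  # no rule applies
    return Correspondence(f"{schema1}.{elem1}", f"{schema2}.{elem2}", True, similarity)


print(linguistic_match("DB1", "ASG", "DB2", "WORKS_IN"))
print(linguistic_match("DB1", "ENO", "DB2", "eno"))
```

A real matcher would draw its synonym table from a thesaurus or a source such as WordNet rather than a hard-coded dictionary.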
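
The N-gram example for "Responsibility" and "Resp" can be reproduced with a short sketch. The slides do not state the denominator of the similarity explicitly; dividing by the larger of the two distinct n-gram counts is an assumption chosen here because it yields the 2/12 shown in the example. The function names are illustrative.

```python
def ngrams(s: str, n: int = 3) -> list[str]:
    """Return every contiguous substring of length n, in order of appearance."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Shared distinct n-grams divided by the larger distinct n-gram count.

    The choice of denominator is an assumption made to match the 2/12 example.
    """
    grams_a, grams_b = set(ngrams(a, n)), set(ngrams(b, n))
    if not grams_a or not grams_b:
        return 0.0  # at least one string is shorter than n
    return len(grams_a & grams_b) / max(len(grams_a), len(grams_b))


print(ngrams("Resp"))
print(round(ngram_similarity("Responsibility", "Resp"), 2))
```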
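
The edit distance example can be checked in the same way. The sketch below uses the standard Levenshtein dynamic program (insertions, deletions, substitutions, each at cost 1), which is an assumption about the exact variant intended; normalizing by the length of the longer string follows the 1 − (10/14) computation in the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """1 minus the edit distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1 - edit_distance(a, b) / longest


print(edit_distance("Responsibility", "Resp"))
print(round(edit_similarity("Responsibility", "Resp"), 2))
```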
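
The "neighborhood" similarity described under constraint-based structural matching could be sketched as follows. The slides only say that nodes with similar neighbor sets likely represent the same concept; measuring that with the Jaccard coefficient is an assumption, as is the premise that node labels in both graphs have already been normalized to comparable names. The two toy graphs and their attribute labels are hypothetical.

```python
from collections import deque


def neighbors_within(graph: dict, start: str, max_len: int) -> set:
    """Nodes reachable from start over paths of at most max_len edges (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_len:
            continue  # do not expand beyond the allowed path length
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(start)
    return seen


def neighborhood_similarity(g1: dict, n1: str, g2: dict, n2: str, max_len: int = 1) -> float:
    """Jaccard coefficient of the two neighbor sets (an assumed similarity measure)."""
    a = neighbors_within(g1, n1, max_len)
    b = neighbors_within(g2, n2, max_len)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical schema graphs with labels already normalized to common names.
relational = {"EMP": ["number", "name", "title"], "title": ["salary"]}
er = {"WORKER": ["number", "name", "title", "city"]}

print(neighborhood_similarity(relational, "EMP", er, "WORKER", max_len=1))
print(neighborhood_similarity(relational, "EMP", er, "WORKER", max_len=2))
```

Increasing `max_len` widens the neighborhood, so the score can rise or fall depending on how the two schemas diverge further from the compared nodes.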
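
For the combined approach, the simplest meta-matcher mentioned in the slides takes the average of the individual similarity values. The sketch below shows that, with optional weights as an illustrative extension that is not part of the source; the function name and the matcher labels are hypothetical, while the two input scores are the n-gram and edit distance similarities computed for RESP and RESPONSIBILITY.

```python
from typing import Optional


def combine_average(scores: dict, weights: Optional[dict] = None) -> float:
    """Combine per-matcher similarity values into one score by (weighted) averaging."""
    if not scores:
        raise ValueError("at least one matcher score is required")
    if weights is None:
        weights = {name: 1.0 for name in scores}  # plain average
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total


matcher_scores = {"ngram": 0.17, "edit": 0.29}
print(round(combine_average(matcher_scores), 2))
print(round(combine_average(matcher_scores, {"ngram": 1.0, "edit": 3.0}), 2))
```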