




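Integration of Knowledge Graphs
Computer Science and Novel Software Technology, National …
CCKS 2016 tutorial

Outline
- Introduction to Semantic Web and knowledge graphs
- Part I: ontology matching
- Part II: entity linkage
- Part III: an application to data integration

Semantic Web
- The Semantic Web was conceived by Tim Berners-Lee.
- Goal: give formal meaning to Web information.
- Web 1.0 (pages) → Web 2.0 (social) → Web 3.0 (a web of data)
- The Semantic Web is about common formats for the integration and combination of data drawn from diverse sources, and about languages for recording how the data relates to real-world objects.

RDF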
(Figure: an RDF triple, consisting of a subject, a predicate, and an object)
The world is not made of strings, but is made of things.

Linked data
- A realization of the Semantic Web
- Linked Data refers to a collection of interrelated datasets on the Web.
- Used for large-scale integration of, and reasoning on, data on the Web.
- Linked data principles:
  - Use URIs to name things.
  - Use HTTP URIs, so that they can be looked up.
  - Provide useful information using open Web standards (e.g., RDF, SPARQL).
  - Include links to other related URIs.

Linked open data (LOD): 1,000+ datasets
(Figure: the LOD cloud; visible domain labels include "life" and "social")
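Knowledge graph
- Knowledge Graph is a knowledge base used by Google to enhance its search engine's search results with semantic-search information gathered from a wide variety of sources.
- A knowledge graph can also be viewed as a huge graph whose nodes represent entities or concepts and whose edges represent attributes or relations.

Ontology
- A model of (part of) the real world
- Introduces domain-related vocabulary
- Specifies the meaning (semantics) of terms
- Formalized using an appropriate logic
- Example description: "Heart is a muscular organ that is part of the circulatory system."
(I. Horrocks. Ontologies and the semantic web: the story so far.)

Scale of large knowledge bases/graphs
- English: 4 million entities, 500 million RDF triples, 125 languages; 10 million entities, 120 million RDF triples; 40 million entities, 1 billion RDF triples
- Knowledge graph: 600 million entities, 3.5 billion RDF triples
- Others: WolframAlpha (computational knowledge engine), CMU NELL, Zhixin, Sogou Zhilifang
(Figure: the family of knowledge graph technologies; labels include "knowledge system" and "existing knowledge")

Part I: ontology matching

Heterogeneity has been around for a long, long time:
- Syntactic
- Schema-level, e.g., "Wei Hu" vs. …
- Terminological, e.g., "notebook" vs. …
- Data/entity-level
- Pragmatic

On the Semantic Web
- Data has explicit semantics and rich links.
- The popularity of ontologies is growing rapidly, and the number of ontologies keeps increasing.

Ontology matching
- The process of determining correspondences between entities of different ontologies.
- Formally, ontology matching finds a triple $M = \langle O, O', A \rangle$, consisting of a source ontology $O$, a target ontology $O'$, and a set of mapping units $A = \{m_1, m_2, \ldots, m_n\}$. Each $m_i$ is a basic mapping element, written as a quadruple $m_i = \langle id, e, e', s \rangle$, where:
  - $id$ is the identifier of the mapping unit, uniquely identifying the quadruple;
  - $e$ and $e'$ are terms in $O$ and $O'$, respectively;
  - $s$ is the similarity between $e$ and $e'$, with $s \in [0, 1]$.
  Optionally, a relation $r$ between $e$ and $e'$ can also be given; common relations include equivalence, …
- Ontology matching eliminates schema-level heterogeneity.

State of the art

Linguistic features
- Linguistic descriptions of terms in an ontology:
  - Local name. From the XML namespaces definition: "For a name N in a namespace identified by a URI I, the namespace name is I. For a name N that is not in a namespace, the namespace name has no value. In either case the local name is N."
  - Annotations (comments)
  - Others: foaf:name, dc:title
- A survey of how ontologies currently use linguistic features: local names are widely used, though some …; annotations …; neighbors are not fully exploited; dictionary lookups are time-consuming.
(Table: matching systems compared by the techniques they use, e.g., class …, machine learning, ranking)

Edit distance
- The minimum number of edit operations needed to transform one string into the other.
- Edit operations are substitution, insertion, and deletion.
- In general, the smaller the edit distance, the more similar the two strings.

I-Sub: $Sim(s_1, s_2) = Comm(s_1, s_2) - Diff(s_1, s_2) + winkler(s_1, s_2)$
- $Comm$: based on the biggest common substrings of the two strings
- $Diff$: based on the length of the unmatched substrings resulting from the initial matching

Virtual documents
- Linguistic description of a term: its local name and annotations
- Linguistic description of a node: the linguistic descriptions of its forward neighbors
- Neighbors of a term: subject neighbors, predicate neighbors, object neighbors
- Virtual document of a term: its own description plus the weighted descriptions of its neighbors,
  $VD(e) = Des(e) + \gamma_1 \sum_{e' \in SN(e)} Des(e') + \gamma_2 \sum_{e' \in PN(e)} Des(e') + \gamma_3 \sum_{e' \in ON(e)} Des(e')$,
  where $SN(e)$, $PN(e)$, $ON(e)$ are the subject, predicate, and object neighbors of $e$, and $\gamma_1, \gamma_2, \gamma_3$ are weights.
- Vector space model: TF-IDF

String similarity metrics (recommended choices)
- Less than two words per label: Jaro-Winkler
- Two or more words per label:
  - Synonyms: Soft Jaccard, with Levenshtein base
  - No synonyms: Soft Jaccard, with Levenshtein base

- Less than two words per label: TF-IDF
- Two or more words per label:
  - Synonyms: Soft TF-IDF, with Jaro-Winkler base
  - Different languages: Soft TF-IDF, with Jaro-Winkler base
  - Other: Soft TF-IDF, with Jaro-Winkler base

Structural features
- Intuition: terms of two distinct ontologies are similar when their adjacent terms are similar.
- Similarity is propagated iteratively:
  $Sim^{k+1}(x, y) = Sim^{k}(x, y) + \sum_{(a_u, p, x) \in O,\ (b_u, p, y) \in O'} Sim^{k}(a_u, b_u)\, w((a_u, b_u), (x, y)) + \sum_{(x, p, a_v) \in O,\ (y, p, b_v) \in O'} Sim^{k}(a_v, b_v)\, w((a_v, b_v), (x, y))$,
  where $w$ is a propagation weight.

Instance data
- Machine learning: joint probability distribution
- Instance-based learners: content, name, meta
- Relaxation labeling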
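The edit-distance definition above can be sketched in Python with the classic dynamic program. Mapping the distance to a similarity in [0, 1] by dividing by the length of the longer string is an assumption made here for illustration; it is one common choice, not necessarily the normalization used by the systems in this tutorial.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: the minimum number of substitutions,
    insertions and deletions needed to turn s into t."""
    # prev[j] = distance between the previous prefix of s and t[:j]
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i] + [0] * len(t)
        for j, ct in enumerate(t, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (no cost if equal)
            )
        prev = curr
    return prev[-1]


def edit_similarity(s: str, t: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    if not s and not t:
        return 1.0
    return 1.0 - edit_distance(s, t) / max(len(s), len(t))


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
    print(edit_similarity("notebook", "notebooks"))
```

Only two rows of the table are kept, so memory is linear in the length of `t`; a smaller distance yields a similarity closer to 1, in line with the rule of thumb stated above.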
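Search engine-based similarity
- Normalized Google distance between two terms $x$ and $y$:
  $NGD(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x), \log f(y)\}}$
  - $f(x)$ is the number of hits for the search term $x$;
  - $f(y)$ is the number of hits for the search term $y$;
  - $f(x, y)$ is the number of hits for the tuple of search terms $x$ and $y$;
  - $N$ is the number of web pages indexed ($N \approx 10^{10}$).

Ontology matching system: Falcon-AO

New challenges
- A lot of (semi-)automatic algorithms and systems exist, but most are only applicable to small ontologies.
- Many applications require matching BIG ontologies:
  - Medicine and biology: GALEN, FMA, …
  - Agriculture and food: AGROVOC, …
  - Library collections: Brinkman, …
  - Common knowledge: DBpedia, …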
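The search engine-based distance defined above can be computed directly from hit counts. A minimal sketch follows; the hit counts in the example are invented, and `n` defaults to the slide's estimate of about 10^10 indexed pages. The logarithm base cancels out in the ratio, so the natural log is used.

```python
import math


def ngd(f_x: int, f_y: int, f_xy: int, n: float = 1e10) -> float:
    """Normalized Google Distance computed from search-engine hit counts.

    f_x, f_y: hits for each search term; f_xy: hits for both terms together;
    n: number of indexed web pages.
    """
    if f_xy == 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical hit counts, for illustration only.
print(ngd(100, 1000, 10))  # approximately 0.25
```

NGD is a distance: it is 0 when the two terms always appear together and grows as they co-occur less often, so a matcher would treat smaller values as higher similarity.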
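  - Sizes: ≥10K terms

A divide-and-conquer approach: 1. ontology partitioning → 2. block matching → 3. term matching
(Figure: running example)

New directions: holistic ontology matching
- Increasing amount of data → matching many ontologies simultaneously
- Input: a set $\Omega = \{O_1, \ldots, O_n\}$ of ontologies with $n > 2$
- Output: $A = A_{12} \cup A_{13} \cup A_{23} \cup \ldots$
- Guaranteed to always find the same alignment: a global optimal solution

Limitation of pairwise matching
- $A$ is a local solution that depends on the order in which the pairwise matchings are carried out, e.g., $A_{12} \cup A_{(12)3} \neq A_{13} \cup A_{(13)2} \neq A_{23} \cup A_{(23)1}$

Holistic ontology matching
- Extends the maximum-weighted graph matching problem with constraints (cardinality, structural, and coherence constraints)
- Three types of nodes: classes, object properties, data properties
- Edges represent virtual connections between nodes of the same type and carry weights representing the similarities between them
- Correspondences (1:1) with maximum weight sum →
- Linear constraints: binary …; class decision; disjointness …

Part II: entity linkage

Entity linkage
- Semantic Web data have reached a scale of billions of triples.
- Many different entities refer to the same real-world object; they are typically denoted by URIs from distributed data sources, e.g., Wei …
- Entity linkage: link different entities that refer to the same real-world object
  - a.k.a. coreference resolution, entity matching, record linkage, …
  - The entity matching problem was originally defined in 1959 by Newcombe et al. and was formalized by Fellegi and Sunter ten years later.
- Out of 31B RDF statements, less than 500M are links across data sources.
- In other words: identify the entities behind different identifiers, and eliminate the … between the RDF data describing these identifiers.

State of the art
- In LOD, millions of entities have already been linked; however, potential candidates are still …
- Current approaches:
  - Equivalence reasoning: owl:sameAs, inverse functional properties
  - Similarity computation (also in the database community): compare the properties and values of entities

Equivalence reasoning
- An RDF triple: $\langle s, p, o \rangle \in (U \cup B) \times U \times (U \cup B \cup L)$
- Same-as relation: $\langle s, \text{owl:sameAs}, o \rangle \Rightarrow \langle s, o \rangle \in S$ and $\langle o, s \rangle \in S$
- Inverse functional property (IFP) relation: a value of an IFP can belong to only a single individual, e.g., $\langle s_1, \text{foaf:mbox}, o \rangle, \langle s_2, \text{foaf:mbox}, o \rangle \Rightarrow \langle s_1, s_2 \rangle \in I$ and $\langle s_2, s_1 \rangle \in I$
- Functional property (FP) relation: …
- Cardinality relation: owl:cardinality / owl:maxCardinality = …
- $E = (S \cup I \cup F \cup C)^{+}$, where $E$ is an equivalence relation

Similarity-based approaches: similarity computation; link similarity
- The problem generally takes the form $\{(r, s) \mid sim(r, s) > \tau,\ r \in R,\ s \in S\}$,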
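  where $R$ and $S$ are two sets of strings and $sim$ is the similarity function.
- Comparing every pair directly has quadratic time complexity.
- Existing conventional methods follow a "filter-verify" framework; in the filtering phase, various filtering techniques shrink the candidate set.
- Common methods: All-Pairs, ED-Join, PPJoin, PassJoin

Blocking
- Naïve pairwise: $n^2$ pairwise comparisons
  - Example: 1,000 business listings from each of 1,000 different cities across the world
  - 1 trillion comparisons: 11.6 days (if each comparison takes 1 μs)
- Mentions from different cities are unlikely to be coreferent.
  - Blocking criterion: city
  - 1 billion comparisons: 16 minutes (if each comparison takes 1 μs)
- Blocking methods: hash-based blocking; pairwise similarity/neighborhood-based blocking; simple blocking with an inverted index

Machine learning
- Learn a linkage rule, e.g., with genetic programming and active learning:
  - Active learning selects link candidates to be labeled by a human expert.
  - The human expert labels each selected link as correct or incorrect.
  - The genetic programming algorithm evolves the population of linkage rules.

Motivation
- In LOD, millions of entities have already been linked; however, potential candidates are still …
- Current approaches:
  - At present, they probably miss many potential links.
  - Similarity computation: to improve it, machine learning is used, but building a large-scale training set is time-consuming and labor-intensive.

Definition 1. Let $U$ be the set of entities in a set $D$ of data sources. Given an entity $u \in U$, the entity linkage for $u$ is to query a set of entities for which a relation $\varepsilon$ holds, where $\varepsilon$ links all the entities in $U$ that refer to the same object as $u$ does, i.e., that are coreferent with $u$.

How to combine the two? Our solution: query-driven entity linkage
- Use case: search/browsing, where a system knows "what to link" only at query time
- Analyze small portions of a very large dataset to answer on-demand queries
- Our approach: automatically infer semantically equivalent entities based on OWL/SKOS semantics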
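The blocking arithmetic above can be reproduced, and simple key-based blocking sketched, in a few lines. As on the slide, the naive cost is counted as n^2 comparisons (ordered pairs); counting unordered pairs would halve it. The listing records and the `key` function are hypothetical, and blocking by a single exact key is only the simplest of the blocking methods listed.

```python
from collections import defaultdict
from itertools import combinations

US_PER_SECOND = 1_000_000


def blocked_pairs(records, key):
    """Yield candidate pairs only within blocks that share the same key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[key(record)].append(record)
    for block in blocks.values():
        yield from combinations(block, 2)


def seconds(comparisons: int, cost_us: float = 1.0) -> float:
    """Running time if each comparison costs cost_us microseconds."""
    return comparisons * cost_us / US_PER_SECOND


cities, per_city = 1_000, 1_000
n = cities * per_city                 # 10^6 listings in total
naive = n ** 2                        # n^2 comparisons, as counted on the slide
blocked = cities * per_city ** 2      # one block of 1,000 listings per city
print(f"{seconds(naive) / 86_400:.1f} days")     # 11.6 days
print(f"{seconds(blocked) / 60:.1f} minutes")    # 16.7 minutes

# Hypothetical listings: (name, city); only same-city pairs are compared.
listings = [("A", "Nanjing"), ("A'", "Nanjing"), ("B", "Shanghai")]
print(list(blocked_pairs(listings, key=lambda r: r[1])))
```

Blocking trades recall for speed: two coreferent listings whose city values differ (for instance, because of a typo) never end up in the same block.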
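(Figure: workflow of query-driven entity linkage; recoverable labels: "Output: a … of …", "1. Build a … (initialize training …)", "Labeled …", "Some properties to use together", "External …", "Learn", "Unresolved …")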
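Inferring equivalent entities from OWL semantics, as in the equivalence-reasoning rules listed earlier, amounts to grouping entities under the closure of the same-as and IFP rules. The sketch below is an assumption-laden illustration, not the authors' implementation: it uses a union-find structure, handles only owl:sameAs and IFPs (FP and cardinality rules are omitted), and writes URIs as hypothetical `ex:` prefixed names.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"


class UnionFind:
    """Disjoint sets; each set is one class of coreferent entities."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def coreference_classes(triples, ifps):
    """Equivalence classes implied by owl:sameAs and inverse functional
    properties (IFPs); FP and cardinality rules are omitted in this sketch."""
    uf = UnionFind()
    subjects_by_ifp_value = defaultdict(list)
    for s, p, o in triples:
        uf.find(s)  # every subject is at least in its own class
        if p == SAME_AS:
            uf.union(s, o)
        elif p in ifps:
            subjects_by_ifp_value[(p, o)].append(s)
    # Subjects sharing a value of the same IFP denote the same individual.
    for subjects in subjects_by_ifp_value.values():
        for other in subjects[1:]:
            uf.union(subjects[0], other)
    classes = defaultdict(set)
    for x in list(uf.parent):
        classes[uf.find(x)].add(x)
    return list(classes.values())


triples = [
    ("ex:WeiHu1", SAME_AS, "ex:WeiHu2"),
    ("ex:WeiHu2", "foaf:mbox", "mailto:someone@example.org"),
    ("ex:WeiHu3", "foaf:mbox", "mailto:someone@example.org"),
    ("ex:Nanjing", "rdfs:label", "Nanjing"),
]
print(coreference_classes(triples, ifps={"foaf:mbox"}))
```

Because the closure is transitive, one wrong same-as link or one shared IFP value in noisy data merges two whole classes. This is one reason such reasoning is usually combined with similarity-based linkage rather than used alone.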
Assumptions: (1) coreferent entities share similar property-value pairs; (2) a few property-value pairs are more important than others for linking entities.
Running example
(Figure: running example of property values for Nanjing, e.g., "Nanjing", "Nan-ching", "32N", "118E", "117W")
Some definitions
- Discriminability of a property: property …; non-coreferent entity … in terms of coreferent …
- Discriminability of a value
- Discriminability of a property-value pair
(Table: dataset characteristics; recoverable entries: >100, >100, RDF, >2, same-as, IFP, FP, 2)

Experiments
- Dataset: Billion Triples Challenge (BTC)
- Testing queries: top-50 in 364 thousand query …; … Music/54323 …
- Evaluation procedure and metrics:
  - 30 graduate students; 2 judges + 1 arbitrator per link; Fleiss' κ = 0.8 (sufficient agreement)
  - Precision and relative recall (RR), where RR = correct links found by one system / total correct unique links found by all systems
- Parameters: iterations = …; discriminability threshold = …
- Linkage results
- Running time on 5,000 samples: avg. 11.3 links in …

Ontology Alignment Evaluation Initiative (OAEI)
- ISWC workshops since …
- Ontology matching & instance matching

Part III: an application to data integration

Motivation
- Metadata is vital to multimedia content: search, browsing, management.
- Large-scale LOD are published and …; make use of such a rich source of …
- Existing multimedia metadata models and … do not provide formal …, and typically focus on a single media type, e.g., EXIF, …compatible with MPEG-…
- Different media types co-exist in a multimedia …, e.g., a movie may have a theme music and a …
- A unified, well-defined ontology (with its links to others) is needed to gain …
- Challenge: link and integrate heterogeneous …

A motivating example: Beauty and the Beast
- Low-level metadata: runtime, location
- LOD: LinkedMDB, DBpedia
- Different ontologies (terms), different …: linkedmdb:film, linkedmdb:director, linkedmdb:11264, "Beauty and the Beast"
(Figure: … and entity linkage across sources for "Beauty and the Beast"; recoverable values: runtime "91 min.", location, and the description "...is a 1991 American animated …")
Our approach
- CAMO: enrich multimedia metadata by integrating LOD
  - Select DBpedia as the mediation ontology and match it with …
  - Link DBpedia entities with other … and aggregate their …
  - Incorporate legacy relational databases
- Moreover, provide a mobile app for browsing and searching multimedia content on Android …
- Assess the advantages of integrating LOD into multimedia …

System architecture: client-server
Server
- The DBpedia 3.6 ontology as the mediation ontology
- A Global-as-View solution of …
- Music: DBpedia, DBTune, …
- Movie: DBpedia, LinkedMDB, …
Client
- Android-based mobile app
- Integrated with a multimedia …
- Search & browse multimedia …
System: search, browse, and …
(Figure: labels include Instance, mobile, John, relational, Entity, Ontology, Data)

Matching ontologies with …
- Different LOD sources have different preferences on …, e.g., DBpedia vs. the Music Ontology
- Falcon-AO: an automatic ontology matching system
- Extended with … knowledge to support synonyms, e.g., "track" vs. …
(Figure: structural matching)
- Linguistic matching: V-Doc (TF-IDF) and I-Sub (edit distance)
- Structural matching: GMO (similarity …)

Linking entities with …
- Entity linkage helps merge all descriptions in different sources that refer to the same multimedia …
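The V-Doc step named above (TF-IDF over virtual documents) can be sketched as follows. This is a simplified illustration in the spirit of V-Doc, not Falcon-AO's implementation: it assumes terms are already tokenized, uses a single hypothetical neighbor weight `gamma` instead of separate subject, predicate, and object weights, and the ontology terms in the example are invented.

```python
import math
from collections import Counter


def virtual_document(own_tokens, neighbour_token_lists, gamma=0.5):
    """Simplified virtual document: the term's own tokens (weight 1) plus
    its neighbours' tokens down-weighted by a single factor gamma."""
    doc = Counter()
    for token in own_tokens:
        doc[token] += 1.0
    for tokens in neighbour_token_lists:
        for token in tokens:
            doc[token] += gamma
    return doc


def tfidf(docs):
    """Weight every document's term frequencies by log(N / df)."""
    n = len(docs)
    df = Counter()
    for doc in docs.values():
        df.update(doc.keys())  # each term counted once per document
    return {name: {t: w * math.log(n / df[t]) for t, w in doc.items()}
            for name, doc in docs.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical, already-tokenized terms from two ontologies (o1, o2).
docs = {
    "o1:Heart": virtual_document(["heart"], [["muscular", "organ"], ["circulatory", "system"]]),
    "o2:HeartOrgan": virtual_document(["heart", "organ"], [["circulatory", "system"]]),
    "o1:Car": virtual_document(["car"], [["vehicle"]]),
    "o2:Automobile": virtual_document(["automobile"], [["vehicle"]]),
}
vectors = tfidf(docs)
print(cosine(vectors["o1:Heart"], vectors["o2:HeartOrgan"]))
print(cosine(vectors["o1:Heart"], vectors["o2:Automobile"]))  # 0.0
```

The IDF statistics must be computed over all virtual documents of both ontologies. Computed over a single candidate pair, every shared token would get an IDF of log(2/2) = 0, and the cosine would no longer reward shared vocabulary.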
(Figure: training for instance linkage; property sets are compared, e.g., {p1, p3} of c1 vs. {p5, p6} of c3, and {p1, p2} of c1 vs. {p3, p4} of c2)
- Training set; negative examples: pairs that do not hold the equivalence relation
- Class-based discriminative property …; information …
- Online …

Integrating legacy relational databases
- There is still a great deal of legacy data stored in relational databases.
- Some data in LOD are generated from their relational …
- Element …, e.g., entity tables and relationship tables
- Element …
- Instance …
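This is similar to ontology matching and entity linkage.

Evaluation
- Two aspects: usability and effectiveness of the mobile app; integration accuracy in the …

User study
(1) Usability & …
- 3 comparative …: …; …; Wikipedia Android …
- 6 testing tasks
- 50 … (10, 22, and 18 …)
- Measures: System Usability Scale (SUS) and post-task questionnaire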
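The System Usability Scale used in the user study has a standard scoring rule, sketched below for one participant. The slide does not report the study's actual responses or how it aggregated them; the answers in the example are invented.

```python
def sus_score(responses):
    """System Usability Scale score (0-100) for one participant.

    responses: answers to the ten SUS items in order, each on a 1-5 scale.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses between 1 and 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5  # rescale the 0-40 raw sum to 0-100


# Hypothetical answers from one participant.
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))  # 85.0
```

A SUS score is not a percentage: a participant who answers 3 ("neutral") to every item gets 50, so per-system comparisons are usually made on the mean score across participants.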
Post-task questionnaire: analyze the results according to the typology of the …

Integration accuracy
- Ontology matching: 78 …, incl. 18 RDB …
- Entity linkage: 60 thousand …; 100 samples per …

Lessons learned
- CAMO leverages ontology matching and entity linkage for data integration, and supports users in browsing and searching multimedia content on mobile devices.
- Ontology matters: a trade-off between expressiveness and ease of use
- Data integration quality: human computation + machine …
- Mobile app design: conciseness, ranking scheme, user-…

Future work
- Generate complex … for semantic queries
- Extend to user-generated …
- NLP …