




Knowledge graph architecture

- General knowledge graph architecture [source: Baidu Baike]
- Fudan University knowledge graph architecture
- Early knowledge graph architecture

Architecture discussion: early knowledge graph architecture

Knowledge extraction
- Entity and concept extraction
- Entity-concept mapping
- Relation extraction
- Quality assessment

KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014

A sampler of research problems

Growth: knowledge graphs are incomplete!
- Link prediction: add relations
- Ontology matching: connect graphs
- Knowledge extraction: extract new entities and relations from web/text

Validation: knowledge graphs are not always correct!
- Entity resolution: merge duplicate entities, split wrongly merged ones
- Error detection: remove false assertions

Interface: how to make it easier to access knowledge?
- Semantic parsing: interpret the meaning of queries
- Question answering: compute answers using the knowledge graph

Intelligence: can AI emerge from knowledge graphs?
- Automatic reasoning and planning
- Generalization and abstraction

Relation extraction: definition and common approaches

Common approaches:
- Semantic pattern matching [frequent pattern mining, density-based clustering, semantic similarity]
- Hierarchical topic models [weakly supervised]

Methods and techniques

1. Relation extraction
   - Supervised models
   - Semi-supervised models
   - Distant supervision
2. Entity resolution
   - Single entity methods
   - Relational methods
3. Link prediction
   - Rule-based methods
   - Probabilistic models
   - Factorization methods
   - Embedding models

Not in this tutorial:
- Entity classification
- Group/expert detection
- Ontology alignment
- Object ranking

Relation extraction

- Extracting semantic relations between sets of [grounded] entities
- Numerous variants:
  - Undefined vs pre-determined set of relations
  - Binary vs n-ary relations, facet discovery
  - Extracting temporal information
  - Supervision: {fully, un, semi, distant}-supervision
  - Cues used: only lexical vs full linguistic features

[Figure: relation extraction example. Text mentions "Kobe Bryant, the franchise player of the Lakers", "Kobe once again saved his team" and "Kobe Bryant, man of the match for Los Angeles"; candidate relation: Kobe Bryant, playFor, LA Lakers.]

Supervised relation extraction

- Sentence-level labels of relation mentions
  - "Apple CEO Steve Jobs said.." => (SteveJobs, CEO, Apple)
  - "Steve Jobs said that Apple will.." => NIL
- Traditional relation extraction datasets
  - ACE 2004
  - MUC-7
  - Biomedical datasets (e.g. BioNLP challenges)
- Learn classifiers from +/- examples
- Typical features: context words + POS, dependency path between entities, named entity tags, token/parse-path/entity distance
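
The lexical part of the "typical features" above can be sketched as follows. This is a minimal illustration only: the function name `mention_pair_features`, the span convention and the feature names are assumptions, and POS tags, dependency paths and named entity tags are left out because they need an external parser.

```python
def mention_pair_features(tokens, e1_span, e2_span, window=2):
    """Lexical features for one pair of entity mentions in a sentence.

    tokens: list of words.
    e1_span, e2_span: (start, end) token offsets, end exclusive,
    with the first mention appearing before the second (an assumption).
    """
    (s1, t1), (s2, t2) = e1_span, e2_span
    between = tokens[t1:s2]
    feats = {
        # words between the two mentions, as one conjunctive feature
        "between=" + "_".join(w.lower() for w in between): 1,
        # token distance between the mentions
        "token_distance": len(between),
    }
    # context words to the left of the first mention
    for w in tokens[max(0, s1 - window):s1]:
        feats["left=" + w.lower()] = 1
    # context words to the right of the second mention
    for w in tokens[t2:t2 + window]:
        feats["right=" + w.lower()] = 1
    return feats


tokens = "Apple CEO Steve Jobs said".split()
feats = mention_pair_features(tokens, (0, 1), (2, 4))
```

A classifier trained on +/- examples would consume dictionaries like `feats`, one per candidate mention pair.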

Semi-supervised relation extraction

- Generic algorithm
  1. Start with seed triples / golden seed patterns
  2. Extract patterns that match seed triples/patterns
  3. Take the top-k extracted patterns/triples
  4. Add to seed patterns/triples
  5. Go to 2
- Many published approaches in this category:
  - Dual Iterative Pattern Relation Extractor [Brin, 98]
  - Snowball [Agichtein & Gravano, 00]
  - TextRunner [Banko et al., 07] (almost unsupervised)
- Differ in pattern definition and selection
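
The generic loop above can be sketched on toy data. This is a simplified illustration, not any of the cited systems: a "pattern" is assumed to be just the exact text between two mentions, patterns are ranked by raw match count, and the input format and the name `bootstrap` are hypothetical.

```python
from collections import Counter


def bootstrap(mentions, seed_triples, top_k=2, iterations=2):
    """mentions: list of (e1, e2, middle_text) candidate mentions.
    seed_triples: set of (e1, relation, e2)."""
    triples = set(seed_triples)
    patterns = set()  # (middle_text, relation)
    for _ in range(iterations):
        # Step 2: collect patterns from mentions that match known triples
        counts = Counter()
        for e1, e2, middle in mentions:
            for a, rel, b in triples:
                if (a, b) == (e1, e2):
                    counts[(middle, rel)] += 1
        # Steps 3 and 4: keep the top-k patterns and add them to the pool
        patterns.update(p for p, _ in counts.most_common(top_k))
        # Use the pattern pool to extract new triples, then repeat (step 5)
        for e1, e2, middle in mentions:
            for pat, rel in patterns:
                if middle == pat:
                    triples.add((e1, rel, e2))
    return triples, patterns


# Toy mentions (illustrative data only)
mentions = [
    ("Larry Page", "Google", "co-founded"),
    ("Steve Jobs", "Apple", "co-founded"),
    ("Steve Jobs", "Pixar", "visited"),
]
seeds = {("Larry Page", "founderOf", "Google")}
triples, patterns = bootstrap(mentions, seeds)
```

Published systems differ mainly in how the pattern is defined and how the top-k selection is scored, which is where this sketch is deliberately naive.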

Distantly-supervised relation extraction

- Existing knowledge base + unlabeled text generate examples
- Locate pairs of related entities in text
- Hypothesize that the relation is expressed

[Figure: sentences aligned with knowledge base facts. Sentences: "Google CEO Larry Page announced that...", "Steve Jobs has been Apple for a while...", "Pixar lost its co-founder Steve Jobs...", "I went to Paris, France for the summer...". Knowledge base nodes: Google, Larry Page, France, Apple, Pixar, Steve Jobs. Edge labels: CEO, capitalOf, CEO, founderOf.]

Distant supervision: modeling hypotheses

Typical architecture:
1. Collect many pairs of entities co-occurring in sentences from text corpus
2. If 2 entities participate in a relation, several hypotheses:
   1. All sentences mentioning them express it [Mintz et al., 09]

"Barack Obama is the 44th and current President of the US." (BO, employedBy, USA)
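
The example-generation step under the "all sentences express it" hypothesis can be sketched as below. It is a rough illustration: entity matching is reduced to plain substring search, and the function name `distant_labels` and the direction of the triples in `kb` are assumptions made for the example.

```python
def distant_labels(kb, sentences):
    """Label every sentence that mentions both entities of a KB triple.

    kb: list of (e1, relation, e2) triples.
    sentences: list of raw sentences.
    Returns (sentence, e1, relation, e2) training examples.
    """
    examples = []
    for sent in sentences:
        for e1, rel, e2 in kb:
            # naive entity matching: both names occur somewhere in the sentence
            if e1 in sent and e2 in sent:
                examples.append((sent, e1, rel, e2))
    return examples


kb = [
    ("Larry Page", "CEO", "Google"),
    ("Steve Jobs", "CEO", "Apple"),
    ("Steve Jobs", "founderOf", "Pixar"),
    ("Paris", "capitalOf", "France"),
]
sentences = [
    "Google CEO Larry Page announced that...",
    "Steve Jobs has been Apple for a while...",
    "Pixar lost its co-founder Steve Jobs...",
    "I went to Paris, France for the summer...",
]
examples = distant_labels(kb, sentences)
```

Every sentence receives a label here, including ones that mention both entities without clearly expressing the relation, which is the noise the later hypotheses try to handle.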

Sentence-level features

- Lexical: words in between and around mentions and their part-of-speech tags (conjunctive form)
- Syntactic: dependency parse path between mentions along with side nodes
- Named entity tags: for the mentions
- Conjunctions of the above features
- Distant supervision is applied to lots of data, so sparsity of conjunctive forms is not an issue

Distant supervision: modeling hypotheses (continued)

If 2 entities participate in a relation, several hypotheses:
1. All sentences mentioning them express it [Mintz et al., 09]
2. At least one sentence mentioning them expresses it [Riedel et al., 10]

"Barack Obama is the 44th and current President of the US." (BO, employedBy, USA)
"Obama flew back to the US on Wednesday." (BO, employedBy, USA)

Distant supervision: modeling hypotheses (continued)

3. At least one sentence mentioning them expresses it, and 2 entities can express multiple relations [Hoffmann et al., 11] [Surdeanu et al., 12]

"Obama was born in the US just as he always said." (BO, bornIn, USA)
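
One simple way to read the "at least one sentence" hypothesis is to score an entity pair by its best-scoring sentence instead of trusting every sentence. The sketch below shows only that aggregation step and is not the model from the cited papers: the per-sentence scores are made-up stand-ins for the output of some sentence-level classifier, and `bag_scores` is a hypothetical name.

```python
from collections import defaultdict


def bag_scores(scored_mentions):
    """Group sentences by entity pair and keep the best one per pair.

    scored_mentions: iterable of (pair, sentence, score), where score is
    a sentence-level confidence that the relation is expressed.
    Returns {pair: (best_score, best_sentence)}.
    """
    bags = defaultdict(list)
    for pair, sentence, score in scored_mentions:
        bags[pair].append((score, sentence))
    # "at least one": the pair is as credible as its strongest sentence
    return {pair: max(items) for pair, items in bags.items()}


# Scores below are illustrative assumptions, not model output
scored = [
    (("BO", "USA"), "Barack Obama is the 44th and current President of the US.", 0.9),
    (("BO", "USA"), "Obama flew back to the US on Wednesday.", 0.1),
]
best = bag_scores(scored)
```

With this aggregation a weak sentence no longer drags the pair down, since only the strongest mention decides the pair-level score.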

Distant supervision

- Pros
  - Can scale to the web, as no supervision required
  - Generalizes to text from different domains
  - Generates a lot more supervision in one iteration
- Cons
  - Needs high quality entity-matching
  - Relation-expression hypothesis can be wrong
    - Can be compensated by the extraction model, redundancy, language model
  - Does not generate negative examples
    - Partially tackled by matching unrelated entities

Entity resolution

[Figure: knowledge graph with duplicate mentions. Nodes: Kobe Bryant, Black Mamba, Kobe B. Bryant, Vanessa L. Bryant, Pau Gasol, LA Lakers, 1978, 35. Edge labels: teammate, bornIn, playInLeague, playFor, age, marriedTo. Panels: single entity resolution, relational entity resolution.]

DEF: We consider the entity resolution (ER) problem (also known as deduplication, or merge-purge), in which records determined to represent the same real-world entity are successively located and merged. It is the problem of extracting, matching and resolving entity mentions in structured and unstructured data.

Methods: entity resolution / deduplication
- Multiple mentions of the same entity are wrong and confusing.

Single-entity entity resolution

- Entity resolution without using the relational context of entities
- Many distances/similarities for single-entity entity resolution:
  - Edit distance (Levenshtein, etc.)
  - Set similarity (TF-IDF, etc.)
  - Alignment-based
  - Numeric distance between values
  - Phonetic similarity
  - Equality on a boolean predicate
  - Translation-based
  - Domain-specific
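
Two of the measures listed above can be sketched in a few lines. This is a minimal illustration with assumed function names: `levenshtein` is the standard dynamic-programming edit distance, and `jaccard` is a plain set overlap over name tokens, used here in place of the TF-IDF weighting the slide mentions.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def jaccard(a, b):
    """Set similarity: size of intersection over size of union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


name_distance = levenshtein("Kobe Bryant", "Kobe B. Bryant")
token_similarity = jaccard("kobe bryant".split(), "kobe b. bryant".split())
```

In practice a resolver would threshold or combine such scores; the thresholds are domain-specific and are not given in the slides.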

Relational entity resolution: simple strategies

- Enrich model with relational features: richer context for matching
- Relational features:
  - Value of edge or neighboring attribute
  - Set similarity measures
    - Overlap/Jaccard
    - Average similarity between set members
    - Adamic/Adar: two entities are more similar if they share more items that are overall less frequent
    - SimRank: two entities are similar if they are related to similar objects
    - Katz score: two entities are similar if they are connected by shorter paths

[Figure: knowledge graph. Nodes: Kobe Bryant, Black Mamba, Pau Gasol, LA Lakers, 1978, 35. Edge labels: teammate, bornIn, playFor, playInLeague, age.]
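
The Overlap/Jaccard and Adamic/Adar measures above can be sketched over neighbor sets. The graph below is toy data, the entity "Other Player" and both function names are hypothetical, and an item's frequency is assumed to be the number of entities linked to it.

```python
import math


def jaccard(neighbors, a, b):
    """Overlap of the two neighbor sets relative to their union."""
    na, nb = neighbors[a], neighbors[b]
    return len(na & nb) / len(na | nb) if na | nb else 0.0


def adamic_adar(neighbors, a, b):
    """Shared neighbors count more when they are rare overall."""
    # frequency of each item = number of entities linked to it
    freq = {}
    for items in neighbors.values():
        for item in items:
            freq[item] = freq.get(item, 0) + 1
    # a shared item is linked to at least a and b, so log(freq) > 0
    return sum(1.0 / math.log(freq[item]) for item in neighbors[a] & neighbors[b])


# Toy neighbor sets (illustrative only)
neighbors = {
    "Kobe Bryant": {"LA Lakers", "Pau Gasol", "NBA"},
    "Black Mamba": {"LA Lakers", "NBA"},
    "Other Player": {"NY Knicks", "NBA"},
}
same = adamic_adar(neighbors, "Kobe Bryant", "Black Mamba")
different = adamic_adar(neighbors, "Kobe Bryant", "Other Player")
```

Sharing the rarer "LA Lakers" neighbor is what pushes `same` above `different`; the very common "NBA" neighbor contributes little to either score.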

Relational entity resolution: advanced strategies

- Dependency graph approaches [Dong et al., 05]
- Relational clustering [Bhattacharya & Getoor, 07]
- Probabilistic Relational Models [Pasula et al., 03]
- Markov Logic Networks [Singla & Domingos, 06]
- Probabilistic Soft Logic [Broecheler & Getoor, 10]

LINK PREDICTION

Link prediction

[Figure: knowledge graph. Nodes: Kobe Bryant, Pau Gasol, LA Lakers, NY Knicks. Edge labels: teammate, playFor, playInLeague, teamInLeague, opponent.]

- Add knowledge from existing graph
- No external source
- Reasoning within the graph

1. Rule-based methods
2. Probabilistic models
3. Factorization models
4. Embedding models

First Order Inductive Learner

- FOIL learns function-free Horn clauses:
  - given positive and negative examples of a concept
  - and a set of background-knowledge predicates
  - FOIL inductively generates a logical rule for the concept that covers all + and no -

[Figure: Kobe Bryant, Pau Gasol and LA Lakers connected by teammate and playFor edges.]

teammate(x,y) ∧ playFor(y,z) ⇒ playFor(x,z)

- Computationally expensive: huge search space, large, costly Horn clauses
- Must add constraints: high precision but low recall
- Inductive Logic Programming: deterministic and potentially problematic
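
Applying the example rule teammate(x,y) ∧ playFor(y,z) ⇒ playFor(x,z) to a small graph can be sketched as below. This only applies one given rule to stored triples; it does not implement FOIL's rule learning, and the function name `apply_teammate_rule` is an assumption.

```python
def apply_teammate_rule(triples):
    """Infer playFor(x, z) from teammate(x, y) and playFor(y, z).

    triples: set of (subject, relation, object).
    Returns only the newly inferred triples.
    """
    teammate = {(x, y) for x, r, y in triples if r == "teammate"}
    play_for = {(y, z) for y, r, z in triples if r == "playFor"}
    inferred = {
        (x, "playFor", z)
        for x, y in teammate
        for y2, z in play_for
        if y == y2  # join on the shared variable y
    }
    return inferred - set(triples)


graph = {
    ("Kobe Bryant", "teammate", "Pau Gasol"),
    ("Pau Gasol", "playFor", "LA Lakers"),
}
new_links = apply_teammate_rule(graph)
```

A rule like this is exact on the facts it matches but silent everywhere else, which is one way to see the "high precision but low recall" point above.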