Special Forum: Big Data (lecture slides)

Uploaded by: q***. IP location: Guizhou. Upload date: 2022-11-05. Format: PPT. Pages: 87.

Big Data vs Smart Model: Beauty and the Beast

Prof. Yike Guo, Department of Computing, Imperial College London

Model: Mathematical Representation of a Simplified Physical World

Modelling is an essential and inseparable part of all scientific activity. A scientific model seeks to represent empirical objects, phenomena, and physical processes in a logical and objective way. To understand a world object (called the target T), a model M is a simplified mathematical representation of it. A model is the result of abstraction from the observations made, and it is used to give predictions.

[Diagram: Human, Sensor, Machine]

No Model Is Perfect: Inherent Uncertainty

These targets consist of a set of continuous phenomena (in both time and space), and they typically produce rich signals. Because of the continuity of the target in both time and space, the signals are in principle infinite. But observations (e.g. sensor readings) are made at discrete points in time and space, so they are incomplete and approximate, which brings the "uncertainty".

- Overfitting or underfitting: When learning a model from observations, such as learning a nonlinear regression model, we need to choose parameters such as the order k. Because the information from observations is partial, it is hard to make a perfect choice. Such imperfectness causes the problem of model error, i.e. underfitting (small k) and overfitting (large k).
- Simplification: From observations, we project from the multi-dimensional world to a simplified model with significantly reduced dimensionality, to focus on the features or properties we are interested in.

[Figure: Nonlinear regression: K-order polynomial]
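The trade-off between a small and a large k can be sketched numerically. The following is a minimal illustration, not taken from the slides: the sine-shaped target, the noise level, and the sample sizes are all assumptions chosen only to make the effect visible.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def fit_errors(k, x_train, y_train, x_test, y_test):
    """Fit a k-order polynomial by least squares; return (train MSE, test MSE)."""
    coeffs = P.polyfit(x_train, y_train, k)
    train_mse = float(np.mean((P.polyval(x_train, coeffs) - y_train) ** 2))
    test_mse = float(np.mean((P.polyval(x_test, coeffs) - y_test) ** 2))
    return train_mse, test_mse


def truth(x):
    """Hypothetical target T; the slides do not specify one."""
    return np.sin(2 * np.pi * x)


rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 12)      # a few discrete observations
x_test = np.linspace(0.02, 0.98, 50)     # unseen points of the same target
y_train = truth(x_train) + rng.normal(0, 0.2, x_train.size)
y_test = truth(x_test) + rng.normal(0, 0.2, x_test.size)

for k in (1, 3, 9):
    train_mse, test_mse = fit_errors(k, x_train, y_train, x_test, y_test)
    print(f"k={k}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

The training error can only go down as k grows, because each higher-order family contains the lower-order ones. The test error is the quantity that reveals under- or overfitting.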

So, Why Model?

George Box (statistician): "All models are wrong, but some are useful." Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. (1980)

Peter Norvig (Google): "All models are wrong, and increasingly you can succeed without them." (2008)

Chris Anderson (Wired): There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot. (The Data Deluge Makes the Scientific Method Obsolete) (2012)

The Google Argument

At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later.

For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising; it just assumed that better data, with better analytical tools, would win the day. And Google was right.

Google's founding philosophy is that we don't know why this page is better than that one: if the statistics of incoming links say it is, that's good enough. No semantic or causal analysis is required. That's why Google can translate languages without actually "knowing" them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German). And why it can match ads to content without any knowledge or assumptions about the ads or the content.

Model Free Sensor Informatics: Query Driven

[Diagram: a sensor network feeds a database table "raw-data" with columns time, id, temp; e.g. time = 10am, id = 1, 2, .., 7, temp = ..., 29]

User workflow:
1. Extract all readings into a file
2. Run MATLAB/R/other data processing tools
3. Write the output to a file or back to the database
4. Write data processing tools to process/aggregate the output (maybe using a DB)
5. Decide what new data to acquire
6. Repeat

Model-free sensing treats the sensory system as a database, and sensing as querying to fetch data from the physical world. One of the leading vendors [Crossbow] is bundling a query processor with their devices.
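The query-driven view can be sketched with an in-memory SQLite table standing in for the sensor network. This is only an illustration: the table name `raw_data`, the helper `average_temp_by_time`, and every reading below are hypothetical, and no vendor's query processor is being reproduced.

```python
import sqlite3

# In-memory database standing in for the "raw-data" table on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_data (time TEXT, id INTEGER, temp REAL)")

# Hypothetical readings: (time, sensor id, temperature).
readings = [
    ("10am", 1, 21.5), ("10am", 2, 22.0), ("10am", 7, 29.0),
    ("11am", 1, 23.0), ("11am", 2, 24.5), ("11am", 7, 30.5),
]
conn.executemany("INSERT INTO raw_data VALUES (?, ?, ?)", readings)


def average_temp_by_time(connection):
    """'Sensing as querying': aggregate in the database instead of exporting files."""
    rows = connection.execute(
        "SELECT time, AVG(temp) FROM raw_data GROUP BY time ORDER BY time"
    )
    return dict(rows.fetchall())


print(average_temp_by_time(conn))
```

The single aggregate query replaces steps 1 to 4 of the file-based workflow, which is the convenience the model-free approach offers; it says nothing about which data is worth acquiring.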

WikiSensing: A Model Free Sensor Informatics System Based on Big Data Architecture

[Architecture diagram]

Model Free Sensing Is Super Inefficient?

- Data misrepresentation without a model?
- Latent information missing without a model?
- High demand on computation/storage without a model?
- Requires too much interoperability between sensors and analytics

Bayesian: Data Is Not the Enemy of Models, Rather a Great Supporter!

Bayesian probability is a formalism that allows us to reason about beliefs in models under conditions of uncertainty, based on the observations (data).

If we have observed that a particular event has happened, such as Britain coming 10th in the medal table at the 2004 Olympics, then there is no uncertainty about it. However, suppose a is the statement "Britain sweeps the boards at the 2012 London Olympics, winning more than ... Gold Medals!", made before 28th July.

Since this is a statement about a future event, nobody can state with any certainty whether or not it is true. Different people may have different beliefs in the statement depending on their specific knowledge of factors that might affect its likelihood.

The beliefs in the model were changing daily based on the performance data available each day. By ... August, most people's belief in this model should have been almost 80%.

Thus, in general, a person's subjective belief in a statement a will depend on some body of knowledge K. We write this as P(a|K). Henry's belief in a is different from Marcel's because they are using different K's. However, even if they were using the same K they might still have different beliefs in a.

The expression P(a|K) thus represents a belief measure. Sometimes, for simplicity, when K remains constant we just write P(a), but you must be aware that this is a simplification.

Model and Data Interaction: Bayesian Inference

- Bayes Rule: Interaction between data and model
- Learning: Sequence of Interactions

[Equation: Bayes' rule relating p(Y | ...) to the prior p(Y)]
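The two ideas on this slide, Bayes' rule as a single interaction and learning as a sequence of interactions, can be sketched as repeated belief updates. This is a minimal sketch: the two hypotheses and all likelihood values are invented for illustration and do not come from the talk.

```python
def bayes_update(prior, likelihoods):
    """One interaction: combine P(M) with P(D|M) to obtain P(M|D).

    prior:       dict mapping each model to P(M)
    likelihoods: dict mapping each model to P(D|M) for the new observation D
    """
    unnormalized = {m: prior[m] * likelihoods[m] for m in prior}
    evidence = sum(unnormalized.values())  # P(D)
    return {m: value / evidence for m, value in unnormalized.items()}


# Hypothetical example: belief in a statement a, revised once per day.
belief = {"a_true": 0.5, "a_false": 0.5}
daily_likelihoods = [
    {"a_true": 0.7, "a_false": 0.4},
    {"a_true": 0.8, "a_false": 0.3},
    {"a_true": 0.6, "a_false": 0.5},
]

# Learning as a sequence of interactions: yesterday's posterior is today's prior.
for likelihood in daily_likelihoods:
    belief = bayes_update(belief, likelihood)
    print(belief)
```

Feeding each posterior back in as the next prior gives the same result as multiplying all the likelihoods at once, which is why the order of independent observations does not matter.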

Big Data Meets Smart Models: Bayesian Approach towards Sensor Informatics

- We need a model as the representation of our knowledge so far
- Data is the observations which may revise our belief in the models we have
- Analysis is assessing our belief and updating our models to make them believable
- Sensing is acquiring the needed data to update (enrich) models

Models are learned from data (observations) by scientists (theoretical abstraction) or by machine (machine learning). Models are hypotheses (when making a new observation). Models are knowledge (when established as belief).

Sensor Informatics:
- Sensing management: managing the "neediness", i.e. when and where to sense
- Sensing analytics: managing model updating, i.e. how to enrich models with observations
- Reasoning: decision making based on the integration of trusted models

P(M | D) = P(D | M) P(M) / P(D)

Surprising Event: When an Observation Does not Fit a Known Model

When the posterior and the prior (P(M|D) vs. P(M)) differ greatly: surprise!

How great a variance? Surprise threshold α.
- Kullback-Leibler divergence
- Other methods: significance level, Chebyshev's Theorem, ...

From the model, we get C(A, B) (e.g. a multivariate Gaussian distribution).
- A: 100mm, B: 50mm: model consistent
- A: 100mm, B: 500mm: surprise!
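One way to turn "how great a variance?" into a test is to compare the Kullback-Leibler divergence between posterior and prior with the threshold α. The sketch below assumes discrete distributions, measures the divergence from prior to posterior, and uses an arbitrary α = 0.1; the slides do not fix any of these choices.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in nats for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def is_surprising(posterior, prior, alpha):
    """Flag an observation when the belief shift exceeds the threshold alpha."""
    return kl_divergence(posterior, prior) > alpha


prior = [0.5, 0.5]                      # hypothetical belief over two models
mild_posterior = [0.55, 0.45]           # observation roughly fits the model
strong_posterior = [0.99, 0.01]         # observation contradicts the prior belief

print(is_surprising(mild_posterior, prior, alpha=0.1))
print(is_surprising(strong_posterior, prior, alpha=0.1))
```

Choosing α is itself a modelling decision: a low threshold triggers frequent model updates, a high one tolerates more disagreement between data and model.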

Using Compressive Sensing Technology to Optimize Observations

Camera example: Image -> Analog Signal -> Digital Data -> Compressed Data -> Information

Why sense so much data and then throw it away? Why not sense the information directly?

- Compressive sensing: take advantage of sparseness to solve the under-determined system with just a small number of measurements.
- Unobserved behavior (behavior not captured by the current model) is typically sparse.
- Reconstruction methods: L1-min, Bayesian CS.
- Sensing data is enough when we can recover the needed information through compressive sensing.

Ψ: matrix built from the model
Φ: placement matrix
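A small numerical sketch can show how a sparse vector is estimated from fewer measurements than unknowns. It uses iterative soft thresholding (ISTA) on L1-regularised least squares, which is a relaxation of L1-min and not necessarily the solver used in the talk. The identity Ψ, the random Gaussian Φ, the problem sizes, and the value of `lam` are all assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Shrink every entry of v towards zero by t (proximal step for the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_objective(A, y, s, lam):
    """0.5 * ||A s - y||^2 + lam * ||s||_1"""
    return 0.5 * np.sum((A @ s - y) ** 2) + lam * np.sum(np.abs(s))


def ista(A, y, lam=0.01, n_iter=500):
    """Minimise the objective above by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    s = np.zeros(A.shape[1])
    for _ in range(n_iter):
        gradient = A.T @ (A @ s - y)
        s = soft_threshold(s - step * gradient, step * lam)
    return s


rng = np.random.default_rng(0)
n, m, k = 100, 40, 3                          # unknowns, measurements, non-zeros
Psi = np.eye(n)                               # hypothetical stand-in for the model matrix
Phi = rng.normal(size=(m, n)) / np.sqrt(m)    # hypothetical stand-in for the placement matrix
s_true = np.zeros(n)
s_true[rng.choice(n, k, replace=False)] = [1.0, -2.0, 1.5]

A = Phi @ Psi
y = A @ s_true                                # only m < n measurements are taken
s_hat = ista(A, y)
print("largest recovered entries at:", np.argsort(-np.abs(s_hat))[:k])
print("true non-zero entries at:   ", np.flatnonzero(s_true))
```

With a step size of one over the Lipschitz constant, each ISTA iteration cannot increase the objective, so the estimate is always at least as good as the all-zero starting point. How closely it matches the true sparse vector depends on the sparsity, the number of measurements, and `lam`.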

How to Update the Model: Parameter Estimation

[Figure: nodal solution of temperature (TEMP, AVG) at STEP=360, SUB=1, TIME=1800; colour scale from SMN=131.03 to SMX=646.41]

Estimating the parameters to maximize the likelihood of the data given the model.
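As a small worked instance of maximum likelihood estimation, consider a Gaussian model, for which the maximising parameters have a closed form. The Gaussian choice and the sample values are assumptions for illustration; the simulation model shown in the figure is far more complex.

```python
import math


def gaussian_mle(data):
    """Closed-form maximum likelihood estimates (mean, variance) for a Gaussian."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n   # divides by n, not n - 1
    return mu, var


def gaussian_log_likelihood(data, mu, var):
    """log P(data | mu, var) for independent Gaussian observations."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for x in data
    )


observations = [1.0, 2.0, 3.0, 4.0]              # hypothetical sensor readings
mu_hat, var_hat = gaussian_mle(observations)
print(mu_hat, var_hat)
print(gaussian_log_likelihood(observations, mu_hat, var_hat))
```

Any other parameter pair gives a log-likelihood no higher than the one at `(mu_hat, var_hat)`, which is what "maximize the likelihood of the data given the model" means here.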

Model: An Example in Digital City

Modelling City Life via Causality

- Causality: C(eA, eB) is used to predict the current value at location (A) when the value at another location (B) is given
- Location: physical or logical locations with causality (through the sensory cortex) (city areas, ...)
- Relationship: topology (geo topology between ... and diffusion structure)
- Event: events, which are the dynamics of the observable signal f(E) (e.g. heavy rainfall)

Digital City Model: looking into the details

Ontologies are adopted to represent locations, relationships R, events and signals S.

Diffusion: an event e1 ∈ n1 causes another event ∈ n2, when the two nodes n1, n2 are linked.

System: (L, E)
Model: M(T) = (G, B)

Training for causality: we use a Bayesian network to represent the conditional independencies between cause and target variables:
1. Gaussian Mixture Models (GMMs), estimated via expectation maximization (EM)
2. Gaussian Process with Bayesian Inference

When the surprise > surprise threshold

- Diversity detected: identify the incorrect causality C(el, ep), which is sparse -> compressive sensing approach
- New observation -> measurement that could revise the model space to maximize the likelihood of the observations
- Focusing on diversity

[Diagram: Placement, Model Updating, Model Driven Sensing, Surprise]

The dynamics of model update: surprise, sensing, model updating
- The goal for sensing: capturing surprise
- The goal of analysis: revising the model

A model cannot overfit or underfit: when there is diversity, it can be updated -> consistent with the universe (target).

Model Update: It's a Bayesian

P(M, T | D) = P(D | M, T) P(M, T) / P(D)

T: target; M: model; T acts as a top-down parameter.

* When T is fixed: P(M | D) = P(D | M) P(M) / P(D)
-> The variance between posterior and prior is the "surprise"
-> Bottom-up attention to model update (data assimilation): combining observations of the current state of a system with the results from a model (the forecast) to produce an analysis. The model is then advanced in time and its result becomes the forecast in the next analysis cycle.

* When T is updated: P(M, T) = P(M | T) P(T)
-> Top-down attention (alertness) to model update
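The forecast and analysis cycle of data assimilation can be sketched with a scalar Kalman filter, one of the simplest schemes of this kind. The talk does not say which assimilation method is used; the persistence model (`growth=1.0`), the noise variances, and the observations below are all assumptions.

```python
def analysis_step(forecast, forecast_var, observation, obs_var):
    """Combine the model forecast with an observation to produce the analysis."""
    gain = forecast_var / (forecast_var + obs_var)          # trust placed in the data
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var


def forecast_step(state, var, growth=1.0, model_noise_var=0.1):
    """Advance the (hypothetical, linear) model in time; uncertainty grows."""
    return growth * state, growth ** 2 * var + model_noise_var


def assimilate(observations, state0, var0, obs_var=1.0):
    """Run analysis/forecast cycles and return the analysis after each observation."""
    state, var = state0, var0
    analyses = []
    for observation in observations:
        state, var = analysis_step(state, var, observation, obs_var)
        analyses.append(state)
        state, var = forecast_step(state, var)   # becomes the next cycle's forecast
    return analyses


print(assimilate([2.0, 2.2, 1.9, 2.1], state0=0.0, var0=1.0))
```

Each analysis lies between the forecast and the observation, weighted by their variances, which is the scalar Gaussian case of the posterior update above.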

Adaptive Observation: Sensing and Numerical Modelling

[Diagram: CityGML, Ontology, GIS, Geometry, mesh]

Building an Initial Model and Making Predictions by Simulations

Setting boundary conditions, numerical schemas, model parameters, etc.

Simulation: 24 Building Case (Fine Mesh, 600000 Nodes): ... Processors

Simulation: Moving Vehicles and Scalar Dispersions in Street Canyons

Using Sensors to Verify the Prediction Results of the Model

Sensing: acquiring data to get the posterior of the model, in order to validate the model (if consistent) or update it.

P(M | D) = P(D | M) P(M) / P(D)

[Diagram: Data, sensing, Model, validate, update]

New WikiSensing: Elastic Sensing Environment for Large Scale Sensor Informatics

- Elastic sensing theory based on Bayesian inference
- Big Data architecture for large scale sensory data management
- Ontology for the background knowledge management
- Model driven adaptive observation support
- Digital City and digital life applications

The Architecture of the New WikiSensing System

[Architecture diagram]

Ontology Used to Organise the Complex Knowledge Management

- Using ontology to represent the targets, signals, sensing methods, measurements, etc.
- Ontology supports flexible resolution
- Upper ontology for unified operation

[Diagram label: OntoSensor]

Conclusion

- Big data offers a great opportunity for building smart models
- Big data provides a new methodology for model research
- New informatics comes from the coupled integration of the data and the model worlds
- Bayesian theory provides a natural foundation for such integration
- Sensor Informatics is a good example of such a paradigm
- A new uniform framework of sensor informatics can be developed based on the Bayesian theory, where the dynamics of data and model capture the essence of building a sensory system
- We are developing the WikiSensing system to realise this paradigm

Thank you

Understanding Big Data

Haixun Wang

Data Explosion

- MB = 10^6 bytes: a typical book in text format
- GB = 10^9 bytes: a one hour video is about 1GB; data produced by a biology experiment in one day
- TB = 10^12 bytes: astronomy data in one night; the US Library of Congress has 1000 TB of data; search log of Bing per day (2009)

The Arecibo Telescope

- World's largest radio telescope
- Diameter: 305 m (1,000 ft)
- Area: ... acres
- Location: Arecibo, Puerto Rico
- The P-ALFA surveys: 800 Terabytes in ... years

Software Driven Telescope

- From a few, large, expensive, directional dishes to many, small, cheap, omni-directional antennae
- A large number of high-speed input streams (2Gbps per antenna, 25,000 antennae in an area 340 ... in diameter)
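The scale of those input streams follows from simple arithmetic on the two figures quoted on the slide. The sketch below assumes all antennae stream at the full 2 Gbps simultaneously and uses decimal units, as on the Data Explosion slide; the helper name is ours.

```python
ANTENNAE = 25_000          # from the slide
GBPS_PER_ANTENNA = 2       # from the slide


def aggregate_rate_bits_per_second(antennae, gbps_each):
    """Total input rate in bits per second, assuming every stream runs at full rate."""
    return antennae * gbps_each * 10**9


total_bps = aggregate_rate_bits_per_second(ANTENNAE, GBPS_PER_ANTENNA)
total_tb_per_s = total_bps / 8 / 10**12     # bits -> bytes -> terabytes

print(f"{total_bps / 10**12:.0f} Tbps, i.e. {total_tb_per_s} TB per second")
```

Under these assumptions the array would produce 50 Tbps, or 6.25 TB every second, which is why the telescope has to be software driven.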

Challenge 1: It's the data, stupid!

[Diagram: data size vs. data complexity; key/value store, column store, document store, graph systems]

- Big data drives tomorrow's economy.
- The value of big data lies in its degree of connectedness.
- Existing systems cannot handle the rich connectedness of big data.

RDBMS and Rich Relationships

- Performance of multi-way joins is very poor in RDBMS
- Managing data with rich connectedness requires multi-way joins in RDBMS

Trinity

- A general purpose, distributed, in-memory graph system
- Online graph query processing
- Offline graph analytics

Trinity Performance Highlight

- Online query processing:
  - visiting 2.2 million users (... hop neighborhood) on Facebook: 100ms
  - foundation for graph-based services, e.g., entity search
- Offline graph analytics:
  - one iteration on a ... billion node graph: 60sec
  - foundation for analytics, e.g., social analytics

People Search Demo

Multi-way Join vs. Graph Traversal

[Diagram: in an RDBMS, tables Company, Incident and Problem are linked through ID columns and combined by multi-way joins; in Trinity, the same links are followed by graph traversal]
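The difference between joining and traversing can be sketched with a k-hop neighbourhood query over an in-memory adjacency list. This is plain Python, not Trinity's API, and the tiny friendship graph is invented for illustration.

```python
from collections import deque


def k_hop_neighborhood(adjacency, start, k):
    """Return every node reachable from start in at most k hops (breadth-first)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue                      # do not expand beyond k hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


# Hypothetical social graph stored as adjacency lists.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}

print(k_hop_neighborhood(friends, "ann", 2))
```

In a relational schema each extra hop would be another self-join on the edge table, whereas here each hop only follows the stored neighbour lists of nodes already reached.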

Challenge 2: Interpretation of Big Data

- IBM Watson: runs on 2,880 cores, ... terabytes of RAM, and 80kW of power
- A human brain: runs on a tuna fish sandwich and a glass of water

The Eternal Quest

[Diagram: systems placed by how well they understand the question (SQL, domain specific language, unconstrained natural language) and how they answer it (simple calculation, inferencing & reasoning). Systems shown: calculator, SQL, Google/Bing, Wolfram Alpha, Watson, SIRI, Human (Turing Test)]

Turning the Web into a Database

What you see when you look at a homepage ...

Haixun Wang
Microsoft Research Asia
Email: haixunw microsoft com
Tel: +86-10-58963289
Tel: +1-914-902-0749

I joined Microsoft Research Asia in 2009. I was with IBM Watson Research Center from 2000 to 2009. I received the B.S. and M.S. Degree in Computer Science from Shanghai Jiao Tong University in 1994 and 1996, and the Ph.D. Degree in Computer Science from University of California, Los Angeles in June, 2000.

What a machine sees when it looks at a homepage ...

- A JPEG image (a jpeg file)
- Text in a big, bold font
- 4 lines of text
- Another dozen lines of text with two embedded URLs

Semantic Web?

- Number 1 trend in 2008 (Richard MacManus)
- The infrastructure to power the Semantic Web is already here. (Tim Berners-Lee)
- Unstructured information will give way to structured information, paving the road to intelligent computing. (Alex Iskold)

More data beats better algorithms

Banko and Brill 2001

[Figure: Mean translation quality (1 = incomprehensible, ... = perfect) for English-Spanish translation of Microsoft technical texts, 2001 to 2007. Series: Systran, Logos, MSRMT. Annotations: "Off-the-shelf rule-based system"; "Rule-based system with expensive customizations for Microsoft"; "Improve algorithms, scale system, and add data!"]

From Rick Rashid's talk: It's a data driven world, get over it!

Probase

[Diagram: isA (concept, entities); isPropertyOf (attributes); Co-occurrence (isCEOof, LocatedIn, etc.); Concepts ("Spanish Artists"); Entities ("Pablo Picasso")]

Explicit vs. Latent Knowledge

- Abstract representations (such as clusters from latent analysis) that lack linguistic counterparts are hard to learn or validate and tend to lose information.
- Human language has evolved over millennia to have words for the important concepts; let's use them.

Halevy, Norvig, Pereira, "The Unreasonable Effectiveness of Data", IEEE Intelligent Systems, 2009.

What is interpretation?

Add Common Sense to Computing

[Example: Pablo Picasso, ... Oct 1881, Spanish]

Which is "kiki" and which is "bouba"?

[Figure labels: sound, shape, zigzaggedness]

[Examples: China, India -> country; China, India, Brazil -> emerging market. body, taste, smell -> wine. "The engineer is eating an apple": apple -> fruit, not IT company.]

Multiple Concepts

[Example: "Obama's real-estate policy". Candidate concepts: president, politician; investment, property, asset; plan, document]

Multiple Concepts

[Example: apple -> software company, brand, fruit, juice, company; adobe -> brand, software company, material; apple and adobe together -> software company, software manufacturer, brand]
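The apple/adobe example suggests that a second term narrows down which concepts are meant. A crude way to sketch this is to intersect the concept sets of the terms. The table below is a hypothetical toy built from labels on the slide, and plain set intersection is our simplification, not Probase's actual method.

```python
# Hypothetical toy isA table; the concept labels are taken from the slide.
ISA = {
    "apple": {"software company", "brand", "fruit", "company"},
    "adobe": {"software company", "brand", "material"},
}


def shared_concepts(terms, isa=ISA):
    """Concepts that every term in the list can be an instance of."""
    concept_sets = [isa[term] for term in terms]
    return set.intersection(*concept_sets)


print(shared_concepts(["apple"]))            # ambiguous on its own
print(shared_concepts(["apple", "adobe"]))   # context removes "fruit" and "material"
```

A real system would presumably rank concepts by how typical they are for each term; the intersection only shows why seeing "adobe" next to "apple" rules out the fruit reading.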

Multiple Concepts

[Example, continued: "Obama's real-estate policy". Candidate concepts: president, politician; investment, property, asset, example; plan, document, thing, issue, term]

Example (from B. Dolan): Who assassinated Abraham Lincoln?

The far reaching implications: Scientific Method

[Figure: Scientific Method]

What really counts is understanding, or a mastery of some common vocabulary

How can big data help?

- A much more rapid cycle of hypothesis generation and testing
- General access to knowledge of science
- Autonomous experimentation, with an 'active learning' model

Technological Singularity

If machines could even slightly surpass human intellect, they could improve their own designs in ways unforeseen by their designers, and thus recursively augment themselves into far greater intelligences.

Thanks

Big Data Platforms and Internet Application Services

Agenda

- Current problems and challenges
- Solutions from companies in China and abroad
- Tencent's approach to big data

Agenda, Part 1: Current problems and challenges

Big data challenge (1): massive data storage technology?

1. Data is evolving from PB scale to ZB scale; how can storage and computing costs be reduced? (Data volume: 46PB; number of machines: 5,600)
2. Rapid growth of industrial-grade services poses new challenges for the timeliness and reliability of big data computing

Big data challenge (2): data is hard to apply

Big data challenge (3): precise recommendation is hard

1. Enterprise information overload (across the whole Internet)
2. Low recommendation accuracy
3. How to evaluate recommendation results effectively
4. How to collect users' active behavior data effectively

Agenda, Part 2: Solutions from companies in China and abroad

Domestic case: the Alipay big data platform (Alipay's Hadoop-related application services)

[Diagram: Hadoop open-source products (HBase, Mahout, Hive/Pig) and technology components Dolphin, Fur Seal, Octopus, Starfish, Swordfish, Blue Whale, ...]

- Massive computing: a Hadoop-based cluster for massive storage and computing, which also provides one-stop management of computing and storage resources
- Distributed data mining: distributed data mining based on Mahout
- Data distribution center: provides batch data extraction and loading, plus near-real-time distribution of messages and logs (using a client pull model)
- Real-time search over massive data: based on the integration of HBase and Solr, provides real-time query and full-text retrieval over hundreds of billions of records
- Stream computing framework: a streaming computation framework similar to M/R that allows applications to be built quickly and provides online data processing services
- Massive data query: based on Hive and Pig, provides web-based visual query services over massive data

- Online news: Google News reports that recommendations increase articles viewed by 38% (Das et al. 2007).
- Movies: Netflix reports that over 60% of their rentals originate from recommendations (Thompson 2008).
- Amazon, which sells music, books, and movies: 35% of sales are reported to originate from recommendations (Lamere & Green 2008).
- Video: YouTub...
