




Data Collection and Cleaning
2019-02-15, Zhou Le

Contents
1 What is big data
2 Main characteristics of big data
3 Big data processing workflow
4 The concept of big data collection
5 Applications of big data collection

1 What is big data

Taobao recommendations:
- Recommendations based on your shopping preferences
- Recommendations based on your recent browsing and purchasing behavior
- Continual inference of your traits based on the devices you use
- Recommendations that change with the season

Industry status and outlook
[Figure: industry timeline, 2014-03 to 2018]

In 2019 the Ministry of Human Resources and Social Security planned to announce 15 new occupations, including:
1. Big data engineering technicians
2. Cloud computing engineering technicians
3. Artificial intelligence engineering technicians
4. Internet of Things engineering technicians
5. …

Big data refers to data sets that cannot be acquired, managed and processed within an acceptable time using traditional, commonly used software techniques and tools.

2 Main characteristics of big data
- Volume: data at a very large scale, and the amount keeps growing.
- Velocity: the speed at which data is created and moved.
- Variety: data comes from many sources, in many types and formats.
- Veracity: the pursuit of high-quality, trustworthy data.
- Value: low value density; as the volume of data grows, the amount of meaningful information in it does not grow proportionally.
3 Big data processing workflow
1. Data collection: storing data from different sources in several kinds of databases (relational and NoSQL).
2. Data preprocessing: importing the collected data from these databases into large-scale distributed storage (currently mainly HDFS or Hive), while doing some simple cleaning and preprocessing along the way.
3. Statistical analysis: classifying and aggregating the data now held in distributed storage, which covers the analysis needs of most common scenarios.
4. Data mining: analyzing the data with various algorithms in order to make predictions and meet more advanced analysis needs.
5. Data presentation: presenting the results of the previous steps for analysis, or compiling them into reports.

4 The concept of big data collection
1. What is data collection
Data collection means data acquisition. Data sources fall mainly into online data and content data.
2. Data collection versus big data collection
- Traditional data collection: a single source and a relatively small volume; uniform structure; stored in relational and parallel databases.
- Big data collection: wide-ranging sources and a huge volume; rich data types; stored in distributed databases.
3. Techniques for big data collection
Big data collection technology applies ETL to the data: by extracting, transforming and loading it, the latent value of the data can ultimately be mined. ETL stands for Extract-Transform-Load:
- Extract: obtain data from various data sources.
- Transform: convert the source data into target data in the required format.
- Load: load the target data into the data warehouse.

Big data collection systems
1. Log collection systems (Apache Flume, Scribe)
2. Web data collection systems (Scrapy framework, Apache Nutch)
3. Database collection systems (relational, NoSQL and other databases)

5 Applications of big data collection

Skills preparation:
- Python basics
- Basic Linux operations
- Database basics (SQL statements)

Environment preparation:
- Python
- JDK (Java environment)
- A database (MySQL)
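With Python installed, the Extract-Transform-Load steps from section 4 can be tried out in a few lines. The following is a minimal sketch under stated assumptions, not the course's own code: the raw input is a small hypothetical CSV string standing in for logs or web data, SQLite (Python's built-in `sqlite3` module) stands in for the MySQL database or data warehouse, and the function names `extract`, `transform` and `load` as well as the cleaning rules (drop rows with a missing or non-numeric price, drop exact duplicates) are illustrative choices.

```python
import csv
import io
import sqlite3

# Hypothetical raw data: in practice this would come from logs, web pages
# or source databases. An in-memory CSV string keeps the sketch self-contained.
RAW_CSV = """user_id,item,price
1,book, 12.5
2,pen,3
2,pen,3
3,lamp,
4,book,abc
"""


def extract(text):
    """Extract: read raw records from a source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean the records and convert them to the target format.

    Drops rows whose price is missing or non-numeric and removes exact
    duplicates. Assumes user_id is always a valid integer.
    """
    seen = set()
    cleaned = []
    for row in rows:
        price_text = (row.get("price") or "").strip()
        try:
            price = float(price_text)
        except ValueError:
            continue  # skip records whose price is missing or not a number
        record = (int(row["user_id"]), row["item"].strip(), price)
        if record in seen:
            continue  # skip exact duplicates
        seen.add(record)
        cleaned.append(record)
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into the target store (here, SQLite)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(user_id INTEGER, item TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    # A simple statistical query over the loaded data: total price per item
    for item, total in conn.execute(
        "SELECT item, SUM(price) FROM purchases GROUP BY item ORDER BY item"
    ):
        print(item, total)
```

Running it keeps two of the five raw rows and prints `book 12.5` and `pen 3.0`. Keeping the three stages as separate functions mirrors the Extract, Transform and Load steps listed in section 4; pointing `load` at MySQL instead would need a third-party driver, which this sketch deliberately avoids.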