版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領(lǐng)
文檔簡介
數(shù)據(jù)挖掘外文翻譯參考文獻數(shù)據(jù)挖掘外文翻譯參考文獻(文檔含中英文對照即英文原文和中文翻譯)外文:WhatisDataMining?Simplystated,dataminingreferstoextractingor“mining”knowledgefromlargeamountsofdata.Thetermisactuallyamisnomer.Rememberthattheminingofgoldfromrocksorsandisreferredtoasgoldminingratherthanrockorsandmining.Thus,“datamining”shouldhavebeenmoreappropriatelynamed“knowledgeminingfromdata”,whichisunfortunatelysomewhatlong.“Knowledgemining”,ashorterterm,maynotreflecttheemphasisonminingfromlargeamountsofdata.Nevertheless,miningisavividtermcharacterizingtheprocessthatfindsasmallsetofpreciousnuggetsfromagreatdealofrawmaterial.Thus,suchamisnomerwhichcarriesboth“data”and“mining”becameapopularchoice.Therearemanyothertermscarryingasimilarorslightlydifferentmeaningtodatamining,suchasknowledgeminingfromdatabases,knowledgeextraction,data/patternanalysis,dataarchaeology,anddatadredging.Manypeopletreatdataminingasasynonymforanotherpopularlyusedterm,“KnowledgeDiscoveryinDatabases”,orKDD.Alternatively,othersviewdataminingassimplyanessentialstepintheprocessofknowledgediscoveryindatabases.Knowledgediscoveryconsistsofaniterativesequenceofthefollowingsteps:·datacleaning:toremovenoiseorirrelevantdata,·dataintegration:wheremultipledatasourcesmaybecombined,·dataselection:wheredatarelevanttotheanalysistaskareretrievedfromthedatabase,·datatransformation:wheredataaretransformedorconsolidatedintoformsappropriateforminingbyperformingsummaryoraggregationoperations,forinstance,·datamining:anessentialprocesswhereintelligentmethodsareappliedinordertoextractdatapatterns,·patternevaluation:toidentifythetrulyinterestingpatternsrepresentingknowledgebasedonsomeinterestingnessmeasures,and·knowledgepresentation:wherevisualizationandknowledgerepresentationtechniquesareusedtopresenttheminedknowledgetotheuser.Thedataminingstepmayinteractwiththeuseroraknowledgebase.Theinterestingpatternsarepresentedtotheuser,andmaybestoredasnewknowledgeintheknowledgebase.Notethataccordingtothisview,dataminingisonlyonestepintheentireprocess,albeitanessentialonesinceituncovershiddenpatternsforevaluation.Weagreethatdataminingisaknowledgediscoveryprocess.However,inindustry,inmedia,andinthedatabaseresearchmilieu,theterm“datamining”isbecomingmorepopularthanthelongertermof“knowledgediscoveryindatabases”.Therefore,inthisbook,wechoosetousetheterm“datamining”.Weadoptabroadviewofdataminingfunctionality:dataminingistheprocessofdiscoveringinterestingknowledgefromlargeamountsofdatastoredeitherindatabases,datawarehouses,orotherinformationrepositories.Basedonthisview,thearchitectureofatypicaldataminingsystemmayhavethefollowingmajorcomponents:1.Database,datawarehouse,orotherinformationrepository.Thisisoneorasetofdatabases,datawarehouses,spreadsheets,orotherkindsofinformationrepositories.Datacleaninganddataintegrationtechniquesmaybeperformedonthedata.2.Databaseordatawarehouseserver.Thedatabaseordatawarehouseserverisresponsibleforfetchingtherelevantdata,basedontheuser’sdataminingrequest.3.Knowledgebase.Thisisthedomainknowledgethatisusedtoguidethesearch,orevaluatetheinterestingnessofresultingpatterns.Suchknowledgecanincludeconcepthierarchies,usedtoorganizeattributesorattributevaluesintodifferentlevelsofabstraction.Knowledgesuchasuserbeliefs,whichcanbeusedtoassessapattern’sinterestingnessbasedonitsunexpectedness,mayalsobeincluded.Otherexamplesofdomainknowledgeareadditionalinterestingnessconstraintsorthresholds,andmetadata(e.g.,describingdatafrommultipleheterogeneoussources).4.Dataminingengine.Thisisessentialtothedataminingsystemandideallyconsistsofasetoffunctionalmodulesfortaskssuchascharacterization,associationanalysis,classification,evolutionanddeviationanalysis.5.Patternevaluationmodule.Thiscomponenttypicallyemploysinterestingnessmeasuresandinteractswiththedataminingmodulessoastofocusthesearchtowardsinterestingpatterns.Itmayaccessinterestingnessthresholdsstoredintheknowledgebase.Alternatively,thepatternevaluationmodulemaybeintegratedwiththeminingmodule,dependingontheimplementationofthedataminingmethodused.Forefficientdatamining,itishighlyrecommendedtopushtheevaluationofpatterninterestingnessasdeepaspossibleintotheminingprocesssoastoconfinethesearchtoonlytheinterestingpatterns.6.Graphicaluserinterface.Thismodulecommunicatesbetweenusersandthedataminingsystem,allowingtheusertointeractwiththesystembyspecifyingadataminingqueryortask,providinginformationtohelpfocusthesearch,andperformingexploratorydataminingbasedontheintermediatedataminingresults.Inaddition,thiscomponentallowstheusertobrowsedatabaseanddatawarehouseschemasordatastructures,evaluateminedpatterns,andvisualizethepatternsindifferentforms.Fromadatawarehouseperspective,dataminingcanbeviewedasanadvancedstageofon-1ineanalyticalprocessing(OLAP).However,datamininggoesfarbeyondthenarrowscopeofsummarization-styleanalyticalprocessingofdatawarehousesystemsbyincorporatingmoreadvancedtechniquesfordataunderstanding.Whiletheremaybemany“dataminingsystems”onthemarket,notallofthemcanperformtruedatamining.Adataanalysissystemthatdoesnothandlelargeamountsofdatacanatmostbecategorizedasamachinelearningsystem,astatisticaldataanalysistool,oranexperimentalsystemprototype.Asystemthatcanonlyperformdataorinformationretrieval,includingfindingaggregatevalues,orthatperformsdeductivequeryansweringinlargedatabasesshouldbemoreappropriatelycategorizedaseitheradatabasesystem,aninformationretrievalsystem,oradeductivedatabasesystem.Datamininginvolvesanintegrationoftechniquesfrommult1pledisciplinessuchasdatabasetechnology,statistics,machinelearning,highperformancecomputing,patternrecognition,neuralnetworks,datavisualization,informationretrieval,imageandsignalprocessing,andspatialdataanalysis.Weadoptadatabaseperspectiveinourpresentationofdatamininginthisbook.Thatis,emphasisisplacedonefficientandscalabledataminingtechniquesforlargedatabases.Byperformingdatamining,interestingknowledge,regularities,orhigh-levelinformationcanbeextractedfromdatabasesandviewedorbrowsedfromdifferentangles.Thediscoveredknowledgecanbeappliedtodecisionmaking,processcontrol,informationmanagement,queryprocessing,andsoon.Therefore,dataminingisconsideredasoneofthemostimportantfrontiersindatabasesystemsandoneofthemostpromising,newdatabaseapplicationsintheinformationindustry.AclassificationofdataminingsystemsDataminingisaninterdisciplinaryfield,theconfluenceofasetofdisciplines,includingdatabasesystems,statistics,machinelearning,visualization,andinformationscience.Moreover,dependingonthedataminingapproachused,techniquesfromotherdisciplinesmaybeapplied,suchasneuralnetworks,fuzzyandorroughsettheory,knowledgerepresentation,inductivelogicprogramming,orhighperformancecomputing.Dependingonthekindsofdatatobeminedoronthegivendataminingapplication,thedataminingsystemmayalsointegratetechniquesfromspatialdataanalysis,Informationretrieval,patternrecognition,imageanalysis,signalprocessing,computergraphics,Webtechnology,economics,orpsychology.Becauseofthediversityofdisciplinescontributingtodatamining,dataminingresearchisexpectedtogeneratealargevarietyofdataminingsystems.Therefore,itisnecessarytoprovideaclearclassificationofdataminingsystems.Suchaclassificationmayhelppotentialusersdistinguishdataminingsystemsandidentifythosethatbestmatchtheirneeds.Dataminingsystemscanbecategorizedaccordingtovariouscriteria,asfollows.1)Classificationaccordingtothekindsofdatabasesmined.Adataminingsystemcanbeclassifiedaccordingtothekindsofdatabasesmined.Databasesystemsthemselvescanbeclassifiedaccordingtodifferentcriteria(suchasdatamodels,orthetypesofdataorapplicationsinvolved),eachofwhichmayrequireitsowndataminingtechnique.Dataminingsystemscanthereforebeclassifiedaccordingly.Forinstance,ifclassifyingaccordingtodatamodels,wemayhavearelational,transactional,object-oriented,object-relational,ordatawarehouseminingsystem.Ifclassifyingaccordingtothespecialtypesofdatahandled,wemayhaveaspatial,time-series,text,ormultimediadataminingsystem,oraWorld-WideWebminingsystem.Othersystemtypesincludeheterogeneousdataminingsystems,andlegacydataminingsystems.2)Classificationaccordingtothekindsofknowledgemined.Dataminingsystemscanbecategorizedaccordingtothekindsofknowledgetheymine,i.e.,basedondataminingfunctionalities,suchascharacterization,discrimination,association,classification,clustering,trendandevolutionanalysis,deviationanalysis,similarityanalysis,etc.Acomprehensivedataminingsystemusuallyprovidesmultipleand/orintegrateddataminingfunctionalities.Moreover,dataminingsystemscanalsobedistinguishedbasedonthegranularityorlevelsofabstractionoftheknowledgemined,includinggeneralizedknowledge(atahighlevelofabstraction),primitive-levelknowledge(atarawdatalevel),orknowledgeatmultiplelevels(consideringseverallevelsofabstraction).Anadvanceddataminingsystemshouldfacilitatethediscoveryofknowledgeatmultiplelevelsofabstraction.3)Classificationaccordingtothekindsoftechniquesutilized.Dataminingsystemscanalsobecategorizedaccordingtotheunderlyingdataminingtechniquesemployed.Thesetechniquescanbedescribedaccordingtothedegreeofuserinteractioninvolved(e.g.,autonomoussystems,interactiveexploratorysystems,query-drivensystems),orthemethodsofdataanalysisemployed(e.g.,database-orientedordatawarehouse-orientedtechniques,machinelearning,statistics,visualization,patternrecognition,neuralnetworks,andsoon).Asophisticateddataminingsystemwilloftenadoptmultipledataminingtechniquesorworkoutaneffective,integratedtechniquewhichcombinesthemeritsofafewindividualapproaches.翻譯:什么是數(shù)據(jù)挖掘?簡單地說,數(shù)據(jù)挖掘是從大量的數(shù)據(jù)中提取或“挖掘”知識。該術(shù)語實際上有點兒用詞不當。注意,從礦石或砂子中挖掘黃金叫做黃金挖掘,而不是叫做礦石挖掘。這樣,數(shù)據(jù)挖掘應(yīng)當更準確地命名為“從數(shù)據(jù)中挖掘知識”,不幸的是這個有點兒長?!爸R挖掘”是一個短術(shù)語,可能它不能反映出從大量數(shù)據(jù)中挖掘的意思。畢竟,挖掘是一個很生動的術(shù)語,它抓住了從大量的、未加工的材料中發(fā)現(xiàn)少量金塊這一過程的特點。這樣,這種用詞不當攜帶了“數(shù)據(jù)”和“挖掘”,就成了流行的選擇。還有一些術(shù)語,具有和數(shù)據(jù)挖掘類似但稍有不同的含義,如數(shù)據(jù)庫中的知識挖掘、知識提取、數(shù)據(jù)/模式分析、數(shù)據(jù)考古和數(shù)據(jù)捕撈。許多人把數(shù)據(jù)挖掘視為另一個常用的術(shù)語—數(shù)據(jù)庫中的知識發(fā)現(xiàn)或KDD的同義詞。而另一些人只是把數(shù)據(jù)挖掘視為數(shù)據(jù)庫中知識發(fā)現(xiàn)過程的一個基本步驟。知識發(fā)現(xiàn)的過程由以下步驟組成:1)數(shù)據(jù)清理:消除噪聲或不一致數(shù)據(jù),2)數(shù)據(jù)集成:多種數(shù)據(jù)可以組合在一起,3)數(shù)據(jù)選擇:從數(shù)據(jù)庫中檢索與分析任務(wù)相關(guān)的數(shù)據(jù),4)數(shù)據(jù)變換:數(shù)據(jù)變換或統(tǒng)一成適合挖掘的形式,如通過匯總或聚集操作,5)數(shù)據(jù)挖掘:基本步驟,使用智能方法提取數(shù)據(jù)模式,6)模式評估:根據(jù)某種興趣度度量,識別表示知識的真正有趣的模式,7)知識表示:使用可視化和知識表示技術(shù),向用戶提供挖掘的知識。數(shù)據(jù)挖掘的步驟可以與用戶或知識庫進行交互。把有趣的模式提供給用戶,或作為新的知識存放在知識庫中。注意,根據(jù)這種觀點,數(shù)據(jù)挖掘只是整個過程中的一個步驟,盡管是最重要的一步,因為它發(fā)現(xiàn)隱藏的模式。我們同意數(shù)據(jù)挖掘是知識發(fā)現(xiàn)過程中的一個步驟。然而,在產(chǎn)業(yè)界、媒體和數(shù)據(jù)庫研究界,“數(shù)據(jù)挖掘”比那個較長的術(shù)語“數(shù)據(jù)庫中知識發(fā)現(xiàn)”更為流行。因此,在本書中,選用的術(shù)語是數(shù)據(jù)挖掘。我們采用數(shù)據(jù)挖掘的廣義觀點:數(shù)據(jù)挖掘是從存放在數(shù)據(jù)庫中或其他信息庫中的大量數(shù)據(jù)中挖掘出有趣知識的過程?;谶@種觀點,典型的數(shù)據(jù)挖掘系統(tǒng)具有以下主要成分:數(shù)據(jù)庫、數(shù)據(jù)倉庫或其他信息庫:這是一個或一組數(shù)據(jù)庫、數(shù)據(jù)倉庫、電子表格或其他類型的信息庫??梢栽跀?shù)據(jù)上進行數(shù)據(jù)清理和集成。數(shù)據(jù)庫、數(shù)據(jù)倉庫服務(wù)器:根據(jù)用戶的數(shù)據(jù)挖掘請求,數(shù)據(jù)庫、數(shù)據(jù)倉庫服務(wù)器負責提取相關(guān)數(shù)據(jù)。知識庫:這是領(lǐng)域知識,用于指導(dǎo)搜索,或評估結(jié)果模式的興趣度。這種知識可能包括概念分層,用于將屬性或?qū)傩灾到M織成不同的抽象層。用戶確信方面的知識也可以包含在內(nèi)??梢允褂眠@種知識,根據(jù)非期望性評估模式的興趣度。領(lǐng)域知識的其他例子有興趣度限制或閾值和元數(shù)據(jù)(例如,描述來自多個異種數(shù)據(jù)源的數(shù)據(jù))。數(shù)據(jù)挖掘引擎:這是數(shù)據(jù)挖掘系統(tǒng)基本的部分,由一組功能模塊組成,用于特征化、關(guān)聯(lián)、分類、聚類分析以及演變和偏差分析。模式評估模塊:通常,此成分使用興趣度度量,并與數(shù)據(jù)挖掘模塊交互,以便將搜索聚集在有趣的模式上。它可能使用興趣度閾值過濾發(fā)現(xiàn)的模式。模式評估模塊也可以與挖掘模塊集成在一起,這依賴于所用的數(shù)據(jù)挖掘方法的實現(xiàn)。對于有效的數(shù)據(jù)挖掘,建議盡可能深地將模式評估推進到挖掘過程之中,以便將搜索限制在有興趣的模式上。圖形用戶界面:本模塊在用戶和數(shù)據(jù)挖掘系統(tǒng)之間進行通信,允許用戶與系統(tǒng)進行交互,指定數(shù)據(jù)挖掘查詢或任務(wù),提供信息、幫助搜索聚焦,根據(jù)數(shù)據(jù)挖掘的中間結(jié)果進行探索式數(shù)據(jù)挖掘。此外,此成分還允許用戶瀏覽數(shù)據(jù)庫和數(shù)據(jù)倉庫模式或數(shù)據(jù)結(jié)構(gòu),評估挖掘的模式,以不同的形式對模式進行可視化。從數(shù)據(jù)倉庫觀點,數(shù)據(jù)挖掘可以看作聯(lián)機分析處理(OLAP)的高級階段。然而,通過結(jié)合更高級的數(shù)據(jù)理解技術(shù),數(shù)據(jù)挖掘比數(shù)據(jù)倉庫的匯總型分析處理走得更遠。盡管市場上已有許多“數(shù)據(jù)挖掘系統(tǒng)”,但是并非所有系統(tǒng)的都能進行真正的數(shù)據(jù)挖掘。不能處理大量數(shù)據(jù)的數(shù)據(jù)分析系統(tǒng),最多是被稱作機器學(xué)習(xí)系統(tǒng)、統(tǒng)計數(shù)據(jù)分析工具或?qū)嶒炏到y(tǒng)原型。一個系統(tǒng)只能夠進行數(shù)據(jù)或信息檢索,包括在大型數(shù)據(jù)庫中找出聚集的值或回答演繹查詢,應(yīng)當歸類為數(shù)據(jù)庫系統(tǒng)
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
- 6. 下載文件中如有侵權(quán)或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 微生物課課程設(shè)計
- 護士長防疫護理工作總結(jié)范文(9篇)
- 工廠敬業(yè)奉獻模范事跡材料范文(8篇)
- 安全生產(chǎn)標語口號條幅摘錄
- 張愛玲格言大全70句
- 幼兒倡議節(jié)約糧食的倡議書(12篇)
- 2025年山東淄博市臨淄區(qū)文化旅游限公司招聘工作人員10人管理單位筆試遴選500模擬題附帶答案詳解
- 2025年山東濟寧泗水縣公證處招聘工作人員3人管理單位筆試遴選500模擬題附帶答案詳解
- 2025年山東濟南市城鄉(xiāng)交通運輸局所屬事業(yè)單位招聘工作人員75人歷年管理單位筆試遴選500模擬題附帶答案詳解
- 2025年山東日照市選聘工作人員9人歷年管理單位筆試遴選500模擬題附帶答案詳解
- 工廠設(shè)備工程師年終總結(jié)
- 六年級20道說理題
- 辦公室行政培訓(xùn)
- 【《伊利乳業(yè)盈利能力分析與評價案例》10000字】
- 湖南省岳陽市2023-2024學(xué)年高一上學(xué)期1月期末質(zhì)量監(jiān)測試題+物理 含答案
- 圓柱的表面積課件
- 2024年秋季學(xué)期新人教版3年級上冊英語課件 Unit 5 Part A 第1課時 Let's talk Guess and check
- 2024年高等教育法學(xué)類自考-00226知識產(chǎn)權(quán)法考試近5年真題附答案
- 2024年寧夏中考語文試卷(含答案逐題解析)+2023年中考語文試卷及答案
- 金匱要略2022-2023-2學(xué)期學(xué)習(xí)通超星期末考試答案章節(jié)答案2024年
- DB31-T 1502-2024 工貿(mào)行業(yè)有限空間作業(yè)安全管理規(guī)范
評論
0/150
提交評論