Clementine 7.0 Software: Basic Operations (Part 2)

Clementine 7.0 Software Operations (Part 2), Lecture 7

Lecturer: Shen Hao, Associate Professor, School of Journalism and Communication, Beijing Broadcasting Institute; Deputy Director, Institute of Survey and Statistics, Beijing Broadcasting Institute; Chief Technical Consultant, IPSOS (China) Market Research Co., Ltd.

Graph Nodes

The Graphs palette contains the following nodes: Plot, Multiplot, Distribution, Histogram, Collection, Web, and Evaluation.

Overlay and display options: Graph with size overlay; Graph with panel overlay; Graph with color overlay; Graph with color and transparency overlays; 3-D Graphs; Animation.

Using Graphs: Plot Node; Multiplot Node; Distribution Node; Histogram Node; Collection Node; Web Node (Creating a Web Summary); Evaluation Chart Node.

Evaluation Chart Node

Gains charts. Cumulative gains charts always start at 0% and end at 100% as you go from left to right. For a good model, the gains chart will rise steeply toward 100% and then level off. A model that provides no information will follow the diagonal from lower left to upper right (shown in the chart if Include baseline is selected).

Lift charts. Cumulative lift charts tend to start above 1.0 and gradually descend until they reach 1.0 as you go from left to right. The right edge of the chart represents the entire data set, so there the ratio of hits in the cumulative quantiles to hits in the data is 1.0. For a good model, lift should start well above 1.0 on the left, remain on a high plateau as you move to the right, and then trail off sharply toward 1.0 on the right side of the chart. For a model that provides no information, the line will hover around 1.0 for the entire graph. (If Include baseline is selected, a horizontal line at 1.0 is shown in the chart for reference.)

Response charts. Cumulative response charts tend to be very similar to lift charts except for the scaling. Response charts usually start near 100% and gradually descend until they reach the overall response rate (total hits / total records) on the right edge of the chart. For a good model, the line will start near or at 100% on the left, remain on a high plateau as you move to the right, and then trail off sharply toward the overall response rate on the right side of the chart. For a model that provides no information, the line will hover around the overall response rate for the entire graph. (If Include baseline is selected, a horizontal line at the overall response rate is shown in the chart for reference.)

Profit charts. Cumulative profit charts show the sum of profits as you increase the size of the selected sample, moving from left to right. Profit charts usually start near zero, increase steadily as you move to the right until they reach a peak or plateau in the middle, and then decrease toward the right edge of the chart. For a good model, profits will show a well-defined peak somewhere in the middle of the chart. For a model that provides no information, the line will be relatively straight and may be increasing, decreasing, or level depending on the cost/revenue structure that applies. Profit = Revenue - Cost.

ROI charts. Cumulative ROI (return on investment) charts tend to be similar to response charts and lift charts except for the scaling. ROI charts usually start above 0% and gradually descend until they reach the overall ROI for the entire data set (which can be negative). For a good model, the line should start well above 0%, remain on a high plateau as you move to the right, and then trail off rather sharply toward the overall ROI on the right side of the chart. For a model that provides no information, the line should hover around the overall ROI value.

Modeling Nodes

The Modeling palette contains the following nodes: Neural Net, C5.0 (decision tree), Kohonen (neural-network clustering), Linear Regression, Generalized Rule Induction (GRI), Apriori (association rules), K-Means (fast clustering), Logistic Regression, Factor Analysis/PCA (factor and principal component analysis), TwoStep Cluster, Classification and Regression (C&R) Trees, and Sequence Detection (sequence analysis).

[Diagram: In fields and a Target (Out) field feed the Modeling node, which produces a generated model (Gen-Outcome)]

Neural Net Node

Requirements. There are no restrictions on field types. Neural Net nodes can handle numeric, symbolic, or flag inputs and outputs. The Neural Net node expects one or more fields with direction In and one or more fields with direction Out. Fields set to Both or None are ignored. Field types must be fully instantiated when the node is executed.

Strengths. Neural networks are powerful general function estimators. They usually perform prediction tasks at least as well as other techniques and sometimes perform significantly better. They also require minimal statistical or mathematical knowledge to train or apply. Clementine incorporates several features to avoid some of the common pitfalls of neural networks, including sensitivity analysis to aid in interpretation of the network, pruning and validation to prevent overtraining, and dynamic networks to automatically find an appropriate network architecture.

Weight update. A new weight is derived by taking the old weight and applying an adjustment based on a function of the prediction error (represented here by dj) and of the node activation (the nonlinear function applied to the result of combining the weights and inputs). The momentum term (D) encourages the weight change to keep the same direction as the last weight change. D and the learning rate (K) are control parameters that experienced neural network practitioners can modify to fine-tune the performance of backpropagation neural networks. In symbols:

Wji(t+1) = Wji(t) + K·dj·Oi + D·[Wji(t) - Wji(t-1)]

where Wji is the weight connecting neuron i to neuron j, t is the trial number, K is the learning rate (a value between 0 and 1), dj is the error gradient at node j, Oi is the activation level of node i, and D is the momentum term (a value between 0 and 1).

A Neural Network Example: Predicting Credit Risk

We will use a neural network to predict the credit risk category into which individuals should be placed.

Outcome field: credit risk, with three categories: “good risk”, “bad risk, but profitable”, and “bad risk with loss”.

Predictors: marital status, income, number of store credit cards, age, number of credit cards, number of loans, number of children, gender, mortgage, and whether salary is paid weekly or monthly.

Because the target variable is categorical, the analysis will substitute three dummy-coded (0,1) fields for the single three-category field. The same is done for the marital status field. Clementine makes this adjustment automatically, based on each field's type.

A Neural Network Stream (hands-on: let's all try it together!)

Data files: D:\mydata\Train.txt and D:\mydata\Test.txt

Partial conclusions

Kohonen Node

The Kohonen node is used to create and train a special type of neural network called a Kohonen network, a knet, or a self-organizing map. This type of network can be used to cluster the data set into distinct groups when you don't know what those groups are at the beginning. Unlike most learning methods in Clementine, Kohonen networks do not use a target field. This type of learning, with no target field, is called unsupervised learning. Instead of trying to predict an outcome, Kohonen nets try to uncover patterns in the set of input fields. Records are grouped so that records within a group or cluster tend to be similar to each other, and records in different groups are dissimilar.

[Diagram: input fields (income, age, gender, marital status, number of children) connected to a grid of output units Seg11 to Segij]

A Kohonen network consists of an input layer of units and a two-dimensional output grid of processing units. During training, each unit competes with all of the others to “win” each record. When a unit wins a record, its weights are adjusted to better match the pattern of predictor values for that record. As training proceeds, the weights on the grid units are adjusted so that they form a two-dimensional “map” of the clusters (hence the term self-organizing map). Usually, a Kohonen net will end up with a few units that summarize many observations (strong units), and several units that don't really correspond to any of the observations (weak units). The strong units represent probable cluster centers.

Another use of Kohonen networks is in dimension reduction. The spatial characteristic of the two-dimensional grid provides a mapping from the k original predictors to two derived features that preserve the similarity relationships in the original predictors. In some cases, this can give you the same kind of benefit as factor analysis or PCA.
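The competitive training described above can be sketched in a few lines of Python. This is a minimal, textbook-style self-organizing map, not Clementine's implementation. The function names (`train_som`, `best_unit`, `unit_counts`), the linear decay schedules, and the Gaussian neighborhood are assumptions made for illustration. As in the usual Kohonen algorithm, the winner's grid neighbors are also nudged toward each record, and this is what makes the grid form a 2-D map. Counting the records each unit wins separates strong units (many records) from weak units (none).

```python
import math
import random
from collections import Counter


def best_unit(weights, x):
    """Return (row, col) of the grid unit whose weight vector is closest to record x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(x, w))  # squared Euclidean distance
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on a list of equal-length numeric records."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2.0
    # Assumption: start each grid unit at a randomly chosen record.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks over time
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            br, bc = best_unit(weights, x)         # the unit that "wins" this record
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood: the winner moves most, grid neighbors less.
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
    return weights


def unit_counts(weights, data):
    """Records won per unit: high counts suggest strong units; absent units are weak."""
    return Counter(best_unit(weights, x) for x in data)


if __name__ == "__main__":
    # Hypothetical data: two well-separated groups of two-field records.
    rng = random.Random(1)
    group_a = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(50)]
    group_b = [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(50)]
    weights = train_som(group_a + group_b, rows=3, cols=3, epochs=30)
    print(unit_counts(weights, group_a + group_b))
```

On data like this, each group should end up won by its own units, and the units lying between the groups should win few or no records. These are the weak units described above.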

Kohonen Node (hands-on: let's all try it together!)

Kohonen Stream

Partial conclusions

Clustering results

C5.0 Node

Requirements. To train a C5.0 model, you need one or more In fields and one or more symbolic Out field(s). Fields set to Both or None are ignored. Fields used in the model must have their types fully instantiated.

Strengths. C5.0 models are quite robust in the presence of problems such as missing data and large numbers of input fields. They usually do not require long training times to estimate. In addition, C5.0 models tend to be easier to understand than some other model types, since the rules derived from the model have a very straightforward interpretation. C5.0 also offers the powerful boosting method to increase classification accuracy.

C5.0 Basic Principles

This node uses the C5.0 algorithm to build either a decision tree or a ruleset. A C5.0 model works by splitting the sample based on the field that provides the maximum information gain. Each subsample defined by the first split is then split again, usually based on a different field, and the process repeats until the subsamples cannot be split any further. F…
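The split-selection step described above can be illustrated with a short Python sketch. It scores each symbolic field by information gain, meaning the reduction in target entropy after splitting on that field, and picks the best one. This is an illustration built on assumptions, not C5.0 itself. C5.0 refines this criterion (the C4.5 family uses the gain ratio), handles numeric fields and missing values, and prunes the resulting tree. The helper names and the toy records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(records, field, target):
    """Drop in target entropy after splitting `records` on the symbolic `field`."""
    before = entropy([r[target] for r in records])
    groups = defaultdict(list)  # field value -> target labels in that subsample
    for r in records:
        groups[r[field]].append(r[target])
    n = len(records)
    after = sum(len(g) / n * entropy(g) for g in groups.values())  # weighted average
    return before - after


def best_split(records, fields, target):
    """Field giving the maximum information gain, i.e. the first split."""
    return max(fields, key=lambda f: information_gain(records, f, target))


if __name__ == "__main__":
    # Hypothetical toy records loosely modeled on the credit-risk example.
    records = [
        {"income": "high", "gender": "m", "risk": "good"},
        {"income": "high", "gender": "f", "risk": "good"},
        {"income": "low", "gender": "m", "risk": "bad"},
        {"income": "low", "gender": "f", "risk": "bad"},
    ]
    for f in ("income", "gender"):
        print(f, information_gain(records, f, "risk"))
    print("first split:", best_split(records, ["income", "gender"], "risk"))
```

In this toy data, income separates the two classes perfectly (gain of 1.0 bit), while gender leaves each subsample evenly mixed (gain of 0.0). Income is therefore chosen for the first split, and each income subsample would then be examined again for further splits.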
