




Main textbook and references
- 何曉群, 應(yīng)用回歸分析 (Applied Regression Analysis), 4th ed., China Renmin University Press, 2015
- 王斌會, 多元統(tǒng)計分析及R語言建模 (Multivariate Statistical Analysis and R Modeling), 3rd ed., Jinan University Press, 2016
- 何曉群, 多元統(tǒng)計分析 (Multivariate Statistical Analysis), 4th ed., China Renmin University Press, 2015
- 雷平 (ed.), 概率論與數(shù)理統(tǒng)計 (Probability and Mathematical Statistics), Lixin Accounting Press
- 雷平 (trans.), 商務(wù)與經(jīng)濟統(tǒng)計 (Statistics for Business and Economics), China Machine Press
- 王斌會, 數(shù)據(jù)統(tǒng)計分析及R語言編程 (Statistical Data Analysis and R Programming), Peking University Press, August 2004
- Montgomery, Introduction to Linear Regression Analysis, 5th ed., Wiley, 2013

"In the ultimate analysis, all knowledge is history; in the abstract sense, all science is mathematics; on the basis of reason, all judgment is statistics." -- C. R. Rao, Statistics and Truth: Putting Chance to Work
Correlation vs. Regression
- A scatter diagram can be used to show the relationship between two variables.
- Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
- Correlation is only concerned with the strength of the relationship; no causal effect is implied by correlation.

Types of Regression Model / Three Degrees of Correlation
- [Figure: scatter plots illustrating positive correlation (r > 0), negative correlation (r < 0), and no correlation (r = 0).]

Types of Relationships
- [Figure: scatter plots of strong relationships, weak relationships, and no relationship between X and Y.]

Coefficient of Correlation
- Measures the relative strength of the linear relationship between two variables.
- Sample coefficient of correlation:
  $r = \dfrac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$

Features of the Correlation Coefficient, r
- Unit free.
- Ranges between -1 and +1.
- The closer to -1, the stronger the negative linear relationship.
- The closer to +1, the stronger the positive linear relationship.
- The closer to 0, the weaker the linear relationship.

Coefficient of Correlation Values
- [Figure: number line from -1.0 (perfect negative correlation) through 0 (no correlation) to +1.0 (perfect positive correlation), with -0.5 and +0.5 marking increasing degrees of negative and positive correlation.]

Scatter Plots of Data with Various Correlation Coefficients
- [Figure: scatter plots with r = -1, r = -.6, r = 0, r = +.3, and r = +1, plus a curved (nonlinear) pattern that also has r = 0.]
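Not part of the original slides: a minimal R sketch of the sample correlation coefficient. The vectors x and y are made-up illustration values; cor() is base R and matches the formula above.

```r
# Hypothetical data for illustration only
x <- c(2, 4, 5, 7, 9)
y <- c(3, 5, 4, 8, 10)

r <- cor(x, y)  # sample (Pearson) correlation coefficient

# Equivalent "by hand" computation from the definition
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

print(c(cor = r, manual = r_manual))  # the two values agree
```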
Introduction to Regression Analysis
- Regression analysis is used to:
  - predict the value of a dependent variable based on the value of at least one independent variable;
  - explain the impact of changes in an independent variable on the dependent variable.
- Dependent variable: the variable we wish to explain.
- Independent variable: the variable used to explain the dependent variable.

Simple Linear Regression Model
- Only one independent variable, X.
- The relationship between X and Y is described by a linear function.
- Changes in Y are assumed to be caused by changes in X.
- The population regression model:
  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
  where $\beta_0$ is the population Y intercept and $\beta_1$ the population slope coefficient (together the linear component), $\varepsilon_i$ is the random error term (random error component), Y is the dependent variable, and X is the independent variable.
- [Figure: for a given $X_i$, the observed value of Y differs from the population line (intercept $\beta_0$, slope $\beta_1$) by the random error $\varepsilon_i$.]

Simple Linear Regression Equation
- The simple linear regression equation provides an estimate of the population regression line:
  $\hat{Y}_i = b_0 + b_1 X_i$
  where $b_0$ is the estimate of the regression intercept, $b_1$ is the estimate of the regression slope, $\hat{Y}_i$ is the estimated (or predicted) Y value for observation i, and $X_i$ is the value of X.
- The individual random error terms $e_i$ have a mean of zero.

Least Squares Method
- $b_0$ and $b_1$ are obtained by finding the values of $b_0$ and $b_1$ that minimize the sum of the squared differences between Y and $\hat{Y}$:
  $\min \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = \min \sum_{i=1}^{n}\bigl(Y_i - (b_0 + b_1 X_i)\bigr)^2$

Interpretation of the Slope and the Intercept
- $b_0$ is the estimated average value of Y when the value of X is zero.
- $b_1$ is the estimated change in the average value of Y as a result of a one-unit change in X.
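To make the random error component concrete, here is a small simulated sketch (not from the slides): data are generated from an assumed population line and then refit by least squares. The parameter values and object names are arbitrary.

```r
set.seed(1)                         # reproducible illustration
beta0 <- 10; beta1 <- 2             # assumed population intercept and slope
x <- runif(50, 0, 20)               # independent variable values
eps <- rnorm(50, mean = 0, sd = 3)  # random error term with mean zero
y <- beta0 + beta1 * x + eps        # population model: Y = beta0 + beta1*X + eps

sim_fit <- lm(y ~ x)                # least-squares estimates b0 and b1
coef(sim_fit)                       # should be close to 10 and 2
```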
Simple Linear Regression Example
- A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet).
- A random sample of 10 houses is selected.
  - Dependent variable (Y) = house price in $1000s
  - Independent variable (X) = square feet

Sample Data for House Price Model

  House Price in $1000s (Y)   Square Feet (X)
  245                         1400
  312                         1600
  279                         1700
  308                         1875
  199                         1100
  219                         1550
  405                         2350
  324                         2450
  319                         1425
  255                         1700

Graphical Presentation
- House price model: scatter plot of price ($1000s) against square feet.
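A short R sketch of this step (variable names price and sqft are my own, not from the slides): enter the ten observations and draw the scatter plot. Later sketches reuse these vectors.

```r
# House price example data from the slide (price in $1000s, size in square feet)
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)
sqft  <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)

# Scatter plot of the dependent variable against the independent variable
plot(sqft, price,
     xlab = "Square Feet (X)", ylab = "House Price in $1000s (Y)",
     main = "House price model: scatter plot")
```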
Least-Squares Method
- The least-squares criterion states that the sum of the squares of the errors (residuals) should be made as small as possible.
- The values of $b_0$ and $b_1$ that minimise this sum of squared residuals are given by:
  $b_1 = \dfrac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}$,   $b_0 = \bar{Y} - b_1\bar{X}$
  where $\bar{X}$ and $\bar{Y}$ are the sample means of X and Y.

Least Squares Analysis: Equations
- Sample regression line: $\hat{Y} = b_0 + b_1 X$
- Slope of the regression line: $b_1$ as above.
- Intercept of the regression line: $b_0 = \bar{Y} - b_1\bar{X}$.

Computational Procedure
- The expression in the numerator of the slope formula can be denoted $SS_{XY} = \sum(X_i-\bar{X})(Y_i-\bar{Y}) = \sum X_iY_i - \frac{(\sum X_i)(\sum Y_i)}{n}$.
- The expression in the denominator of the slope formula can be denoted $SS_{XX} = \sum(X_i-\bar{X})^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n}$.
- Hence the slope can be written as $b_1 = SS_{XY}/SS_{XX}$; this is the alternative (computational) formula used in the calculation of $b_0$ and $b_1$.

Simple Linear Regression Example: Scatter Plot
- House price model: scatter plot (as above).
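A minimal sketch of this computational procedure in R, continuing the price and sqft vectors defined above; SSxy, SSxx, b0 and b1 are ordinary variables of my own naming.

```r
# Computational formulas for the slope and intercept (house price data above)
SSxy <- sum(sqft * price) - sum(sqft) * sum(price) / length(price)
SSxx <- sum(sqft^2) - sum(sqft)^2 / length(sqft)

b1 <- SSxy / SSxx                     # slope: change in Y per unit change in X
b0 <- mean(price) - b1 * mean(sqft)   # intercept: Ybar - b1 * Xbar

round(c(b0 = b0, b1 = b1), 5)         # approximately 98.24833 and 0.10977
```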
Simple Linear Regression Example: Excel Output

  Regression Statistics
  Multiple R          0.76211
  R Square            0.58082
  Adjusted R Square   0.52842
  Standard Error      41.33032
  Observations        10

  ANOVA        df   SS          MS          F        Significance F
  Regression    1   18934.9348  18934.9348  11.0848  0.01039
  Residual      8   13665.5652   1708.1957
  Total         9   32600.5000

               Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
  Intercept    98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
  Square Feet   0.10977       0.03297        3.32938  0.01039    0.03374    0.18580

- The regression equation is: house price = 98.24833 + 0.10977 (square feet).

Graphical Presentation
- House price model: scatter plot and regression line, with slope = 0.10977 and intercept = 98.248.
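The same output can be reproduced in R with lm() on the vectors defined earlier; a sketch (fit is my own object name, and the column labels differ slightly from Excel's).

```r
# Fit the simple linear regression by least squares
fit <- lm(price ~ sqft)

summary(fit)    # coefficients, standard errors, t statistics, p-values, R-squared
anova(fit)      # regression and residual sums of squares (the ANOVA table)
confint(fit)    # 95% confidence intervals (the "Lower 95%" / "Upper 95%" columns)
```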
Interpretation of the Intercept, b0
- $b_0$ is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values).
- Here, no houses had 0 square feet, so $b_0$ = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet.

Interpretation of the Slope Coefficient, b1
- $b_1$ measures the estimated change in the average value of Y as a result of a one-unit change in X.
- Here, $b_1$ = 0.10977 tells us that the average value of a house increases by 0.10977($1000) = $109.77, on average, for each additional square foot of size.

Predictions Using Regression Analysis
- Predict the price for a house with 2000 square feet:
  $\hat{Y} = 98.25 + 0.1098(2000) = 317.85$
- The predicted price for a house with 2000 square feet is 317.85 ($1000s) = $317,850.
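In R the same prediction can be obtained with predict(), continuing the fit object above; new_house is a hypothetical name for the new observation.

```r
# Predict the mean price for a 2000-square-foot house
new_house <- data.frame(sqft = 2000)
predict(fit, newdata = new_house)   # about 317.8 ($1000s), i.e. roughly $317,850
```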
Interpolation vs. Extrapolation
- When using a regression model for prediction, only predict within the relevant range of the data (the relevant range for interpolation).
- Do not try to extrapolate beyond the range of observed X values.

Example: Produce Stores
- Regression model obtained: $\hat{Y} = 1636.415 + 1.487X$, where Y = annual sales ($000) and X = square feet.
- Interpretation of results: the slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units; that is, for each increase of one square foot in the size of the store, expected annual sales are predicted to increase by $1,487.
- Data for seven stores:

  Store   Square Feet   Annual Sales ($000)
  1       1,726         3,681
  2       1,542         3,395
  3       2,816         6,653
  4       5,555         9,543
  5       1,292         3,318
  6       2,208         5,563
  7       1,313         3,760

- Predict the annual sales for a store with 2000 square feet (see the sketch below).
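A sketch of this prediction in R, using the store data from the table (store_sqft and store_sales are my own names). Plugging X = 2000 into the slide's fitted equation gives 1636.415 + 1.487(2000) = 4610.4, i.e. roughly $4.61 million in annual sales.

```r
# Produce-store example: refit the model and predict sales for 2000 square feet
store_sqft  <- c(1726, 1542, 2816, 5555, 1292, 2208, 1313)
store_sales <- c(3681, 3395, 6653, 9543, 3318, 5563, 3760)

store_fit <- lm(store_sales ~ store_sqft)
coef(store_fit)                                    # roughly 1636.415 and 1.487

predict(store_fit, data.frame(store_sqft = 2000))  # about 4610 ($000)
```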
Using the Regression Line for Prediction
- If the regression line is a poor fit of the data, the prediction will be of little use.
- Even if the regression line is a good fit of the data, it is always dangerous to make a prediction of Y for an X value that lies outside the limits (i.e. the smallest and largest) of the X values used in finding the equation of the line.

Measures of Variation
- [Figure: for a given $X_i$, the total deviation of $Y_i$ from $\bar{Y}$ splits into an explained part ($\hat{Y}_i - \bar{Y}$) and an unexplained part ($Y_i - \hat{Y}_i$).]
- Total variation is made up of two parts:
  $SST = SSR + SSE$
  with total sum of squares $SST = \sum(Y_i - \bar{Y})^2$, regression sum of squares $SSR = \sum(\hat{Y}_i - \bar{Y})^2$, and error sum of squares $SSE = \sum(Y_i - \hat{Y}_i)^2$,
  where $\bar{Y}$ = average value of the dependent variable, $Y_i$ = observed values of the dependent variable, and $\hat{Y}_i$ = predicted value of Y for the given $X_i$ value.
- SST = total sum of squares (total variation): measures the variation of the $Y_i$ values around their mean $\bar{Y}$.
- SSR = regression sum of squares (explained variation): variation attributable to the relationship between X and Y.
- SSE = error sum of squares (unexplained variation): variation in Y attributable to factors other than X.

Coefficient of Determination, r2
- The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable.
- It is also called r-squared and is denoted $r^2$:
  $r^2 = \dfrac{SSR}{SST}$,   note: $0 \le r^2 \le 1$.
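A short R sketch of this decomposition for the house price fit, continuing the objects from the earlier sketches (yhat, SST, SSR, SSE are my own names).

```r
# Decompose the variation for the house price fit
yhat <- fitted(fit)

SST <- sum((price - mean(price))^2)  # total variation
SSR <- sum((yhat - mean(price))^2)   # explained variation
SSE <- sum((price - yhat)^2)         # unexplained variation

c(SST = SST, SSR = SSR, SSE = SSE)   # approx. 32600.5, 18934.9, 13665.6
SSR / SST                            # r-squared, approx. 0.58082
```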
Goodness of Fit
- How can we determine how well a regression line fits the data? As the criterion for the line of good fit, choose:
  - the line that has the smallest sum of the squares of the errors, or
  - the coefficient of determination ($r^2$).

Coefficient of Determination
- This quantity is defined as the square of the correlation coefficient r.
- Since the value of r always lies between -1 and 1, the value of $r^2$ must always lie between 0 and 1.
- If the value of $r^2$ is close to 1, a straight line fits the data well; if it is close to 0, a straight line fits the data poorly.
- $r^2$ is the proportion of variability of the dependent variable (Y) accounted for, or explained by, the independent variable (X) in a regression model.
- $r^2 = 0$ implies the predictor accounts for none of the variability of the dependent variable, so there is no regression prediction of Y by X; $r^2 = 1$ implies perfect prediction of Y by X, with 100% of the variability of Y accounted for by X.
- $SS_{YY}$ (total variation) = explained variation + unexplained variation, and $0 \le r^2 \le 1$.

Computational Formula for r2
- It can be shown through algebra that
  $r^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST} = \dfrac{b_1^2\,SS_{XX}}{SS_{YY}}$.
- From this equation a computational formula for $r^2$ can be developed; it holds only for simple linear regression.
- Example: for the CD-Concert data, the coefficient of determination is computed by substituting the corresponding sums of squares into this formula.

Examples of Approximate r2 Values
- $r^2 = 1$: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X.
- $0 < r^2 < 1$: weaker linear relationship between X and Y; some but not all of the variation in Y is explained by variation in X.
- $r^2 = 0$: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X).
Simple Linear Regression Example: Coefficient of Determination
- Excel output (as above): R Square = 0.58082; ANOVA: SSR = 18934.9348, SSE = 13665.5652, SST = 32600.5000.
- Minitab output: the regression equation is Price = 98.2 + 0.110 Square Feet.

  Predictor     Coef     SE Coef   T     P
  Constant      98.25    58.03     1.69  0.129
  Square Feet   0.10977  0.03297   3.33  0.010

  S = 41.3303   R-Sq = 58.1%   R-Sq(adj) = 52.8%

  Analysis of Variance
  Source          DF   SS      MS      F      P
  Regression       1   18935   18935   11.08  0.010
  Residual Error   8   13666    1708
  Total            9   32600

- 58.08% of the variation in house prices is explained by variation in square feet.

Standard Error of Estimate
- The standard deviation of the variation of the observations around the regression line is estimated by
  $S_{YX} = \sqrt{\dfrac{SSE}{n-2}} = \sqrt{\dfrac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n-2}}$
  where SSE = error sum of squares and n = sample size.
Simple Linear Regression Example: Standard Error of Estimate
- Excel output for the house price model (as above): Standard Error = 41.33032, with SSE = 13665.5652 and n = 10, so $S_{YX} = \sqrt{13665.5652/8} = 41.33032$.
- Minitab output (as above): S = 41.3303, R-Sq = 58.1%, R-Sq(adj) = 52.8%.
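A minimal R check of this value, continuing the fit and SSE objects defined in the earlier sketches.

```r
# Standard error of the estimate for the house price fit
n <- length(price)
sqrt(SSE / (n - 2))   # by the formula, approx. 41.33032
summary(fit)$sigma    # the same value as reported by summary()
```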
Comparing Standard Errors
- $S_{YX}$ is a measure of the variation of observed Y values from the regression line.
- The magnitude of $S_{YX}$ should always be judged relative to the size of the Y values in the sample data; here $S_{YX}$ = $41.33K is moderately small relative to house prices in the roughly $200K-$400K range.
- [Figure: two scatter plots contrasting a small $S_{YX}$ (points tight around the line) with a large $S_{YX}$ (points widely scattered around the line).]

Assumptions of Regression (L.I.N.E.)
- Linearity: the relationship between X and Y is linear.
- Independence of errors: error values are statistically independent.
- Normality of error: error values are normally distributed for any given value of X.
- Equal variance (also called homoscedasticity): the probability distribution of the errors has constant variance.

Residual Analysis
- The residual for observation i, $e_i = Y_i - \hat{Y}_i$, is the difference between its observed and predicted value.
- Check the assumptions of regression by examining the residuals:
  - examine for the linearity assumption,
  - evaluate the independence assumption,
  - evaluate the normal distribution assumption,
  - examine for constant variance for all levels of X (homoscedasticity).
- Graphical analysis of residuals: plot the residuals vs. X.
- Residual analysis for linearity: [Figure: a curved pattern in the residuals-vs-X plot indicates a nonlinear relationship; a patternless band indicates linearity.]
- Residual analysis for independence: [Figure: a systematic (nonindependent) pattern in the residuals indicates dependence; a random scatter indicates independence.]
- Checking for normality: examine the stem-and-leaf display, boxplot, and histogram of the residuals, and construct a normal probability plot of the residuals.
- Residual analysis for normality: when using a normal probability plot, normal errors will display approximately in a straight line.
- Residual analysis for equal variance: [Figure: residuals fanning out as X increases indicate non-constant variance; a band of roughly constant width indicates constant variance, i.e. a healthy residual plot.]
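A sketch of these residual plots in base R for the house price fit (e is my own name for the residual vector).

```r
# Graphical residual analysis for the house price fit
e <- resid(fit)

par(mfrow = c(1, 3))
plot(sqft, e, ylab = "Residuals", main = "Residuals vs. X")  # linearity / equal variance
abline(h = 0, lty = 2)
hist(e, main = "Histogram of residuals")                     # rough normality check
qqnorm(e); qqline(e)                                         # normal probability plot
par(mfrow = c(1, 1))
```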
Simple Linear Regression Example: Excel Residual Output

  Observation   Predicted House Price   Residuals
  1             251.92316                -6.92316
  2             273.87671                38.12329
  3             284.85348                -5.85348
  4             304.06284                 3.93716
  5             218.99284               -19.99284
  6             268.38832               -49.38832
  7             356.20251                48.79749
  8             367.17929               -43.17929
  9             254.66740                64.33264
  10            284.85348               -29.85348

- The residual output does not appear to violate any regression assumptions.
- The Minitab residual output leads to the same conclusion: no apparent violation of the regression assumptions.
Measuring Autocorrelation: the Durbin-Watson Statistic
- Used when data are collected over time, to detect whether autocorrelation is present.
- Autocorrelation exists if residuals in one time period are related to residuals in another period.
- Autocorrelation is correlation of the errors (residuals) over time; it violates the regression assumption that residuals are random and independent. In the slide's example, the residuals show a cyclic pattern rather than a random one.

The Durbin-Watson Statistic
- The Durbin-Watson statistic is used to test for autocorrelation:
  H0: residuals are not correlated; H1: autocorrelation is present.
  $D = \dfrac{\sum_{i=2}^{n}(e_i - e_{i-1})^2}{\sum_{i=1}^{n}e_i^2}$
- The possible range is $0 \le D \le 4$.
- D should be close to 2 if H0 is true.
- D less than 2 may signal positive autocorrelation; D greater than 2 may signal negative autocorrelation.

Testing for Positive Autocorrelation
- H0: positive autocorrelation does not exist; H1: positive autocorrelation is present.
- Calculate the Durbin-Watson test statistic D (it can be found using PHStat in Excel).
- Find the values $d_L$ and $d_U$ from the Durbin-Watson table (for sample size n and number of independent variables k).
- Decision rule: reject H0 if $D < d_L$; do not reject H0 if $D > d_U$; the test is inconclusive if $d_L \le D \le d_U$.
- Example with n = 25 (Excel/PHStat output): sum of squared differences of residuals = 3296.18, sum of squared residuals = 3279.98, so D = 3296.18 / 3279.98 = 1.00494.
- Here n = 25 and there is k = 1 independent variable; from the Durbin-Watson table, $d_L$ = 1.29 and $d_U$ = 1.45.
- Decision: since D = 1.00494 < $d_L$ = 1.29, reject H0 and conclude that significant positive autocorrelation exists; therefore the linear model is not an appropriate model to forecast sales.
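A base-R sketch of the Durbin-Watson computation (no extra packages); durbin_watson is a hypothetical helper of my own. The time-ordered residuals of the slide's sales model are not reproduced here, so the slide's sums are plugged in directly as a check.

```r
# Durbin-Watson statistic from a vector of time-ordered residuals e
durbin_watson <- function(e) sum(diff(e)^2) / sum(e^2)

durbin_watson(resid(fit))  # usage illustration only; the house data are not a time series

# Check against the sums reported on the slide
3296.18 / 3279.98          # 1.00494, matching the reported D
```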
Inferences About the Slope
- The standard error of the regression slope coefficient $b_1$ is estimated by
  $S_{b_1} = \dfrac{S_{YX}}{\sqrt{SS_{XX}}} = \dfrac{S_{YX}}{\sqrt{\sum(X_i-\bar{X})^2}}$
  where $S_{b_1}$ is the estimate of the standard error of the least squares slope and $S_{YX} = \sqrt{SSE/(n-2)}$ is the standard error of the estimate.
- Excel output (as above): the standard error of the Square Feet coefficient is 0.03297.
- Comparing standard errors of the slope: $S_{b_1}$ is a measure of the variation in the slope of regression lines from different possible samples. [Figure: scatter plots contrasting a small $S_{b_1}$ with a large $S_{b_1}$.]

Inference About the Slope: t Test
- The t test for a population slope asks: is there a linear relationship between X and Y?
- Null and alternative hypotheses: H0: $\beta_1 = 0$ (no linear relationship); H1: $\beta_1 \ne 0$ (a linear relationship does exist).
- Test statistic:
  $t_{STAT} = \dfrac{b_1 - \beta_1}{S_{b_1}}$, with d.f. = n - 2,
  where $b_1$ = regression slope coefficient, $\beta_1$ = hypothesized slope (0 under H0), and $S_{b_1}$ = standard error of the slope.

Inference About the Slope: t Test Example
- For the house price data (above), the estimated regression equation is house price = 98.25 + 0.1098 (square feet); the slope of this model is 0.1098.
- Is there a relationship between the square footage of the house and its sales price?
- H0: $\beta_1 = 0$; H1: $\beta_1 \ne 0$. From the Excel output, $b_1$ = 0.10977 with $S_{b_1}$ = 0.03297, giving the test statistic t = 3.32938 with p-value 0.01039.
- Decision: at a significance level of 0.05 ($\alpha/2$ = 0.025 in each tail), t = 3.329 falls in the rejection region, so reject H0.
- Conclusion: there is sufficient evidence that square footage affects house price.
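A closing R sketch of this slope test, continuing the fit object from the earlier sketches; the critical-value line assumes the two-tailed test at the 0.05 level described above.

```r
# t test for the slope of the house price model
summary(fit)$coefficients           # t value 3.329 and Pr(>|t|) 0.0104 for sqft
confint(fit, level = 0.95)          # 95% CI for the slope excludes 0 (approx. 0.0337 to 0.1858)
qt(0.975, df = length(price) - 2)   # two-tailed critical value t_{0.025, 8} = 2.306
```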