Statistics(16)
Kaggle Data Analysis
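Using Kaggle's Binary Classification with a Bank Churn Dataset, I carried out statistical hypothesis testing and EDA (exploratory data analysis). What are the main factors that affect credit assessment? To analyze the factors that influence credit score, I first split customers by creditworthiness, using the mean as the cutoff, into a group above the mean (A) and a group below the mean (B). I then ran independent-samples t-tests comparing the two groups' means. Independent variables: Salary, Age, Balance; dependent variable: CreditScore. Hypothesis 1: Is there a difference in salary between Group A and Group B? Hypothesis 2: Is there a difference in age between Group A and Group B? Hypothesis 3: Is there a difference in balance between Group A and Group B? Null hypothesis (H0): Group A's and Group B's sal..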
2024.01.16
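Below is a minimal sketch of the mean split and the independent-samples t-test described above. It uses only Python's standard library. The rows are hypothetical toy values, not data from the Kaggle file, and the key names `CreditScore` and `Salary` simply follow the post. I assume Student's pooled-variance version of the test. I also assume that a customer whose score equals the mean goes to group B, because the post does not say how ties or variances were handled.

```python
import math
from statistics import mean, variance

# Hypothetical toy rows (not the Kaggle data); keys follow the post.
rows = [
    {"CreditScore": 800, "Salary": 9},
    {"CreditScore": 750, "Salary": 7},
    {"CreditScore": 700, "Salary": 5},
    {"CreditScore": 600, "Salary": 5},
    {"CreditScore": 550, "Salary": 3},
    {"CreditScore": 500, "Salary": 1},
]

def split_by_mean(rows, key="CreditScore"):
    """Group A: rows above the mean of `key`; group B: the rest.
    Sending ties to B is an assumption, not something the post states."""
    avg = mean(r[key] for r in rows)
    group_a = [r for r in rows if r[key] > avg]
    group_b = [r for r in rows if r[key] <= avg]
    return group_a, group_b

def pooled_t_test(x, y):
    """Student's independent two-sample t-test with pooled variance.
    Returns (t statistic, degrees of freedom)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, df

group_a, group_b = split_by_mean(rows)
t, df = pooled_t_test([r["Salary"] for r in group_a],
                      [r["Salary"] for r in group_b])
print(f"t = {t:.3f}, df = {df}")  # t = 2.449, df = 4
```

The same call works for Age and Balance if you swap the key. The two-sided p-value then comes from the t distribution with `df` degrees of freedom. Two ways to get it are Excel's `T.DIST.2T(ABS(t), df)` and `scipy.stats.ttest_ind`, whose default `equal_var=True` is this pooled test.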
Hypothesis Testing in a Linear Regression: P-value
Another approach to hypothesis testing: the p-value approach. Step 1: Formulate the hypotheses. Step 2: Calculate the t-statistic. Step 3: Calculate the p-value: p-value = 2*T.DIST(-|t-statistic|, residual df, TRUE) = 0.4853. Conclusion: the p-value exceeds the significance level, so do not reject the null hypothesis; we cannot reject the belief held by salespeople.
2024.01.13
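The p-value formula above depends on a spreadsheet. The sketch below is a dependency-free stand-in for Excel's `T.DIST`. It computes the Student t CDF through the regularized incomplete beta function, using the identity P(T < -|t|) = 0.5 * I_{df/(df+t^2)}(df/2, 1/2) and a Numerical Recipes-style continued fraction. This is my own substitute, not how Excel implements the function. The post does not show its t-statistic or residual df, so the example reproduces textbook values rather than 0.4853.

```python
import math

def _betacf(a, b, x, max_iter=10_000, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function
    (modified Lentz method, following Numerical Recipes' betacf)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_cdf(x, df):
    """Lower-tail CDF of Student's t, the quantity T.DIST(x, df, TRUE) returns."""
    tail = 0.5 * _reg_inc_beta(df / 2.0, 0.5, df / (df + x * x))
    return tail if x < 0 else 1.0 - tail

def two_sided_p_value(t_stat, residual_df):
    """Python version of 2*T.DIST(-|t|, residual df, TRUE)."""
    return 2.0 * t_cdf(-abs(t_stat), residual_df)

print(f"{two_sided_p_value(1.0, 1):.4f}")           # 0.5000 (t with 1 df is Cauchy)
print(f"{two_sided_p_value(math.sqrt(2), 2):.4f}")  # 0.2929
```

The decision rule is unchanged: reject H0 only when this p-value is below the chosen significance level.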
Hypothesis Testing in a Linear Regression: Confidence Intervals
The confidence interval approach to hypothesis testing. Step 1: Formulate the hypotheses. Step 2: Consider the 95% confidence interval for β2. Conclusion: since 500 falls inside the confidence interval, do not reject the null hypothesis. We cannot reject H0 for any value that lies in the confidence interval. p-values and their importance in interpreting regression results: Failure to reject ..
2024.01.13
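Here is a minimal Python sketch of the confidence-interval approach. Only the hypothesized value 500 for β2 comes from the post. The estimate (480), standard error (25) and residual df (10) are hypothetical placeholders, because the post does not show them. 2.228 is the two-sided 5% critical value of t with 10 df, which Excel returns as `T.INV.2T(0.05, 10)`.

```python
def confidence_interval(b_hat, se, t_crit):
    """Two-sided interval: b_hat +/- t_crit * SE(b_hat)."""
    margin = t_crit * se
    return b_hat - margin, b_hat + margin

def reject_h0(b_hat, se, t_crit, beta_0):
    """Reject H0: beta = beta_0 only if beta_0 lies outside the interval."""
    low, high = confidence_interval(b_hat, se, t_crit)
    return not (low <= beta_0 <= high)

# Hypothetical estimate, standard error and critical value (10 residual df).
b2_hat, se_b2, t_crit = 480.0, 25.0, 2.228
low, high = confidence_interval(b2_hat, se_b2, t_crit)
print(f"95% CI for beta2: ({low:.1f}, {high:.1f})")              # (424.3, 535.7)
print("Reject H0: beta2 = 500?", reject_h0(b2_hat, se_b2, t_crit, 500))  # False
```

500 lies inside the interval exactly when |b2_hat - 500| / SE <= t_crit. For that reason, this decision always agrees with the t-cutoff approach at the same significance level.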
Hypothesis Testing in a Linear Regression
Hypothesis testing in a regression context: the t-cutoff approach. Step 1: Formulate the hypotheses. Step 2: Calculate the t-statistic. Step 3: Determine the rejection region for the t-statistic. Step 4: Check whether the t-statistic falls in the rejection region.
2024.01.13
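The four steps above can be sketched as two small Python functions. The inputs are hypothetical: H0: β2 = 500, an estimate of 480, a standard error of 25, and 10 residual df. I also assume a two-sided test at the 5% level.

```python
def t_statistic(b_hat, se, beta_0):
    """Step 2: t = (b_hat - beta_0) / SE(b_hat)."""
    return (b_hat - beta_0) / se

def in_rejection_region(t_stat, t_crit):
    """Steps 3-4: the two-sided rejection region is t < -t_crit or t > t_crit."""
    return abs(t_stat) > t_crit

# Step 1 (hypothetical): H0: beta2 = 500 versus H1: beta2 != 500.
t_stat = t_statistic(480.0, 25.0, 500.0)
t_crit = 2.228  # two-sided 5% critical value for 10 df, Excel T.INV.2T(0.05, 10)
print(f"t = {t_stat:.2f}, reject H0: {in_rejection_region(t_stat, t_crit)}")
# t = -0.80, reject H0: False
```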
Residual degree of freedom
Residual degrees of freedom - the number of observations in the data set that remain free to vary after the parameters (coefficients) of the model have been estimated. - It is a measure of the effective sample size for estimating the variability of the residuals (errors). The formula for the residual degrees of freedom in a regression model is: 1) Simple linear regression: Residual degrees of fr..
2024.01.13
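This sketch covers the usual case of a model with an intercept and k X-variables. There the residual degrees of freedom are n - k - 1, which gives n - 2 for simple linear regression. The function names are illustrative. The residual standard error shows where the residual df actually enters a calculation.

```python
import math

def residual_df(n_obs, n_predictors):
    """Residual df for a model with an intercept: n - k - 1.
    Simple linear regression (k = 1) gives n - 2."""
    return n_obs - n_predictors - 1

def residual_standard_error(residuals, n_predictors):
    """sqrt(SSE / residual df): the estimate of the error standard deviation."""
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / residual_df(len(residuals), n_predictors))

print(residual_df(30, 1))  # 28 (simple linear regression)
print(residual_df(30, 3))  # 26 (three X-variables)
```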
R-square, Adjusted R-square
R-square - Adding X-variables never decreases R-square (it usually increases it). - Ranges from 0 to 1. - The proportion of the variation in the Y variable explained by the regression model. - Values closer to 1 indicate a better fit. ‘Overall’ variation in Y variable: ‘Total’ Sum of Squares; ‘Explained’ variation in Y variable: ‘Regression’ Sum of Squares; ‘Unexplained’ variation in Y variable: ‘Residual’ Sum of Squa..
2024.01.12
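The next sketch computes the three sums of squares and both fit measures for a simple linear regression fitted by ordinary least squares. The five data points are made up for illustration. The decomposition Total = Regression + Residual holds for least squares with an intercept. Adjusted R-square uses the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1).

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def sums_of_squares(y, y_hat):
    """Return (total, regression, residual) sums of squares."""
    y_bar = mean(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)               # overall variation
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    return sst, ssr, sse

def r_square_and_adjusted(y, y_hat, n_predictors):
    sst, ssr, sse = sums_of_squares(y, y_hat)
    n = len(y)
    r2 = ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adj_r2

# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_ols(x, y)
y_hat = [b0 + b1 * xi for xi in x]
r2, adj_r2 = r_square_and_adjusted(y, y_hat, n_predictors=1)
print(f"R-square = {r2:.3f}, adjusted R-square = {adj_r2:.3f}")  # 0.600, 0.467
```

An X-variable that explains nothing can still nudge R-square up. The (n - 1)/(n - k - 1) penalty, however, lets adjusted R-square fall in that case. This is why adjusted R-square is the better measure when comparing models with different numbers of X-variables.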