Rstudio Multiple Regression How You Know Which Is Important
Multiple (Linear) Regression
R provides comprehensive support for multiple linear regression. The topics below are provided in order of increasing complexity.
Fitting the Model
# Multiple Linear Regression Example
fit <- lm(y ~ x1 + x2 + x3, data=mydata)
summary(fit) # show results
# Other useful functions
coefficients(fit) # model coefficients
confint(fit, level=0.95) # CIs for model parameters
fitted(fit) # predicted values
residuals(fit) # residuals
anova(fit) # anova table
vcov(fit) # covariance matrix for model parameters
influence(fit) # regression diagnostics
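In the code above, mydata, y, x1, x2, and x3 are placeholders. As a minimal runnable sketch, the same calls can be tried on the built-in mtcars data set; the choice of mpg as the response and wt, hp, and disp as predictors is an assumption made here purely for illustration.

```r
# mtcars ships with R; the variable choice here is illustrative only
fit_cars <- lm(mpg ~ wt + hp + disp, data = mtcars)

coef_table <- summary(fit_cars)$coefficients  # estimates, SEs, t and p values
ci <- confint(fit_cars, level = 0.95)         # 95% confidence intervals
r2 <- summary(fit_cars)$r.squared             # proportion of variance explained

print(round(coef_table, 4))
print(round(ci, 4))
print(round(r2, 3))
```

Fitted values and residuals always add back up to the observed response, which is a quick sanity check on any lm( ) fit.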
Diagnostic Plots
Diagnostic plots provide checks for heteroscedasticity, normality, and influential observations.
# diagnostic plots
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(fit)
For a more comprehensive evaluation of model fit see regression diagnostics or the exercises in this interactive course on regression.
Comparing Models
You can compare nested models with the anova( ) function. The following code provides a simultaneous test that x3 and x4 add to linear prediction above and beyond x1 and x2.
# compare models
fit1 <- lm(y ~ x1 + x2 + x3 + x4, data=mydata)
fit2 <- lm(y ~ x1 + x2, data=mydata)
anova(fit1, fit2)
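To make the nested-model test concrete, here is a small sketch on the built-in mtcars data (the variables are chosen only for illustration). It runs anova( ) and then rebuilds the same F statistic from the residual sums of squares of the two models.

```r
full    <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)
reduced <- lm(mpg ~ wt + hp, data = mtcars)

cmp <- anova(reduced, full)   # listing the smaller model first is a common convention

# The same F statistic from residual sums of squares
rss_r <- sum(residuals(reduced)^2)
rss_f <- sum(residuals(full)^2)
df_num <- df.residual(reduced) - df.residual(full)  # number of added terms
f_manual <- ((rss_r - rss_f) / df_num) / (rss_f / df.residual(full))
p_manual <- pf(f_manual, df_num, df.residual(full), lower.tail = FALSE)

print(cmp)
print(c(F = f_manual, p = p_manual))
```

A small p value suggests that the added predictors, taken together, improve the fit beyond the reduced model.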
Cross Validation
You can do K-Fold cross-validation using the cv.lm( ) function in the DAAG package.
# K-fold cross-validation
library(DAAG)
cv.lm(data=mydata, form.lm=fit, m=3) # 3 fold cross-validation (older DAAG releases name the first argument df)
Sum the MSE for each fold, divide by the number of observations, and take the square root to get the cross-validated standard error of estimate.
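The calculation can also be sketched by hand in base R, without DAAG. The version below is an illustration on mtcars, not a reproduction of cv.lm( ) internals: it pools the squared prediction errors from all held-out folds, divides by the number of observations, and takes the square root. The seed, the number of folds, and the model formula are arbitrary choices.

```r
set.seed(42)                        # arbitrary seed, for reproducible folds
k <- 3
n <- nrow(mtcars)
fold <- sample(rep(seq_len(k), length.out = n))  # random fold label per row

sq_err <- numeric(n)
for (i in seq_len(k)) {
  test_rows <- fold == i
  fit_i <- lm(mpg ~ wt + hp, data = mtcars[!test_rows, ])  # train on the other folds
  pred <- predict(fit_i, newdata = mtcars[test_rows, ])    # predict the held-out fold
  sq_err[test_rows] <- (mtcars$mpg[test_rows] - pred)^2
}

cv_se <- sqrt(sum(sq_err) / n)      # cross-validated standard error of estimate
print(cv_se)
```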
You can assess R2 shrinkage via K-fold cross-validation. Using the crossval() function from the bootstrap package, do the following:
# Assessing R2 shrinkage using 10-Fold Cross-Validation
fit <- lm(y~x1+x2+x3,data=mydata)
library(bootstrap)
# define functions
theta.fit <- function(x,y){lsfit(x,y)}
theta.predict <- function(fit,x){cbind(1,x)%*%fit$coef}
# matrix of predictors
X <- as.matrix(mydata[c("x1","x2","x3")])
# vector of observed responses
y <- as.matrix(mydata[c("y")])
results <- crossval(X,y,theta.fit,theta.predict,ngroup=10)
cor(y, fit$fitted.values)**2 # raw R2
cor(y,results$cv.fit)**2 # cross-validated R2
Variable Selection
Selecting a subset of predictor variables from a larger set (e.g., stepwise selection) is a controversial topic. You can perform stepwise selection (forward, backward, both) using the stepAIC( ) function from the MASS package. stepAIC( ) performs stepwise model selection by exact AIC.
# Stepwise Regression
library(MASS)
fit <- lm(y~x1+x2+x3,data=mydata)
step <- stepAIC(fit, direction="both")
step$anova # display results
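For a runnable illustration that needs no extra packages, the sketch below uses base R's step( ) function, a close relative of stepAIC( ) rather than the same function, on the built-in mtcars data. The starting model is an arbitrary choice made for the example.

```r
start <- lm(mpg ~ wt + hp + disp + drat + qsec, data = mtcars)
sel <- step(start, direction = "both", trace = 0)  # trace = 0 silences the step log

print(formula(sel))   # the model the search settled on
print(sel$anova)      # the sequence of steps taken
```

Because each step is accepted only when it lowers the AIC, the selected model never has a higher AIC than the starting model. That says nothing about whether the selected model generalizes, which is part of why stepwise selection is controversial.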
Alternatively, you can perform all-subsets regression using the regsubsets( ) function from the leaps package. In the following code nbest indicates the number of subsets of each size to report. Here, the ten best models will be reported for each subset size (1 predictor, 2 predictors, etc.).
# All Subsets Regression
library(leaps)
attach(mydata)
leaps<-regsubsets(y~x1+x2+x3+x4,data=mydata,nbest=10)
# view results
summary(leaps)
# plot a table of models showing variables in each model.
# models are ordered by the selection statistic.
plot(leaps,scale="r2")
# plot statistic by subset size
library(car)
subsets(leaps, statistic="rsq")
Other options for plot( ) are bic, Cp, and adjr2. Other options for plotting with subsets( ) are bic, cp, adjr2, and rss.
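The idea behind all-subsets regression can be sketched in base R by brute force: fit every combination of predictors and rank the models by a chosen statistic. This is an illustration on mtcars with four arbitrarily chosen predictors, not how regsubsets( ) is implemented; it is practical only for a small number of predictors.

```r
preds <- c("wt", "hp", "disp", "drat")

# every non-empty subset of the predictors (2^4 - 1 = 15 models)
subsets_list <- unlist(
  lapply(seq_along(preds), function(m) combn(preds, m, simplify = FALSE)),
  recursive = FALSE
)

# adjusted R^2 of the model built from each subset
adj_r2 <- sapply(subsets_list, function(v) {
  summary(lm(reformulate(v, response = "mpg"), data = mtcars))$adj.r.squared
})

res <- data.frame(
  model  = sapply(subsets_list, paste, collapse = " + "),
  size   = lengths(subsets_list),
  adj_r2 = adj_r2
)
res <- res[order(-res$adj_r2), ]   # best models first
print(head(res, 5))
```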
Relative Importance
The relaimpo package provides measures of relative importance for each of the predictors in the model. See help(calc.relimp) for details on the four measures of relative importance provided.
# Calculate Relative Importance for Each Predictor
library(relaimpo)
calc.relimp(fit,type=c("lmg","last","first","pratt"),
   rela=TRUE)
# Bootstrap Measures of Relative Importance (1000 samples)
boot <- boot.relimp(fit, b = 1000, type = c("lmg",
  "last", "first", "pratt"), rank = TRUE,
  diff = TRUE, rela = TRUE)
booteval.relimp(boot) # print result
plot(booteval.relimp(boot,sort=TRUE)) # plot result
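Two of the four measures are simple enough to sketch in base R. The example below is an illustration on mtcars, not a substitute for relaimpo: "first" is taken as the R2 a predictor achieves on its own, and "last" as the R2 lost when that predictor is dropped from the full model. The values are left unnormalized, whereas rela=TRUE in the calls above rescales each measure to sum to 1.

```r
full <- lm(mpg ~ wt + hp + disp, data = mtcars)
r2_full <- summary(full)$r.squared
preds <- c("wt", "hp", "disp")

# "first": R^2 of the model containing only that predictor
first <- sapply(preds, function(p) {
  summary(lm(reformulate(p, response = "mpg"), data = mtcars))$r.squared
})

# "last": drop in R^2 when that predictor is removed from the full model
last <- sapply(preds, function(p) {
  reduced <- lm(reformulate(setdiff(preds, p), response = "mpg"), data = mtcars)
  r2_full - summary(reduced)$r.squared
})

print(round(rbind(first, last), 3))
```

With correlated predictors the two measures can disagree sharply, which is one motivation for measures such as lmg that average a predictor's contribution over orderings.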
Graphic Enhancements
The car package offers a wide variety of plots for regression, including added variable plots and enhanced diagnostic plots and scatterplots.
Going Further
Nonlinear Regression
The nls( ) function in the stats package fits nonlinear regression models. See John Fox's Nonlinear Regression and Nonlinear Least Squares for an overview. Huet and colleagues' Statistical Tools for Nonlinear Regression: A Practical Guide with S-PLUS and R Examples is a valuable reference book.
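As a minimal sketch of nonlinear least squares with nls( ), the example below simulates an exponential decay and recovers its parameters. The data, the true values (a = 3, b = 0.7), the noise level, and the starting values are all assumptions made for the illustration.

```r
set.seed(1)                                    # arbitrary seed
sim <- data.frame(x = seq(0, 5, length.out = 50))
sim$y <- 3 * exp(-0.7 * sim$x) + rnorm(50, sd = 0.05)  # simulated decay plus noise

# nls needs starting values; these are rough guesses near the truth
fit_nls <- nls(y ~ a * exp(-b * x), data = sim, start = list(a = 2, b = 0.5))
print(coef(fit_nls))
```

Unlike lm( ), nls( ) iterates from the starting values, so poor guesses can keep it from converging.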
Robust Regression
There are many functions in R to help with robust regression. For example, you can perform robust regression with the rlm( ) function in the MASS package. John Fox's (who else?) Robust Regression provides a good starting overview. The UCLA Statistical Computing website has Robust Regression Examples.
The robust package provides a comprehensive library of robust methods, including regression. The robustbase package also provides basic robust statistics including model selection methods. And David Olive has provided a detailed online review of Applied Robust Statistics with sample R code.
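A short sketch of rlm( ) in action: the example below plants one artificial outlier in a copy of mtcars and compares ordinary least squares with the robust fit. The outlier value and the model formula are made up for illustration.

```r
library(MASS)   # included with standard R installations

cars_out <- mtcars
cars_out$mpg[1] <- 100            # artificial outlier, for illustration only

ols <- lm(mpg ~ wt + hp, data = cars_out)
rob <- rlm(mpg ~ wt + hp, data = cars_out)   # Huber M-estimation by default

print(rbind(OLS = coef(ols), robust = coef(rob)))
print(round(rob$w[1], 3))         # final weight given to the outlier (below 1)
```

The weights in rob$w show which observations the robust fit downweighted, a quick way to spot the points that drive the difference between the two fits.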
To Practice
This course in machine learning in R includes exercises in multiple regression and cross validation.
Source: https://www.statmethods.net/stats/regression.html