How to add a variable to the model in base R?


To add variables to an existing model in base R, use the update function. It takes the previous model (or its formula) together with a new formula in which a dot (.) stands for the terms already present, and returns the extended version. The added term can be a single variable, an interaction between two or more variables, or a transformation of an existing variable.
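As a quick illustration of the formula syntax that update relies on, the sketch below updates a formula on its own, without fitting a model. The names y, x1, x2 and x3 are placeholders assumed for illustration; no data is needed at this stage.

```r
# Start from a formula with two predictors (placeholder names, no data needed)
base_formula <- y ~ x1 + x2

# Add a single variable: the dot keeps the existing right-hand side
add_one <- update(base_formula, ~ . + x3)

# Add an interaction between two existing variables
add_interaction <- update(base_formula, ~ . + x1:x2)

# Add a transformation of an existing variable; I() protects the arithmetic
add_square <- update(base_formula, ~ . + I(x1^2))

print(add_one)
print(add_interaction)
print(add_square)
```

Leaving the left-hand side of the new formula empty keeps the original response, so only the predictors change.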

Example

Consider the following data frame −

x1<-rnorm(20)
x2<-rnorm(20,5,1.14)
x3<-rnorm(20,5,0.58)
y1<-rnorm(20,20,2.25)
df1<-data.frame(x1,x2,x3,y1)
df1

Output

            x1       x2       x3       y1
1   0.23523969 7.577512 5.443941 19.76642
2   0.11106994 7.504542 3.897426 19.65692
3  -0.09726361 7.277049 5.335444 19.27655
4   0.26056059 3.933092 4.203294 22.50656
5  -0.78472270 5.375368 5.480062 19.56555
6  -0.14489152 4.310053 5.704146 17.52129
7  -0.96409135 5.145660 4.753728 22.70288
8  -1.04832947 3.954133 4.820469 21.58309
9  -0.65659070 3.994727 4.791794 19.09328
10  0.88016095 6.480780 4.364470 18.50680
11  0.93215306 4.410714 4.664997 14.50948
12  1.49864968 5.172408 5.121840 21.58837
13  1.63126398 4.313327 4.389091 16.06222
14  0.33486400 4.756670 5.012716 16.63648
15  1.20832732 5.942533 6.097934 24.82682
16  1.27126998 6.753667 3.977962 22.59800
17 -0.42438014 4.766934 4.684150 19.70354
18  0.18121480 6.760182 5.444401 25.38505
19 -2.73192870 5.247787 5.305925 20.75227
20 -0.44498078 5.203272 5.877478 19.10085
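Because rnorm draws random values, rerunning the code above will produce different numbers from the ones printed here. A minimal sketch of a reproducible version follows; the seed value 123 is an arbitrary assumption, and the resulting values will not match the output shown in this article.

```r
# Fix the random number generator state so rnorm() returns the same draws
# on every run (the seed value 123 is an arbitrary choice)
set.seed(123)
x1 <- rnorm(20)
x2 <- rnorm(20, 5, 1.14)
x3 <- rnorm(20, 5, 0.58)
y1 <- rnorm(20, 20, 2.25)
df1 <- data.frame(x1, x2, x3, y1)

# Inspect the structure and the first rows instead of printing all 20 rows
str(df1)
head(df1)
```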

Creating a linear regression model to predict y1 by using x1 and x2 −

Example

Model_1<-lm(y1~x1+x2,data=df1)
summary(Model_1)

Output

Call:
lm(formula = y1 ~ x1 + x2, data = df1)
Residuals:
    Min      1Q  Median      3Q     Max
-4.4836 -1.8695 -0.5435  2.1606  4.8678
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  16.2664     2.9395   5.534 3.64e-05 ***
x1           -0.4001     0.6179  -0.647    0.526
x2            0.7027     0.5289   1.329    0.202
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.776 on 17 degrees of freedom
Multiple R-squared: 0.1029, Adjusted R-squared: -0.002624
F-statistic: 0.9751 on 2 and 17 DF, p-value: 0.3973

Creating the model by adding x3 −

Example

Model_1<-lm(update(y1~x1+x2,~.+x3),data=df1)
summary(Model_1)

Output

Call:
lm(formula = update(y1 ~ x1 + x2, ~. + x3), data = df1)
Residuals:
    Min      1Q  Median      3Q     Max
-4.4014 -2.0418 -0.6401  2.3419  4.1880
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  13.5651     5.9847   2.267   0.0376 *
x1           -0.3204     0.6498  -0.493   0.6287
x2            0.6838     0.5418   1.262   0.2251
x3            0.5635     1.0796   0.522   0.6089
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.838 on 16 degrees of freedom
Multiple R-squared: 0.1179, Adjusted R-squared: -0.04746
F-statistic: 0.7131 on 3 and 16 DF, p-value: 0.5584
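The example above updates a formula and passes the result to lm. The update function can also be applied directly to a fitted model object, which refits the model with the original call and the new formula. The sketch below assumes simulated data shaped like df1; the names demo_df, base_model and bigger_model and the seed are illustrative choices, not part of the original example.

```r
# Simulated data in the same shape as df1 (seed chosen arbitrarily)
set.seed(42)
demo_df <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20, 5, 1.14),
  x3 = rnorm(20, 5, 0.58),
  y1 = rnorm(20, 20, 2.25)
)

# Fit the base model, then add x3 by updating the fitted object;
# update() re-evaluates the original call, so data = demo_df is reused
base_model <- lm(y1 ~ x1 + x2, data = demo_df)
bigger_model <- update(base_model, . ~ . + x3)

# The stored formula now lists all three predictors
formula(bigger_model)

# F-test comparing the nested models: does x3 improve the fit?
anova(base_model, bigger_model)
```

Updating the model object keeps the original model available under its own name, which makes side-by-side comparisons such as the anova call straightforward.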

Creating the model by adding x3 and the interaction between x1 and x2 −

Example

Model_2<-lm(update(y1~x1+x2,~.+x1*x2+x3),data=df1)
summary(Model_2)

Output

Call:
lm(formula = update(y1 ~ x1 + x2, ~. + x1 * x2 + x3), data = df1)
Residuals:
    Min      1Q  Median      3Q     Max
-3.1970 -1.5739 -0.1827  0.9408  4.5058
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  13.9974     5.5099   2.540   0.0226 *
x1           -8.9403     4.4024  -2.031   0.0604 .
x2            0.3321     0.5293   0.627   0.5398
x3            0.7861     0.9996   0.786   0.4439
x1:x2         1.6809     0.8505   1.976   0.0668 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.611 on 15 degrees of freedom
Multiple R-squared: 0.3002, Adjusted R-squared: 0.1135
F-statistic: 1.608 on 4 and 15 DF, p-value: 0.2236
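Two details of the interaction example may be worth a short sketch. Since x1 and x2 are already in the model, x1*x2 only contributes the x1:x2 term. A transformation of an existing variable can be added in the same way. The data, seed and object names below are assumptions made for illustration.

```r
# Simulated data in the same shape as df1 (seed chosen arbitrarily)
set.seed(7)
demo_df <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20, 5, 1.14),
  x3 = rnorm(20, 5, 0.58),
  y1 = rnorm(20, 20, 2.25)
)
base_model <- lm(y1 ~ x1 + x2, data = demo_df)

# x1 * x2 expands to x1 + x2 + x1:x2, so both updates give the same model
with_star <- update(base_model, . ~ . + x1 * x2 + x3)
with_colon <- update(base_model, . ~ . + x1:x2 + x3)
all.equal(coef(with_star), coef(with_colon))

# Add a squared term as a transformation of an existing variable;
# I() makes ^ mean arithmetic power rather than formula crossing
with_square <- update(base_model, . ~ . + I(x2^2))
names(coef(with_square))
```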

Updated on: 07-Dec-2020
