Differentiate between categorical and numerical independent variables in R.
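For a categorical independent variable, each level other than the reference level is treated as a separate independent variable in the model, and R recognizes the variable as categorical when it is a factor, for example one created with the factor function. A numerical independent variable, on the other hand, is either continuous or discrete in nature and enters the model as a single term with one slope coefficient.
The example below compares linear regression model summaries to show the difference between categorical and numerical independent variables.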
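As a rough illustration of this difference, the sketch below inspects the design matrix that R builds for the same predictor treated first as numeric and then as a factor. The vector x here is a hypothetical toy predictor, not the data used in the example.

```r
# Hypothetical toy predictor, used only for illustration
x <- c(0, 1, 2, 1, 0, 2)

# Numeric: the design matrix has a single column for x (plus the intercept)
num_design <- model.matrix(~ x)
colnames(num_design)

# Factor: one indicator column per level, except the reference level (0)
fac_design <- model.matrix(~ factor(x))
colnames(fac_design)
```

With three distinct values in x, the factor version should produce two indicator columns, factor(x)1 and factor(x)2, while the numeric version keeps a single x column.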
Example
Following snippet creates a sample data frame −
x <- rpois(20, 2)
y <- rpois(20, 5)
df <- data.frame(x, y)
df
The following data frame is created (the values are random draws, so your output will differ unless a seed is set):
   x  y
1  1  1
2  4  5
3  3 10
4  3  4
5  1  6
6  3  4
7  1  2
8  1 10
9  1  6
10 2  5
11 1  2
12 3  4
13 0  5
14 1  5
15 4  5
16 4  7
17 3  5
18 2  4
19 1  3
20 2  6
To fit a linear model to the data in df and view the model summary, add the following code to the above snippet −
x <- rpois(20, 2)
y <- rpois(20, 5)
df <- data.frame(x, y)
Model_1 <- lm(y ~ x, data = df)
summary(Model_1)
Output
Running all of the snippets above as a single program produces the following output −
Call:
lm(formula = y ~ x, data = df)

Residuals:
   Min     1Q Median     3Q    Max
-3.549 -1.313 -0.503  1.128  5.451

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    4.168      1.013    4.11  0.00065 ***
x              0.382      0.426    0.90  0.38249
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.29 on 18 degrees of freedom
Multiple R-squared:  0.0426,  Adjusted R-squared:  -0.0106
F-statistic: 0.801 on 1 and 18 DF,  p-value: 0.382
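Because the data are random, a reproducible variant of the numeric model can help when following along. The sketch below fixes a seed (the value 123 is an arbitrary assumption, not part of the original example, so its numbers will not match the output above) and checks that a numeric predictor contributes exactly one slope, which is the change in the predicted y for a one-unit increase in x.

```r
set.seed(123)  # arbitrary seed, assumed here only for reproducibility
x <- rpois(20, 2)
y <- rpois(20, 5)
df <- data.frame(x, y)

Model_1 <- lm(y ~ x, data = df)

# A numeric predictor adds one slope, so there are two coefficients in total
n_coef <- length(coef(Model_1))

# Predictions at x = 2 and x = 3 differ by exactly the slope
pred <- predict(Model_1, newdata = data.frame(x = c(2, 3)))
slope_from_pred <- unname(pred[2] - pred[1])
slope <- unname(coef(Model_1)["x"])

c(n_coef = n_coef, slope = slope, slope_from_pred = slope_from_pred)
```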
To fit a linear model to the data in df with x treated as a factor variable and view the model summary, add the following code to the above snippet −
x <- rpois(20, 2)
y <- rpois(20, 5)
df <- data.frame(x, y)
Model_1 <- lm(y ~ x, data = df)
Model_2 <- lm(y ~ factor(x), data = df)
summary(Model_2)
Output
Running all of the snippets above as a single program produces the following output −
Call:
lm(formula = y ~ factor(x), data = df)

Residuals:
   Min     1Q Median     3Q    Max
-3.375 -1.400 -0.533  1.083  5.625

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.00e+00   2.50e+00    2.00    0.064 .
factor(x)1  -6.25e-01   2.65e+00   -0.24    0.817
factor(x)2  -3.92e-15   2.89e+00    0.00    1.000
factor(x)3   4.00e-01   2.74e+00    0.15    0.886
factor(x)4   6.67e-01   2.89e+00    0.23    0.820
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.5 on 15 degrees of freedom
Multiple R-squared:  0.0526,  Adjusted R-squared:  -0.2
F-statistic: 0.208 on 4 and 15 DF,  p-value: 0.93
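One way to read the factor coefficients is as differences in group means. The sketch below enters the x and y values by hand from the data frame printed earlier in the example, so it does not depend on the random draw, and compares the fitted coefficients with the per-level means of y. The helper names group_means and from_means are introduced here for illustration only.

```r
# Data copied from the data frame printed in the example
x <- c(1, 4, 3, 3, 1, 3, 1, 1, 1, 2, 1, 3, 0, 1, 4, 4, 3, 2, 1, 2)
y <- c(1, 5, 10, 4, 6, 4, 2, 10, 6, 5, 2, 4, 5, 5, 5, 7, 5, 4, 3, 6)
df <- data.frame(x, y)

Model_2 <- lm(y ~ factor(x), data = df)

# Mean of y within each level of x (levels 0, 1, 2, 3, 4)
group_means <- tapply(df$y, df$x, mean)

# Intercept = mean of the reference level (x = 0);
# every other coefficient = that level's mean minus the reference mean
from_means <- c(group_means[1], group_means[-1] - group_means[1])

cbind(coefficient = coef(Model_2), from_group_means = from_means)
```

The two columns should agree up to floating-point error. This is why the factor model reports one coefficient per non-reference level (five coefficients here), whereas the numeric model reports a single slope for x.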