How to find the column variance when some columns are categorical in an R data frame?


To find the variance of the numeric columns in an R data frame that also contains categorical columns, we can follow the steps below −

  • First of all, create a data frame.

  • Then, use the numcolwise function from the plyr package to compute the variance of the numeric columns only; the categorical columns are skipped.
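
The two steps above can be sketched on a small, made-up data frame before moving to the full examples. This is only an illustration: the data frame toy and its columns g, x and y are hypothetical, and the plyr package is assumed to be installed. Note that numcolwise(var) does not compute anything by itself; it returns a new function, which is then called on the data frame.

```r
library(plyr)

# Hypothetical data: one categorical column (g) and two numeric columns (x, y)
toy <- data.frame(g = c("a", "b", "a", "b"),
                  x = c(1, 2, 3, 4),
                  y = c(2, 4, 6, 8))

# numcolwise(var) builds a function that applies var() to numeric columns only
num_var <- numcolwise(var)

# The result is a one-row data frame holding only the numeric columns
res <- num_var(toy)
res
```

The categorical column g does not appear in the result, so there is no need to drop it by hand first.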

Example 1

Create the data frame

Let’s create a data frame as shown below −

Group<-sample(c("I","II","III","IV"),25,replace=TRUE)
Num1<-sample(1:50,25)
Num2<-sample(1:50,25)
df1<-data.frame(Group,Num1,Num2)
df1

Output

On executing, the above script generates the following output (it will vary on your system because the data is sampled randomly) −

   Group Num1 Num2
1  II    11    11
2  III   29    45
3  II     2     5
4  IV     3    13
5  II     9    30
6  IV    40    18
7  III   22    20
8  IV    28    37
9  III   50    42
10 I     10    43
11 II    18    38
12 II    14    31
13 IV     1    19
14 IV    24    35
15 II    15    48
16 IV    36    12
17 IV    19     1
18 I     48    50
19 IV    43     7
20 III   26    15
21 I     35    46
22 III   39    34
23 IV    38    28
24 IV    23     8
25 I     32    47
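
If repeatable output is needed, one option is to fix the random seed before sampling. The sketch below is an assumption-laden illustration rather than part of the original example: the helper make_df1 is a hypothetical name and the seed value 123 is arbitrary. It builds the same kind of data frame twice and checks that both copies are identical.

```r
# Hypothetical helper: builds a data frame like df1 after fixing the seed
make_df1 <- function(seed) {
  set.seed(seed)  # same seed, same sequence of random draws
  Group <- sample(c("I", "II", "III", "IV"), 25, replace = TRUE)
  Num1 <- sample(1:50, 25)
  Num2 <- sample(1:50, 25)
  data.frame(Group, Num1, Num2)
}

# The seed value 123 is an arbitrary choice for illustration
first_run <- make_df1(123)
second_run <- make_df1(123)

# Both runs produce exactly the same data
same_data <- identical(first_run, second_run)
same_data
```

The values produced with a fixed seed will still differ from the table shown above, which was generated without one.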

Find the column variance if some columns are categorical

Use the numcolwise function from the plyr package to find the variance of the numeric columns in the data frame df1 −

# Reuse df1 created above; calling sample() again would generate different data
library(plyr)
# Apply var() to the numeric columns only (Group is skipped)
numcolwise(var)(df1)

Output

     Num1    Num2
1 206.0833 242.8933
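
For comparison, a similar result can be obtained in base R without plyr by selecting the numeric columns first and then applying var to each of them. This is an alternative sketch, not the method used in this article; the data frame toy and the names is_num and base_var are hypothetical.

```r
# Hypothetical data: one categorical column (g) and two numeric columns (x, y)
toy <- data.frame(g = c("a", "b", "a", "b"),
                  x = c(1, 2, 3, 4),
                  y = c(2, 4, 6, 8))

# Logical vector marking which columns are numeric
is_num <- sapply(toy, is.numeric)

# Apply var() to each numeric column; the result is a named numeric vector
base_var <- sapply(toy[is_num], var)
base_var
```

Unlike numcolwise(var), which returns a one-row data frame, this version returns a named vector.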

Example 2

Create the data frame

Let’s create a data frame as shown below −

Categories<-sample(c("First","Second","Third"),25,replace=TRUE)
Score<-sample(1:10,25,replace=TRUE)
Price<-sample(1:5,25,replace=TRUE)
df2<-data.frame(Categories,Score,Price)
df2

Output

On executing, the above script generates the following output (it will vary on your system because the data is sampled randomly) −

   Categories Score Price
1  First       1     3
2  Third       5     3
3  Second      6     1
4  First       3     3
5  First       2     2
6  Second      2     4
7  Third       6     5
8  Third       7     4
9  Third       6     4
10 Second      7     4
11 First       7     4
12 Second      6     2
13 First       9     3
14 Second      8     5
15 Third       6     4
16 Third       2     5
17 First      10     1
18 First       1     5
19 Second      7     4
20 First       1     2
21 Third      10     3
22 Third      10     5
23 Second      8     3
24 Second     10     2
25 Second      9     1

Find the column variance if some columns are categorical

Use the numcolwise function from the plyr package to find the variance of the numeric columns in the data frame df2 −

# Reuse df2 created above; calling sample() again would generate different data
library(plyr)
# Apply var() to the numeric columns only (Categories is skipped)
numcolwise(var)(df2)

Output

    Score   Price
1 9.456667 1.71
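
The value reported for Price can be checked by hand, since var computes the sample variance (the sum of squared deviations from the mean divided by n - 1). The sketch below copies the 25 Price values from the df2 output shown above; the variable names n and manual are introduced only for this illustration.

```r
# Price values copied from the df2 output above
Price <- c(3, 3, 1, 3, 2, 4, 5, 4, 4, 4, 4, 2, 3,
           5, 4, 5, 1, 5, 4, 2, 3, 5, 3, 2, 1)

n <- length(Price)

# Sample variance: squared deviations from the mean, divided by n - 1
manual <- sum((Price - mean(Price))^2) / (n - 1)

manual      # 1.71
var(Price)  # 1.71, the same value as in the numcolwise output
```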

Updated on: 08-Nov-2021
