How to find the statistical summary of an R data frame with all the descriptive statistics?
When analyzing data in R, the built-in summary() function reports only the minimum, first quartile, median, mean, third quartile, and maximum of each numeric column. A fuller description of the data also needs measures such as variance, standard deviation, skewness, and kurtosis. The basicStats() function from the fBasics package returns all of these in a single table.
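To see what summary() leaves out, the following minimal sketch runs it on a single column of the built-in mtcars data and then computes the variance and standard deviation separately with base R. The choice of the mpg column and the variable names are only for illustration.

```r
# Base R only; mtcars ships with R in the datasets package.
data(mtcars)

# summary() on a numeric vector returns six values:
# Min., 1st Qu., Median, Mean, 3rd Qu., Max.
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# Spread measures are not part of summary(); compute them separately.
mpg_var <- var(mtcars$mpg)  # sample variance (n - 1 denominator)
mpg_sd  <- sd(mtcars$mpg)   # sample standard deviation
print(c(variance = mpg_var, stdev = mpg_sd))
```

Doing this by hand for every column and every statistic quickly becomes repetitive, which is the gap basicStats() is meant to fill.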
Loading Required Package
First, install the fBasics package (needed only once) and then load it:
install.packages("fBasics")  # run once per R installation
library(fBasics)
Example 1: mtcars Dataset
Let's examine the built-in mtcars dataset:
data(mtcars)
head(mtcars, 10)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Complete Statistical Summary
basicStats(mtcars)
mpg cyl disp hp drat
nobs 32.000000 32.000000 32.000000 32.000000 32.000000
NAs 0.000000 0.000000 0.000000 0.000000 0.000000
Minimum 10.400000 4.000000 71.100000 52.000000 2.760000
Maximum 33.900000 8.000000 472.000000 335.000000 4.930000
1. Quartile 15.425000 4.000000 120.825000 96.500000 3.080000
3. Quartile 22.800000 8.000000 326.000000 180.000000 3.920000
Mean 20.090625 6.187500 230.721875 146.687500 3.596563
Median 19.200000 6.000000 196.300000 123.000000 3.695000
Sum 642.900000 198.000000 7383.100000 4694.000000 115.090000
SE Mean 1.065424 0.315709 21.909473 12.120317 0.094519
LCL Mean 17.917679 5.543607 186.037211 121.967950 3.403790
UCL Mean 22.263571 6.831393 275.406539 171.407050 3.789335
Variance 36.324103 3.189516 15360.799829 4700.866935 0.285881
Stdev 6.026948 1.785922 123.938694 68.562868 0.534679
Skewness 0.610655 -0.174612 0.381657 0.726024 0.265904
Kurtosis -0.372766 -1.762120 -1.207212 -0.135551 -0.714701
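Because the full table is long, it can help to pull out only the rows of interest. The sketch below assumes that basicStats() returns a data frame whose row names are the statistic labels shown above, and that it accepts a ci argument for the confidence level (0.95 by default). The variable names stats, shape, and wide are illustrative, and the code is guarded so that it only runs when fBasics is installed.

```r
# Assumes fBasics is installed; the guard skips the example otherwise.
if (requireNamespace("fBasics", quietly = TRUE)) {
  stats <- fBasics::basicStats(mtcars)

  # Keep only the spread and shape measures, selected by row name.
  shape <- stats[c("Variance", "Stdev", "Skewness", "Kurtosis"), ]
  print(round(shape, 3))

  # Same statistics, but with 99% confidence limits for the mean.
  wide <- fBasics::basicStats(mtcars, ci = 0.99)
  print(wide[c("LCL Mean", "UCL Mean"), c("mpg", "hp")])
}
```

Selecting by row name rather than by position keeps the code readable and avoids depending on the order in which the statistics are listed.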
Example 2: trees Dataset
Let's analyze the built-in trees dataset, which contains girth, height, and volume measurements:
data(trees)
head(trees, 10)
Girth Height Volume
1 8.3 70 10.3
2 8.6 65 10.3
3 8.8 63 10.2
4 10.5 72 16.4
5 10.7 81 18.8
6 10.8 83 19.7
7 11.0 66 15.6
8 11.0 75 18.2
9 11.1 80 22.6
10 11.2 75 19.9
basicStats(trees)
Girth Height Volume
nobs 31.000000 31.000000 31.000000
NAs 0.000000 0.000000 0.000000
Minimum 8.300000 63.000000 10.200000
Maximum 20.600000 87.000000 77.000000
1. Quartile 11.050000 72.000000 19.400000
3. Quartile 15.250000 80.000000 37.300000
Mean 13.248387 76.000000 30.170968
Median 12.900000 76.000000 24.200000
Sum 410.700000 2356.000000 935.300000
SE Mean 0.563626 1.144411 2.952324
LCL Mean 12.097309 73.662800 24.141517
UCL Mean 14.399466 78.337200 36.200418
Variance 9.847914 40.600000 270.202796
Stdev 3.138139 6.371813 16.437846
Skewness 0.501056 -0.356877 1.013274
Kurtosis -0.710941 -0.723368 0.246039
Key Statistics Explained
| Statistic | Description |
|---|---|
| nobs | Number of observations |
| Variance | Sample variance: spread of the data around the mean, in squared units |
| Stdev | Standard deviation (square root of the variance), in the units of the data |
| Skewness | Measure of asymmetry (>0 = right-skewed, <0 = left-skewed) |
| Kurtosis | Excess kurtosis, a measure of tail heaviness relative to a normal distribution (>0 = heavier tails, <0 = lighter tails) |
| LCL/UCL Mean | Lower and upper confidence limits for the mean (95% by default) |
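The definitions in the table can be checked by hand. The following base-R sketch recomputes the standard error, the confidence limits, skewness, and kurtosis for the mpg column of mtcars. The skewness and kurtosis formulas used here are an assumption about how these statistics are defined, chosen because they agree with the mpg values in the mtcars output shown earlier; other packages use slightly different small-sample corrections.

```r
data(mtcars)
x <- mtcars$mpg
n <- length(x)

# Standard error of the mean and a t-based 95% confidence interval.
se  <- sqrt(var(x) / n)
t95 <- qt(0.975, df = n - 1)
lcl <- mean(x) - t95 * se
ucl <- mean(x) + t95 * se

# Moment-style skewness and excess kurtosis (assumed definitions):
# deviations are scaled by the sample standard deviation and averaged
# over n; 3 is subtracted so a normal distribution has kurtosis 0.
z    <- (x - mean(x)) / sd(x)
skew <- sum(z^3) / n
kurt <- sum(z^4) / n - 3

print(round(c(SE = se, LCL = lcl, UCL = ucl,
              Skewness = skew, Kurtosis = kurt), 6))
```

Comparing the printed values with the SE Mean, LCL Mean, UCL Mean, Skewness, and Kurtosis rows for mpg is a quick way to confirm what each row means.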
Conclusion
The basicStats() function from fBasics reports sixteen descriptive statistics for each column, including variance, standard deviation, skewness, and kurtosis, along with confidence limits for the mean. It covers everything summary() reports and adds the spread and shape measures that summary() omits, so a single call is enough for a full descriptive overview of a numeric data frame.
