Found 2038 Articles for R Programming

How to create the boxplots in base R ordered by means?

Nizamuddin Siddiqui
Updated on 08-Dec-2020 06:35:20

78 Views

To create boxplots in base R ordered by means, we first order the categorical column by the mean of the numerical column and then create the boxplot. For example, if we have a data frame df that has a categorical column x and a numerical column y then the boxplot ordered by means can be created by using df$x
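The ordering step described above can be sketched as follows. This is a minimal illustration, assuming a made-up data frame df; reorder is one base R way to sort factor levels by a group statistic and may differ from the command used in the full article.

```r
# Hypothetical data: a categorical column x and a numerical column y
set.seed(1)
df <- data.frame(
  x = rep(c("A", "B", "C"), each = 20),
  y = c(rnorm(20, mean = 5), rnorm(20, mean = 2), rnorm(20, mean = 8))
)

# Reorder the levels of x by the group means of y (ascending)
df$x <- reorder(factor(df$x), df$y, FUN = mean)

# The boxes are now drawn in the order of the factor levels
boxplot(y ~ x, data = df)
```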

How to find the column number of a string column based on a string match in an R data frame?

Nizamuddin Siddiqui
Updated on 08-Dec-2020 06:32:00

308 Views

A data frame might be very large and contain columns with string values as well as columns with numerical values. While doing the analysis, we might want to check which columns contain a particular string value. For example, if we have a column with the string values A, B, and C and we want to check which column contains the value “A” then the apply function can be used as shown in the below examples. Example: Consider the below data frame − x1
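A small sketch of this idea, assuming a hypothetical data frame with columns x1, x2 and x3 (the values are invented for illustration): apply runs over the columns, and which returns the position of every column that contains the string.

```r
# Hypothetical data frame mixing string and numerical columns
df <- data.frame(
  x1 = c("A", "B", "C"),
  x2 = c("D", "E", "F"),
  x3 = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# For each column (MARGIN = 2), check whether any value equals "A"
has_A <- apply(df, 2, function(col) any(col == "A"))

# Column numbers (named by column) where the match occurs
idx <- which(has_A)
idx
```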

How to find the mean of a numerical column by two categorical columns in an R data frame?

Nizamuddin Siddiqui
Updated on 08-Dec-2020 06:28:16

883 Views

If we have two categorical columns along with a numerical column in an R data frame then we can find the mean of the numerical column for each combination of the categorical columns with the help of the aggregate function. For example, if a data frame df contains a numerical column X and two categorical columns C1 and C2 then the mean of X can be found for the combinations of C1 and C2 by using the below command − aggregate(X~C1+C2,data=df,FUN="mean"). Example: Consider the below data frame − C1
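The command above can be tried on a small data frame. The values below are made up purely for illustration.

```r
# Hypothetical data: two categorical columns and one numerical column
df <- data.frame(
  C1 = c("a", "a", "b", "b", "a", "b"),
  C2 = c("u", "v", "u", "v", "u", "v"),
  X  = c(1, 2, 3, 4, 5, 6)
)

# Mean of X for every observed combination of C1 and C2
res <- aggregate(X ~ C1 + C2, data = df, FUN = mean)
res
```

The result has one row per combination of C1 and C2 that appears in the data.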

How to add a variable to the model in base R?

Nizamuddin Siddiqui
Updated on 07-Dec-2020 06:25:50

805 Views

If we want to add variables to a model in base R then the update function can be used. The update function refits the previous model with the new term added. The new term can be a single variable, an interaction of two or more variables, or a transformation of the existing variables. Example: Consider the below data frame − x1
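A minimal sketch of the three cases, assuming a simulated data frame with columns x1, x2 and y (all names and values are hypothetical). In the update formula, the dots stand for "keep what the previous model had".

```r
# Hypothetical data for a small linear model
set.seed(1)
df <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
df$y <- 1 + 2 * df$x1 + rnorm(30)

# Starting model with a single predictor
m1 <- lm(y ~ x1, data = df)

# Add a single variable
m2 <- update(m1, . ~ . + x2)

# Add an interaction of two variables
m3 <- update(m2, . ~ . + x1:x2)

# Add a transformation of an existing variable
m4 <- update(m1, . ~ . + I(x1^2))
```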

How to find the range for 95% of all values in an R vector?

Nizamuddin Siddiqui
Updated on 07-Dec-2020 06:21:52

1K+ Views

The range for 95% of all values represents the middle 95% of the values. Therefore, we can find the 2.5th percentile and the 97.5th percentile to obtain the range for the middle 95%. For this purpose, we can use the quantile function in R. To find the 2.5th percentile, we use probability = 0.025 and for the 97.5th percentile we use probability = 0.975. Example: x1
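As a quick illustration, both percentiles can be requested in one call. The vector below is a hypothetical example, and quantile is used with its default interpolation type.

```r
# Hypothetical vector of values
x <- 1:1000

# Lower and upper bounds of the middle 95% of the values
q <- quantile(x, probs = c(0.025, 0.975))
q
```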

How to convert NA’s in sequence to a single NA in an R vector?

Nizamuddin Siddiqui
Updated on 07-Dec-2020 06:19:31

81 Views

Sometimes values are missing in a sequence and R records them as NA (Not Available). In this situation, we might want to replace each run of consecutive NA values with a single NA. This can be done by using is.na along with the diff function as shown in the below examples. Example: x1
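One possible way to combine is.na and diff is sketched below. It assumes a hypothetical vector x and is not necessarily the exact expression used in the full article. An element is dropped when it is NA and the element before it was also NA, which is the case where diff of the NA flags is 0.

```r
# Hypothetical vector with runs of consecutive NA values
x <- c(1, NA, NA, NA, 4, 5, NA, NA, 8, NA)

# TRUE where the value is missing
na_flag <- is.na(x)

# diff(na_flag) is 0 when the flag did not change from the previous element,
# so "NA now and no change" means the previous element was NA too
repeated_na <- na_flag[-1] & diff(na_flag) == 0

# Always keep the first element, then drop the repeated NA values
result <- x[c(TRUE, !repeated_na)]
result
```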

How to create side by side histograms in base R?

Nizamuddin Siddiqui
Updated on 07-Dec-2020 06:17:24

2K+ Views

To create side by side histograms in base R, we first create a histogram using the hist function with a wider X-axis range set through the xlim argument. After that we create another histogram that has a larger mean and a smaller standard deviation so that the bars do not overlap, and the add=TRUE argument must be passed to the second hist call so that it is drawn on the same plot.

Example

hist(rnorm(5000,mean=5,sd=2.1),col="green",xlim=c(1,20))
hist(rnorm(5000,mean=15,sd=1.25),col="red",add=TRUE)

How to identify duplicate values in a column of matrix in R?

Nizamuddin Siddiqui
Updated on 07-Dec-2020 06:15:50

355 Views

We can easily identify duplicate values in a matrix by using the duplicated function, but it does not flag the first occurrence of a repeated value as a duplicate. Therefore, we need to combine it with the OR operator | and a second call that sets the argument fromLast = TRUE so that the first occurrence of each duplicated value is also identified as a duplicate. Example: M1
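The combination of the two calls can be sketched on one column of a small matrix. The matrix M1 below is a hypothetical example.

```r
# Hypothetical matrix; the first column contains repeated values
M1 <- matrix(c(1, 2, 2, 3, 1, 4,
               5, 6, 7, 8, 9, 5), ncol = 2)
col1 <- M1[, 1]

# duplicated() alone skips the first occurrence of each repeated value
later_only <- duplicated(col1)

# Scanning from the end as well flags every occurrence
dup_all <- duplicated(col1) | duplicated(col1, fromLast = TRUE)
dup_all
```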

How to compare two columns in an R data frame for an exact match?

Nizamuddin Siddiqui
Updated on 07-Dec-2020 06:12:05

6K+ Views

Sometimes an analysis requires us to check whether the values in two columns of an R data frame are exactly the same. This is helpful with very large data frames when we suspect that the two columns differ in some rows. This can be easily done with the help of the ifelse function. Example: Consider the below data frame − x1
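A minimal sketch of the comparison, assuming a hypothetical data frame with columns x1 and x2. The name of the result column and the labels "Yes" and "No" are arbitrary choices for illustration.

```r
# Hypothetical data frame with two columns to compare
df <- data.frame(
  x1 = c(1, 2, 3, 4),
  x2 = c(1, 5, 3, 0)
)

# Row by row: "Yes" where the two values are equal, "No" otherwise
df$match <- ifelse(df$x1 == df$x2, "Yes", "No")
df
```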

How to create a scatterplot with regression line using ggplot2 with 0 intercept and slope equals to 1 in R?

Nizamuddin Siddiqui
Updated on 07-Dec-2020 06:08:44

233 Views

To draw a line with 0 intercept and slope equal to 1 on a ggplot2 scatterplot, we can use the geom_abline function, whose defaults are intercept 0 and slope 1, but we need to pass appropriate limits for the x axis and y axis values. For example, if we have two columns x and y in a data frame df and both range from -1 to 1 then the scatterplot with a line of 0 intercept and slope equal to 1 can be created as − ggplot(df,aes(x,y))+geom_point()+geom_abline()+lims(x=c(-1,1),y=c(-1,1)). Example: Consider the below data frame − x
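The command above can be run on simulated data. This sketch assumes the ggplot2 package is installed and uses a hypothetical data frame df with both columns drawn from the range -1 to 1. The intercept and slope are written out explicitly here, although they match the defaults of geom_abline.

```r
library(ggplot2)

# Hypothetical data with x and y both between -1 and 1
set.seed(1)
df <- data.frame(
  x = runif(50, min = -1, max = 1),
  y = runif(50, min = -1, max = 1)
)

# Scatterplot with the reference line y = x and matching axis limits
p <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1) +
  lims(x = c(-1, 1), y = c(-1, 1))

print(p)
```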
