How to create a sample from an R data frame if weights are assigned to the row values?
To draw a random sample in R, we can use the sample function. If each row carries a weight, we pass the weights through the prob argument, so rows with larger weights are more likely to be selected. The weights do not need to sum to one. For example, if we have a data frame df that contains a column X with some values and another column Weight with the corresponding weights, then a random sample of 10 rows can be generated as follows:
df[sample(seq_len(nrow(df)),10,prob=df$Weight),]
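The pattern above can be tried on a small, self-contained data set. The following is a minimal sketch only: the column names X and Weight follow the description above, while the values, the seed, and the name weighted_sample are made up for illustration.

```r
# Hypothetical data frame: column names follow the text above,
# the values themselves are invented for this sketch
set.seed(1)  # arbitrary seed so the sketch is reproducible
df <- data.frame(X = rnorm(30), Weight = runif(30, min = 1, max = 5))

# Draw 10 distinct row indices; rows with a larger Weight are more likely to be picked
rows <- sample(seq_len(nrow(df)), 10, prob = df$Weight)

# Subset the data frame with the sampled row indices
weighted_sample <- df[rows, ]
weighted_sample
```

Since sample draws without replacement by default, the requested size cannot exceed the number of rows unless replace = TRUE is passed.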
Example
Consider the following data frame:
set.seed(1256)
x <- rnorm(20, 5, 1)
weight_x <- sample(1:10, 20, replace = TRUE)
df <- data.frame(x, weight_x)
df
Output
          x weight_x
1  4.126636       10
2  5.806501        1
3  5.768463       10
4  5.980315        8
5  6.593158        2
6  4.298533       10
7  6.196574        4
8  4.136517        5
9  4.504645       10
10 4.416107        6
11 5.257177       10
12 5.836453        1
13 5.334041       10
14 4.959786        2
15 3.406828        7
16 4.149746        2
17 4.657464        4
18 4.820102       10
19 5.401021        9
20 6.718216        6
Drawing samples of different sizes using the weight column:
Example
df[sample(seq_len(nrow(df)),5,prob=df$weight_x),]
Output
          x weight_x
11 5.257177       10
19 5.401021        9
13 5.334041       10
10 4.416107        6
5  6.593158        2
Example
df[sample(seq_len(nrow(df)),3,prob=df$weight_x),]
Output
          x weight_x
13 5.334041       10
3  5.768463       10
18 4.820102       10
Example
df[sample(seq_len(nrow(df)),7,prob=df$weight_x),]
Output
          x weight_x
9  4.504645       10
19 5.401021        9
12 5.836453        1
5  6.593158        2
15 3.406828        7
11 5.257177       10
6  4.298533       10
Example
df[sample(seq_len(nrow(df)),10,prob=df$weight_x),]
Output
          x weight_x
4  5.980315        8
9  4.504645       10
19 5.401021        9
1  4.126636       10
13 5.334041       10
12 5.836453        1
11 5.257177       10
18 4.820102       10
10 4.416107        6
3  5.768463       10
Example
df[sample(seq_len(nrow(df)),9,prob=df$weight_x),]
Output
          x weight_x
8  4.136517        5
11 5.257177       10
7  6.196574        4
4  5.980315        8
9  4.504645       10
6  4.298533       10
19 5.401021        9
18 4.820102       10
16 4.149746        2
Example
df[sample(seq_len(nrow(df)),4,prob=df$weight_x),]
Output
          x weight_x
1  4.126636       10
6  4.298533       10
11 5.257177       10
7  6.196574        4
Example
df[sample(seq_len(nrow(df)),15,prob=df$weight_x),]
Output
          x weight_x
3  5.768463       10
15 3.406828        7
19 5.401021        9
16 4.149746        2
9  4.504645       10
8  4.136517        5
11 5.257177       10
10 4.416107        6
18 4.820102       10
6  4.298533       10
4  5.980315        8
17 4.657464        4
1  4.126636       10
20 6.718216        6
13 5.334041       10
Example
df[sample(seq_len(nrow(df)),2,prob=df$weight_x),]
Output
          x weight_x
11 5.257177       10
13 5.334041       10
Example
df[sample(seq_len(nrow(df)),12,prob=df$weight_x),]
Output
          x weight_x
1  4.126636       10
3  5.768463       10
8  4.136517        5
11 5.257177       10
10 4.416107        6
6  4.298533       10
13 5.334041       10
4  5.980315        8
20 6.718216        6
12 5.836453        1
18 4.820102       10
19 5.401021        9
Example
df[sample(seq_len(nrow(df)),18,prob=df$weight_x),]
Output
          x weight_x
5  6.593158        2
4  5.980315        8
6  4.298533       10
20 6.718216        6
15 3.406828        7
3  5.768463       10
9  4.504645       10
10 4.416107        6
13 5.334041       10
19 5.401021        9
8  4.136517        5
11 5.257177       10
18 4.820102       10
1  4.126636       10
7  6.196574        4
12 5.836453        1
17 4.657464        4
16 4.149746        2
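The samples above suggest that rows with a weight of 10 appear far more often than rows with a weight of 1 or 2. One way to check this is to repeat the draw many times and count how often each row is selected. The sketch below is an illustration, not part of the original example: the weights are copied from the weight_x column shown earlier, while the seed, the number of repetitions, and the names n_rep, draws, freq and rate_by_weight are assumptions.

```r
# Weights copied from the weight_x column of the example data frame
weight_x <- c(10, 1, 10, 8, 2, 10, 4, 5, 10, 6, 10, 1, 10, 2, 7, 2, 4, 10, 9, 6)

set.seed(42)   # arbitrary seed, used only to make the sketch reproducible
n_rep <- 5000  # number of repeated samples (assumed value)

# Each column of draws holds the 5 row indices selected in one repetition
draws <- replicate(n_rep, sample(seq_along(weight_x), 5, prob = weight_x))

# Proportion of repetitions in which each row was selected
freq <- tabulate(as.vector(draws), nbins = length(weight_x)) / n_rep

# Average selection rate for each distinct weight value
rate_by_weight <- tapply(freq, weight_x, mean)
rate_by_weight
```

Rows with larger weights should show higher selection rates. Because the rows are drawn without replacement, the rates are not expected to be exactly proportional to the weights.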