Consider the following data set:
df <- data.frame(id = 1:10,
                 v1 = c(2.35456185, 1.44501001, 2.98712312, 0.12345123, 0.96781234,
                        1.23934551, 5.00212233, 4.34120000, 1.23443213, 0.00112233),
                 v2 = c(0.22222222, 0.00123456, 2.19024869, 0.00012000, 0.00029848,
                        0.12348888, 0.46236577, 0.85757000, 0.05479729, 0.00001202))
My intention is to round the values in v1 and v2 randomly: to one decimal place for 10% of observations, two decimal places for 40% of observations, and three decimal places for 50% of observations. I can use the round() function to round numbers to a fixed number of decimal places uniformly, but in my case the number of decimals is not uniform. Thank you in advance!
Example of output needed (of course mine is not random):
id    v1      v2
1     2.3     0.2
2     1.45    0
3     2.99    2.19
4     0.12    0
5     0.97    0
6     1.239   0.123
7     5.002   0.462
8     4.341   0.858
9     1.234   0.055
10    0.001   0
CodePudding user response:
Update: Addressing the probabilities:
library(dplyr)
# Note: this stores the rounded v1 in v2, overwriting the original v2
df %>%
  rowwise() %>%
  mutate(v2 = round(v1, sample(1:3, 1, prob = c(0.1, 0.4, 0.5))))
id v1 v2
<int> <dbl> <dbl>
1 1 2.35 2.35
2 2 1.45 1.44
3 3 2.99 2.99
4 4 0.123 0.12
5 5 0.968 1
6 6 1.24 1.24
7 7 5.00 5.00
8 8 4.34 4.34
9 9 1.23 1.2
10 10 0.00112 0
Original answer: here we round row-wise to a random number of decimal places between 1 and 3, each with equal probability:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(V2 = round(v1, sample(1:3, 1)))
id v1 V2
<int> <dbl> <dbl>
1 1 2.35 2.36
2 2 1.45 1.44
3 3 2.99 2.99
4 4 0.123 0.123
5 5 0.968 0.968
6 6 1.24 1.24
7 7 5.00 5.00
8 8 4.34 4.34
9 9 1.23 1.23
10 10 0.00112 0.001
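A side note on reading the printed results: a tibble shows about three significant digits, and a numeric column cannot keep a different number of trailing decimals per row, so the effect of the rounding is not always visible in the output above. If the goal is to display the values like the example in the question, one option is to format them as text. This is only a minimal sketch; x, d, rounded and shown are hypothetical names, x holds the first three v1 values, and the digits are fixed at 1, 2 and 3 instead of being sampled so the result is easy to check.

```r
# Hypothetical inputs: first three v1 values, rounded to 1, 2 and 3 decimals
x <- c(2.35456185, 1.44501001, 2.98712312)
d <- c(1, 2, 3)

# round() recycles `digits` element-wise, so no rowwise() is needed here
rounded <- round(x, d)

# Format each value as text with its own number of decimals (display only)
shown <- mapply(function(val, k) formatC(val, format = "f", digits = k), x, d)

print(rounded)
print(shown)
```

The formatted values are character strings, so they are suitable for printing or exporting but not for further arithmetic.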
CodePudding user response:
We can create a grouping with sample() based on the given probabilities (prob), and then round the v1 column based on the value of the group:
library(dplyr)
df %>%
  group_by(grp = sample(1:3, size = n(), replace = TRUE,
                        prob = c(0.10, 0.4, 0.5))) %>%
  mutate(v1 = round(v1, first(grp))) %>%
  ungroup() %>%
  select(-grp)
-output
# A tibble: 10 × 2
id v1
<int> <dbl>
1 1 2.36
2 2 1.44
3 3 2.99
4 4 0.123
5 5 0.97
6 6 1.24
7 7 5.00
8 8 4.3
9 9 1.23
10 10 0
For multiple columns, use across() to loop over them:
df %>%
  mutate(across(v1:v2, ~ round(.x, sample(1:3, size = n(),
                                          replace = TRUE, prob = c(0.10, 0.40, 0.50)))))
Or we pass the sampled output to the digits argument of round() directly:
df$v1 <- with(df, round(v1, sample(1:3, size = nrow(df),
                                   replace = TRUE, prob = c(0.10, 0.4, 0.5))))
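Note that sample() with prob only matches the 10%/40%/50% split on average; with 10 rows, a single draw can easily give, say, no rows rounded to one decimal. If the split has to hold exactly, one possible approach is to build the digits vector with fixed counts and shuffle it. The sketch below assumes the df from the question; probs, counts, digits_v1 and digits_v2 are hypothetical names, and the seed is arbitrary and only there for repeatability.

```r
set.seed(1)  # assumption: arbitrary seed, only so the shuffle is repeatable

df <- data.frame(id = 1:10,
                 v1 = c(2.35456185, 1.44501001, 2.98712312, 0.12345123, 0.96781234,
                        1.23934551, 5.00212233, 4.34120000, 1.23443213, 0.00112233),
                 v2 = c(0.22222222, 0.00123456, 2.19024869, 0.00012000, 0.00029848,
                        0.12348888, 0.46236577, 0.85757000, 0.05479729, 0.00001202))

n <- nrow(df)
probs <- c(0.1, 0.4, 0.5)

# Turn the shares into whole-number counts; the last group takes the remainder
counts <- round(n * probs[1:2])
counts <- c(counts, n - sum(counts))   # 1, 4 and 5 rows when n = 10

# One shuffled digits vector per column, each with exactly these counts
digits_v1 <- sample(rep(1:3, times = counts))
digits_v2 <- sample(rep(1:3, times = counts))

# round() recycles `digits` element-wise over the column
df$v1 <- round(df$v1, digits_v1)
df$v2 <- round(df$v2, digits_v2)

table(digits_v1)  # how many rows received 1, 2 and 3 decimals
```

When the number of rows does not divide evenly by the shares, the counts are only the nearest whole numbers, so the split is approximate in that case.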