I would like to know if there is a way in R to negate a random fraction of the values in one column, based on the values in another column. In the example data frame below, I'd like to randomly select 10% of the `exposure` values and make them negative (same magnitude, opposite sign), but only for the rows that have "Toy" listed as the `object`.
```r
df <- data.frame(ChildID = c("M1", "F1", "F1", "F2", "M2", "M3", "M3", "M3", "M3", "F3", "F1", "F2", "M2", "M3"),
                 object = c("Mouth", "Toy", "Mouth", "Toy", "Toy", "Toy", "Mouth", "Toy", "Toy", "Mouth", "Toy", "Toy", "Toy", "Toy"),
                 exposure = c(0.1, 0.2, 0.1, 0.05, 0.6, 0.1, 0.4, 0.1, 1.0, 0.5, 0.1, 0.4, 0.1, 1.0))
```
Here's an example of what I would like the result to look like:
ChildID | object | exposure |
---|---|---|
M1 | Mouth | 0.1 |
F1 | Toy | 0.2 |
F1 | Mouth | 0.1 |
F2 | Toy | 0.05 |
M2 | Toy | -0.6 |
M3 | Toy | 0.1 |
M3 | Mouth | 0.4 |
M3 | Toy | 0.1 |
M3 | Toy | 1.0 |
F3 | Mouth | 0.5 |
F1 | Toy | 0.1 |
F2 | Toy | 0.4 |
M2 | Toy | 0.1 |
M3 | Toy | 1.0 |
I tried using dplyr, but I can't use `filter()` because it removes the other rows, which I want to keep unchanged. I realize this is a basic question, but I'm pulling my hair out trying to find the right workaround. Thanks so much!
Answer:
You can get the indices of the `Toy` rows using `==` and `which`, `sample` them, and use them to subset `exposure` and flip its sign.
```r
i <- which(df$object == "Toy")
i <- sample(i, round(length(i) / 10))    # 10% of the Toy rows
# i <- sample(i, round(nrow(df) / 10))   # alternative: 10% of all rows
df$exposure[i] <- -df$exposure[i]
i
#[1] 12
df
#   ChildID object exposure
#1       M1  Mouth     0.10
#2       F1    Toy     0.20
#3       F1  Mouth     0.10
#4       F2    Toy     0.05
#5       M2    Toy     0.60
#6       M3    Toy     0.10
#7       M3  Mouth     0.40
#8       M3    Toy     0.10
#9       M3    Toy     1.00
#10      F3  Mouth     0.50
#11      F1    Toy     0.10
#12      F2    Toy    -0.40
#13      M2    Toy     0.10
#14      M3    Toy     1.00
```
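One caveat with `sample(i, ...)`: if `i` ever contains a single row number, R's `sample()` treats a length-1 numeric `x` as `1:x`, so with only one Toy row (say row 12) it could pick any row from 1 to 12. The sketch below avoids that by sampling positions with `sample.int()` and wraps the idea in a reusable function. The name `negate_random_fraction`, its arguments, and the seed are hypothetical, for illustration only.

```r
# Hypothetical helper (not part of the answer above): negate a random fraction
# `prop` of column `col`, restricted to rows where `where_col == value`.
negate_random_fraction <- function(df, col, where_col, value, prop = 0.1) {
  idx <- which(df[[where_col]] == value)  # candidate row numbers
  k <- round(length(idx) * prop)          # how many of them to negate
  # Sample *positions* within idx; sample(idx, k) would draw from 1:idx[1]
  # if idx happened to contain a single row number.
  pick <- idx[sample.int(length(idx), k)]
  df[[col]][pick] <- -df[[col]][pick]
  df
}

df <- data.frame(ChildID = c("M1", "F1", "F1", "F2", "M2", "M3", "M3", "M3", "M3", "F3", "F1", "F2", "M2", "M3"),
                 object = c("Mouth", "Toy", "Mouth", "Toy", "Toy", "Toy", "Mouth", "Toy", "Toy", "Mouth", "Toy", "Toy", "Toy", "Toy"),
                 exposure = c(0.1, 0.2, 0.1, 0.05, 0.6, 0.1, 0.4, 0.1, 1.0, 0.5, 0.1, 0.4, 0.1, 1.0))

set.seed(42)  # hypothetical seed, only to make the random draw reproducible
out <- negate_random_fraction(df, "exposure", "object", "Toy", prop = 0.1)
out
```

Because the rows are modified in place by index, the row order and all other columns stay untouched.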
Benchmark
```r
library(tidyverse)
bench::mark(check = FALSE,
  GKi = {
    i <- which(df$object == "Toy")
    i <- sample(i, round(length(i) / 10))  # 10% of the Toy rows
    df$exposure[i] <- -df$exposure[i]
    df
  },
  tmfmnk = {
    df %>%
      mutate(rowid = 1:n(),
             exposure_new = if_else(rowid %in% sample(rowid[object == "Toy"], floor((n()*10)/100)), -exposure, exposure)) %>%
      select(-rowid)
  },
  AndS. = {
    df |>
      mutate(id = row_number()) |>
      filter(object == "Toy") |>
      slice_sample(prop = 0.1) |>
      mutate(exposure = -exposure) |>
      (\(d) bind_rows(d, filter(df, !row_number() %in% d$id)))() |>
      select(-id)
  })
#  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
#  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>
#1 GKi         11.23µs  12.99µs    74916.    2.49KB     60.0  9992     8      133ms <NULL>
#2 tmfmnk       1.49ms   1.55ms      639.   12.52KB     40.6   268    17      419ms <NULL>
#3 AndS.        4.93ms   5.02ms      198.   21.88KB     50.9    70    18      353ms <NULL>
```
By median time, GKi is roughly 120 times faster than tmfmnk and roughly 390 times faster than AndS., allocates less memory, and uses 125 characters, compared to 166 (tmfmnk) and 206 (AndS.).
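The answers differ in how they turn "10%" into a row count: this one uses `round()`, while tmfmnk's uses `floor()`. With the example's 10 Toy rows both give exactly one row, but they diverge for other counts, and R's `round()` sends exact halves to the even number (so `round(0.5)` is 0). A quick comparison; the `n_toy` values and the `counts` name are hypothetical:

```r
# Hypothetical numbers of Toy rows; 10 matches the example data.
n_toy <- c(5, 10, 14, 15, 25)

counts <- data.frame(
  n_toy   = n_toy,
  round   = round(n_toy / 10),   # used in this answer; exact halves go to the even number
  floor   = floor(n_toy / 10),   # same count as floor((n * 10) / 100) in another answer
  ceiling = ceiling(n_toy / 10)  # flips at least one row whenever n_toy > 0
)
counts
```

With 5 Toy rows, both the `round()` and the `floor()` versions negate nothing; `ceiling()` is one option if at least one row must always be flipped.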
Answer:
You could sample the rows you want with `filter` and `slice_sample`, negate them, and then bind them to the original data while removing the original rows.
```r
library(tidyverse)
df |>
  mutate(id = row_number()) |>
  filter(object == "Toy") |>
  slice_sample(prop = 0.1) |>
  mutate(exposure = -exposure) |>
  (\(d) bind_rows(d, filter(df, !row_number() %in% d$id)))() |>
  select(-id)
#>    ChildID object exposure
#> 1       M3    Toy    -1.00
#> 2       M1  Mouth     0.10
#> 3       F1    Toy     0.20
#> 4       F1  Mouth     0.10
#> 5       F2    Toy     0.05
#> 6       M2    Toy     0.60
#> 7       M3    Toy     0.10
#> 8       M3  Mouth     0.40
#> 9       M3    Toy     0.10
#> 10      M3    Toy     1.00
#> 11      F3  Mouth     0.50
#> 12      F1    Toy     0.10
#> 13      F2    Toy     0.40
#> 14      M2    Toy     0.10
```
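This approach moves the negated row to the top of the result (row 1 above is the flipped `M3` row). If the original row order matters, one option, assuming dplyr 1.0.0 or later for `rows_update()`, is to write the sampled rows back by the temporary `id` key instead of re-binding. The names `df_id`, `flipped`, and `result` and the seed are hypothetical:

```r
library(dplyr)

df <- data.frame(ChildID = c("M1", "F1", "F1", "F2", "M2", "M3", "M3", "M3", "M3", "F3", "F1", "F2", "M2", "M3"),
                 object = c("Mouth", "Toy", "Mouth", "Toy", "Toy", "Toy", "Mouth", "Toy", "Toy", "Mouth", "Toy", "Toy", "Toy", "Toy"),
                 exposure = c(0.1, 0.2, 0.1, 0.05, 0.6, 0.1, 0.4, 0.1, 1.0, 0.5, 0.1, 0.4, 0.1, 1.0))

set.seed(1)  # hypothetical seed, only to make the random draw reproducible

# Add a row key so sampled rows can be matched back to their original position.
df_id <- df |> mutate(id = row_number())

# Sample 10% of the Toy rows and negate their exposure.
flipped <- df_id |>
  filter(object == "Toy") |>
  slice_sample(prop = 0.1) |>
  mutate(exposure = -exposure)

# rows_update() overwrites the matching rows of df_id and keeps df_id's row order.
result <- df_id |>
  rows_update(flipped, by = "id") |>
  select(-id)
result
```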
Answer:
One option might be:
```r
library(dplyr)
df %>%
  mutate(rowid = 1:n(),
         exposure_new = if_else(rowid %in% sample(rowid[object == "Toy"], floor((n()*10)/100)), -exposure, exposure)) %>%
  select(-rowid)
```

```
   ChildID object exposure exposure_new
1       M1  Mouth     0.10         0.10
2       F1    Toy     0.20         0.20
3       F1  Mouth     0.10         0.10
4       F2    Toy     0.05         0.05
5       M2    Toy     0.60         0.60
6       M3    Toy     0.10         0.10
7       M3  Mouth     0.40         0.40
8       M3    Toy     0.10         0.10
9       M3    Toy     1.00         1.00
10      F3  Mouth     0.50         0.50
11      F1    Toy     0.10         0.10
12      F2    Toy     0.40        -0.40
13      M2    Toy     0.10         0.10
14      M3    Toy     1.00         1.00
```
If the proportion should be computed from the `Toy` rows only:
```r
df %>%
  mutate(rowid = 1:n(),
         exposure_new = if_else(rowid %in% sample(rowid[object == "Toy"], floor((sum(object == "Toy")*10)/100)), -exposure, exposure)) %>%
  select(-rowid)
```
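Both versions keep `exposure` and add `exposure_new`, and both call `sample()` on a vector of row ids, which draws from `1:x` if only one Toy row exists. A grouped variant (a different technique, not part of this answer) samples positions inside the Toy group with `sample.int()` and overwrites `exposure` directly, keeping the row order. The temporary column `is_toy`, the name `result`, and the seed are hypothetical:

```r
library(dplyr)

df <- data.frame(ChildID = c("M1", "F1", "F1", "F2", "M2", "M3", "M3", "M3", "M3", "F3", "F1", "F2", "M2", "M3"),
                 object = c("Mouth", "Toy", "Mouth", "Toy", "Toy", "Toy", "Mouth", "Toy", "Toy", "Mouth", "Toy", "Toy", "Toy", "Toy"),
                 exposure = c(0.1, 0.2, 0.1, 0.05, 0.6, 0.1, 0.4, 0.1, 1.0, 0.5, 0.1, 0.4, 0.1, 1.0))

set.seed(7)  # hypothetical seed, only to make the random draw reproducible

result <- df %>%
  group_by(is_toy = object == "Toy") %>%  # temporary grouping column
  mutate(exposure = if_else(
    # within each group, pick floor(10%) of the row positions at random;
    # only rows in the Toy group are actually negated
    is_toy & row_number() %in% sample.int(n(), floor(n() * 0.1)),
    -exposure, exposure
  )) %>%
  ungroup() %>%
  select(-is_toy)
result
```

`mutate()` on grouped data does not reorder rows, so the result lines up with the original data frame.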