I have this situation:
ID n post date el a b c d
100_left 4 50 10/11/2020 y 190 5.41 4 300
100_right 4 50 10/11/2020 n 5.4 5 200
101_left 4 50 10/11/2020 y 180 5.49 6 360
101_right 4 50 10/11/2020 n 5.48 6 180
102_left 4 50 10/11/2020 y 190 5.5 3 300
102_right 4 50 10/11/2020 n 5.46 5 200
103_left 4 50 10/11/2020 y 190 5.39 3 170
103_right 4 50 10/11/2020 n 5.44 3 360
I would like to use the same values of n_left
for n_right
(ID
), but just for columns from a
to d
. Like this:
ID n post date el a b c d
100_left 4 50 10/11/2020 y 190 5.41 4 300
100_right 4 50 10/11/2020 n 190 5.41 4 300
101_left 4 50 10/11/2020 y 180 5.49 6 360
101_right 4 50 10/11/2020 n 180 5.49 6 360
102_left 4 50 10/11/2020 y 190 5.5 3 300
102_right 4 50 10/11/2020 n 190 5.5 3 300
103_left 4 50 10/11/2020 y 190 5.39 3 170
103_right 4 50 10/11/2020 n 190 5.39 3 170
Sorry for not putting codes but I don't know where to start. OBS: This is just a sample, my original dataset is much bigger.
CodePudding user response:
We may group by the prefix part of 'ID' after removing the characters from _
with str_remove
, then mutate
across
the columns 'a' to 'd', by selecting the values where 'el' is 'y'
library(dplyr)
library(stringr)
df1 %>%
dplyr::group_by(grp = stringr::str_remove(ID, "_.*")) %>%
dplyr::mutate(across(a:d, ~ .[el == 'y'])) %>%
ungroup %>%
dplyr::select(-grp)
-output
# A tibble: 8 × 9
ID n post date el a b c d
<chr> <int> <int> <chr> <chr> <int> <dbl> <int> <int>
1 100_left 4 50 10/11/2020 y 190 5.41 4 300
2 100_right 4 50 10/11/2020 n 190 5.41 4 300
3 101_left 4 50 10/11/2020 y 180 5.49 6 360
4 101_right 4 50 10/11/2020 n 180 5.49 6 360
5 102_left 4 50 10/11/2020 y 190 5.5 3 300
6 102_right 4 50 10/11/2020 n 190 5.5 3 300
7 103_left 4 50 10/11/2020 y 190 5.39 3 170
8 103_right 4 50 10/11/2020 n 190 5.39 3 170
data
df1 <- structure(list(ID = c("100_left", "100_right", "101_left", "101_right",
"102_left", "102_right", "103_left", "103_right"), n = c(4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L), post = c(50L, 50L, 50L, 50L, 50L,
50L, 50L, 50L), date = c("10/11/2020", "10/11/2020", "10/11/2020",
"10/11/2020", "10/11/2020", "10/11/2020", "10/11/2020", "10/11/2020"
), el = c("y", "n", "y", "n", "y", "n", "y", "n"), a = c(190L,
NA, 180L, NA, 190L, NA, 190L, NA), b = c(5.41, 5.4, 5.49, 5.48,
5.5, 5.46, 5.39, 5.44), c = c(4L, 5L, 6L, 6L, 3L, 5L, 3L, 3L),
d = c(300L, 200L, 360L, 180L, 300L, 200L, 170L, 360L)),
class = "data.frame", row.names = c(NA,
-8L))