I'm struggling with the following task. Here is a dummy example of the dataset I'm working with (dput output):
structure(list(SHOPID = c("11T", "2T1", "33A", "14B", "29O"), PPCODE = c(4412,
3567, 4412, 3567, 5123), PRICE = c(1.45, 1.21, 1.45, 1.21, 1.01
), TYPE = c("LAR", "SMA", "SMA", "SMA", "LAR"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
The task is to write such a script that if some ppcode is in the type SMA (small), but not in LAR (large) additional row should be added where the data related to that ppcode will be the same, but with type "LAR". So, in this provided example the line where SHOPID=NA, PPCODE=3567, PRICE= 1.21, TYPE=LAR should be added at the bottom. This should work not only for this particular case but and in general on the bigger dataset. I prefer dplyr library and tried with it, but other solutions are also welcome.
CodePudding user response:
Since you only specify SMA
to LAR
duplication, it's difficult to use tidyr::complete
. Here's an alternative:
library(dplyr)
dat %>%
group_by(PPCODE, PRICE) %>%
filter(!"LAR" %in% TYPE) %>%
ungroup() %>%
mutate(SHOPID = NA, TYPE = "LAR") %>%
distinct() %>%
bind_rows(dat, .)
# # A tibble: 6 x 4
# SHOPID PPCODE PRICE TYPE
# <chr> <dbl> <dbl> <chr>
# 1 11T 4412 1.45 LAR
# 2 2T1 3567 1.21 SMA
# 3 33A 4412 1.45 SMA
# 4 14B 3567 1.21 SMA
# 5 29O 5123 1.01 LAR
# 6 <NA> 3567 1.21 LAR