I have a df like the following one:
ID         Comment1  Comment2
X9999      text      text
X9999.000  text      text
Y8888      text      text
Y8888.111  text      text
Z7777.555  text      text
In the first column, there are Ids and sub-Ids. Ids look like X9999, sub-Ids like X9999.999. How can I make R check whether there is any sub-Id row without the corresponding Id row and, if one is missing, insert it?
CodePudding user response:
You can use dplyr to do a full_join on the unique codes with the .xxxx part excluded.
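library(dplyr)
# Join on the parent Ids (everything before the first dot); naming the key
# explicitly avoids the "Joining, by" message.
df2 <- full_join(df, data.frame(ID = unique(gsub('\\..*', '', df$ID))), by = "ID")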
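Note that the join appends the missing Ids at the bottom of the data frame and leaves their comment columns as NA. As a rough base R sketch of the same idea that also places each new Id row directly above its sub-Ids (the names parent, missing_ids, new_rows and out are just illustrative, and filling the comments with NA is an assumption):

```r
# Example data copied from the question.
df <- data.frame(
  ID = c("X9999", "X9999.000", "Y8888", "Y8888.111", "Z7777.555"),
  Comment1 = "text",
  Comment2 = "text",
  stringsAsFactors = FALSE
)

# Strip the ".xxx" suffix to get the parent Id of every row.
parent <- sub("\\..*$", "", df$ID)

# Parent Ids that never appear as a row of their own.
missing_ids <- setdiff(unique(parent), df$ID)

# One new row per missing Id; comment columns are left as NA (an assumption).
new_rows <- data.frame(
  ID = missing_ids,
  Comment1 = rep(NA_character_, length(missing_ids)),
  Comment2 = rep(NA_character_, length(missing_ids)),
  stringsAsFactors = FALSE
)

# Append, then order by parent Id, with the Id row before its sub-Id rows.
out <- rbind(df, new_rows)
out <- out[order(sub("\\..*$", "", out$ID), grepl(".", out$ID, fixed = TRUE)), ]
rownames(out) <- NULL
out
```

If missing_ids comes back empty, every sub-Id already has its Id row and out is just df in sorted order.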
CodePudding user response:
We can group by the ID (minus the sub-ID component) and find any group that does not have a main ID. We can then use uncount to duplicate the row in such a group and, for the first row of that group, remove the sub-ID component.
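library(tidyverse)

df %>%
  # Group rows by the main ID (everything before the first dot)
  group_by(grp = str_replace_all(ID, "\\..*", "")) %>%
  # Flag groups that have no row for the main ID itself
  mutate(duplicate_row = !any(ID == grp)) %>%
  # Repeat the rows of flagged groups twice, all other rows once
  uncount(case_when(duplicate_row ~ 2, TRUE ~ 1)) %>%
  # Turn the first row of each flagged group into the main ID row
  mutate(ID = ifelse(row_number() == 1 & duplicate_row,
                     str_replace_all(ID, "\\..*", ""), ID)) %>%
  ungroup() %>%
  select(names(df))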
Output
  ID        Comment1 Comment2
  <chr>     <chr>    <chr>
1 X9999     text     text
2 X9999.000 text     text
3 Y8888     text     text
4 Y8888.111 text     text
5 Z7777     text     text
6 Z7777.555 text     text
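One caveat, as far as I can tell: uncount repeats every row of a group that lacks a main ID, so if such a group holds several sub-IDs, only the first copy is renamed and the other sub-ID rows end up duplicated. A possible variation, sketched here under the assumption that the new main ID row should copy its comments from the first sub-ID row (as in the output above), builds one template row per missing ID and binds it back on. The sample data with Z7777.666 and the names parents and result are made up for illustration:

```r
library(dplyr)

# Hypothetical data: Z7777 has two sub-IDs and no main ID row.
df <- data.frame(
  ID = c("X9999", "X9999.000", "Z7777.555", "Z7777.666"),
  Comment1 = "text",
  Comment2 = "text",
  stringsAsFactors = FALSE
)

# One template row per group that has no main ID row.
parents <- df %>%
  mutate(grp = sub("\\..*$", "", ID)) %>%
  group_by(grp) %>%
  filter(!any(ID == grp)) %>%  # keep only groups lacking the main ID
  slice(1) %>%                 # first sub-ID row serves as the template
  ungroup() %>%
  mutate(ID = grp) %>%         # rename the template to the main ID
  select(-grp)

# Add the new rows and sort so each main ID precedes its sub-IDs.
result <- bind_rows(df, parents) %>%
  arrange(sub("\\..*$", "", ID), grepl(".", ID, fixed = TRUE), ID)

result
```

Each missing main ID is added exactly once here, however many sub-IDs it has.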
Data
df <- structure(list(
  ID = c("X9999", "X9999.000", "Y8888", "Y8888.111", "Z7777.555"),
  Comment1 = c("text", "text", "text", "text", "text"),
  Comment2 = c("text", "text", "text", "text", "text")
), class = "data.frame", row.names = c(NA, -5L))