So I have this data frame:
dat1 <- data.frame(id=1:n,
group=rep(LETTERS[1:2], n/2),
age=sample(18:30, n, replace=TRUE),
type=NA,
op=factor(paste0("op", 1:n)),
x=NA)
dat1
dat2 <- data.frame(id=1:n,
group=rep(LETTERS[1:2], n/2),
age=NA,
type=factor(paste0("type", 1:n)),
op=NA,
x=rnorm(n))
dat <- full_join(dat1,dat2) %>% arrange(id)
dat
id group age type op x
1 1 A 19 <NA> op1 NA
2 1 A NA type1 <NA> 0.18819303
3 2 B 29 <NA> op2 NA
4 2 B NA type2 <NA> 0.11916096
5 3 A 19 <NA> op3 NA
6 3 A NA type3 <NA> -0.02509255
7 4 B 28 <NA> op4 NA
8 4 B NA type4 <NA> 0.10807273
9 5 A 27 <NA> op5 NA
10 5 A NA type5 <NA> -0.48543524
11 6 B 26 <NA> op6 NA
12 6 B NA type6 <NA> -0.50421713
This dataset has multiple rows per observation, while some variables are stored in one row and some in another. I like to have it in a tidy format with only one row per id. I can filter into two data frames then rejoin, but there must be an easier way. Summarise would work but only with numeric variables.. The group is fixed with id, there is no same id in different groups
group_by(id) %>%
summarise_if(is.numeric, sum, na.rm=T)
data
It seems there must be a very easy solution but I can't figure it out. Thanks for help!
CodePudding user response:
From this answer by tmfmnk to a similar question, you should be able to solve this by adding this code block at the end:
dat <- dat %>%
group_by(id) %>%
summarize(across(everything(), ~ first(na.omit(.))))
dat
The question in the link is a special case for only numbers, but this code block should work in general.