Hoping there's an easy solution here. Basically, I have a very long .csv that has data that was created by multiple users, which consists of an ID, and two variables corresponding to weight before and after a procedure. To avoid human error in data entry, and since the data for the IDs wasn't generated sequentially, the data is often entered like this on the .csv (example df created):
id = c(rep(1:4,2),5:8)
pre = c(rep(10,4),rep(NA,4),rep(100,4))
post = c(rep(NA,4),rep(10,4),rep(100,4))
df = cbind(id,pre,post)
print(df)
id pre post
[1,] 1 10 NA
[2,] 2 10 NA
[3,] 3 10 NA
[4,] 4 10 NA
[5,] 1 NA 10
[6,] 2 NA 10
[7,] 3 NA 10
[8,] 4 NA 10
[9,] 5 100 100
[10,] 6 100 100
[11,] 7 100 100
[12,] 8 100 100
When I need the data to merge the pre and post columns together via the ID column- only if they're not already joined by coincidence (like how IDs 8-12 happen to have been inputted together). So that it looks like:
id_ = c(1:8)
pre_ = c(rep(10,4),rep(100,4))
post_ = c(rep(10,4),rep(100,4))
df_final = cbind(id_,pre_,post_)
print(df_final)
id_ pre_ post_
[1,] 1 10 10
[2,] 2 10 10
[3,] 3 10 10
[4,] 4 10 10
[5,] 5 100 100
[6,] 6 100 100
[7,] 7 100 100
[8,] 8 100 100
I know if two sets of data are in two different dfs, I can use join_left from dplyr to merge data...unsure what to do in this situation since they're all in one .csv and not every single ID is unjoined.
Thanks.
CodePudding user response:
Your question doesn't make clear, but it sounds from your description that every id should have a pre and post value. If that works, the following would work.
Per r2evans comment above, I assume you are using the tidyverse due to your tag of tidyverse.
library(tidyverse)
id = c(rep(1:4,2),5:8)
pre = c(rep(10,4),rep(NA,4),rep(100,4))
post = c(rep(NA,4),rep(10,4),rep(100,4))
df = cbind(id,pre,post)
df<- as.data.frame(df)
df %>% left_join(., df, by = "id") %>% select(id, pre.x, post.y) %>% drop_na()
id pre.x post.y
1 1 10 10
2 2 10 10
3 3 10 10
4 4 10 10
5 5 100 100
6 6 100 100
7 7 100 100
8 8 100 100
CodePudding user response:
We may use a group by summarise
assuming that there are equal length vectors
library(dplyr)
df %>%
group_by(id) %>%
summarise(across(c(pre, post), ~ .x[complete.cases(.x)]))
# A tibble: 8 × 3
id pre post
<int> <dbl> <dbl>
1 1 10 10
2 2 10 10
3 3 10 10
4 4 10 10
5 5 100 100
6 6 100 100
7 7 100 100
8 8 100 100
data
df <- data.frame(id, pre, post)
CodePudding user response:
Another alternative would be to fill one of your columns to overwrite NA's with the (presumably sole) found value, then just keep one of each complete row.
df %>%
group_by(id) %>%
fill(pre, .direction = "downup") %>%
ungroup() %>%
drop_na()
# distinct(id, .keep_all = TRUE) # might help if an id has both complete and incomplete records