Home > Software engineering >  Merging paired columns with duplication in R
Merging paired columns with duplication in R

Time:12-16

Goodevening, I'm working on data with two types of blood pressure (non-invasive and arterial) with different duration for each person. It has 6 columns of "id" "begin_time" "end_time" "nibp_time" "nibp_value" "abp_time" "abp_value". I used difftime() to reduce the complexity. Time interval between "nibp_value" or "abp_value" is 5 minutes. So my data looks like below.

df <- data.frame(id = c(1,1,1,1,1,1,1,1,2,2,2,2,3,3), 
                 nibp_time = c(0,1,0,1,0,1,0,1,0,1,2,3,NaN,NaN), 
                 nibp_value = c(80,65,80,65,80,65,80,65,95,90,83,89,NaN,NaN),
                 abp_time = c(1,1,2,2,3,3,4,4,NaN,NaN,NaN,NaN,0,1), 
                 abp_value = c(68,68,66,66,70,70,73,73,NaN,NaN,NaN,NaN,88,84))

The problem is, there are missing values and I would like to merge "nibp_value" and "abp_value" according to "nibp_time" and "abp_time". If the "nibp_time" equals "abp_time", "abp_value" should be stored ("abp_value" has priority) and looks like below.

df2<- data.frame(id = c(1,1,1,1,1,2,2,2,2,3,3),
           bp_time = c(0,1,2,3,4,0,1,2,3,0,1),
           bp_value = c(80,68,66,70,73,95,90,83,89,88,84))

So that I can apply

as.data.table(df2)[, dcast(.SD, id ~ bp_time, value.var = "bp_value")]

to make it to serial format.

I tried

df$bp_time <- ifelse(is.na(df$abp_time), df$nibp_time, df$abp_time)

In that way, the value at the time '0' of id '1' would be eliminated because abp_time has no '0' for id '1'. Would you help me to merge properly?

CodePudding user response:

Do you mean something like this?

df3 = merge(df[!is.na(df$nibp_time),], df2, by.x=c("nibp_time","nibp_value"), by.y=c("bp_time", "bp_value"))
df3$final_value = ifelse(!is.na(df3$abp_value), df3$abp_value, df3$nibp_value)

Output:

> df3
  nibp_time nibp_value id.x abp_time abp_value id.y final_value
1         0         80    1        1        68    1          68
2         0         80    1        2        66    1          66
3         0         80    1        3        70    1          70
4         0         80    1        4        73    1          73
5         0         95    2      NaN       NaN    2          95
6         1         90    2      NaN       NaN    2          90
7         2         83    2      NaN       NaN    2          83
8         3         89    2      NaN       NaN    2          89

Update:

If you need all rows from both tables add all=TRUE. Or all.y=TRUE / all.x=TRUE if you need just all rows from one of these tables.

df3 = merge(df[!is.na(df$nibp_time),], df2, by.x=c("nibp_time","nibp_value"), by.y=c("bp_time", "bp_value"), all=TRUE)
df3$final_value = ifelse(!is.na(df3$abp_value), df3$abp_value, df3$nibp_value)
    
    > df3
   nibp_time nibp_value id.x abp_time abp_value id.y final_value
1          0         80    1        1        68    1          68
2          0         80    1        2        66    1          66
3          0         80    1        3        70    1          70
4          0         80    1        4        73    1          73
5          0         88   NA       NA        NA    3          88
6          0         95    2      NaN       NaN    2          95
7          1         65    1        1        68   NA          68
8          1         65    1        2        66   NA          66
9          1         65    1        3        70   NA          70
10         1         65    1        4        73   NA          73
11         1         68   NA       NA        NA    1          68
12         1         84   NA       NA        NA    3          84
13         1         90    2      NaN       NaN    2          90
14         2         66   NA       NA        NA    1          66
15         2         83    2      NaN       NaN    2          83
16         3         70   NA       NA        NA    1          70
17         3         89    2      NaN       NaN    2          89
18         4         73   NA       NA        NA    1          73
  • Related