Goodevening, I'm working on data with two types of blood pressure (non-invasive and arterial) with different duration for each person. It has 6 columns of "id" "begin_time" "end_time" "nibp_time" "nibp_value" "abp_time" "abp_value". I used difftime() to reduce the complexity. Time interval between "nibp_value" or "abp_value" is 5 minutes. So my data looks like below.
df <- data.frame(id = c(1,1,1,1,1,1,1,1,2,2,2,2,3,3),
nibp_time = c(0,1,0,1,0,1,0,1,0,1,2,3,NaN,NaN),
nibp_value = c(80,65,80,65,80,65,80,65,95,90,83,89,NaN,NaN),
abp_time = c(1,1,2,2,3,3,4,4,NaN,NaN,NaN,NaN,0,1),
abp_value = c(68,68,66,66,70,70,73,73,NaN,NaN,NaN,NaN,88,84))
The problem is, there are missing values and I would like to merge "nibp_value" and "abp_value" according to "nibp_time" and "abp_time". If the "nibp_time" equals "abp_time", "abp_value" should be stored ("abp_value" has priority) and looks like below.
df2<- data.frame(id = c(1,1,1,1,1,2,2,2,2,3,3),
bp_time = c(0,1,2,3,4,0,1,2,3,0,1),
bp_value = c(80,68,66,70,73,95,90,83,89,88,84))
So that I can apply
as.data.table(df2)[, dcast(.SD, id ~ bp_time, value.var = "bp_value")]
to make it to serial format.
I tried
df$bp_time <- ifelse(is.na(df$abp_time), df$nibp_time, df$abp_time)
In that way, the value at the time '0' of id '1' would be eliminated because abp_time has no '0' for id '1'. Would you help me to merge properly?
CodePudding user response:
Do you mean something like this?
df3 = merge(df[!is.na(df$nibp_time),], df2, by.x=c("nibp_time","nibp_value"), by.y=c("bp_time", "bp_value"))
df3$final_value = ifelse(!is.na(df3$abp_value), df3$abp_value, df3$nibp_value)
Output:
> df3
nibp_time nibp_value id.x abp_time abp_value id.y final_value
1 0 80 1 1 68 1 68
2 0 80 1 2 66 1 66
3 0 80 1 3 70 1 70
4 0 80 1 4 73 1 73
5 0 95 2 NaN NaN 2 95
6 1 90 2 NaN NaN 2 90
7 2 83 2 NaN NaN 2 83
8 3 89 2 NaN NaN 2 89
Update:
If you need all rows from both tables add all=TRUE
. Or all.y=TRUE
/ all.x=TRUE
if you need just all rows from one of these tables.
df3 = merge(df[!is.na(df$nibp_time),], df2, by.x=c("nibp_time","nibp_value"), by.y=c("bp_time", "bp_value"), all=TRUE)
df3$final_value = ifelse(!is.na(df3$abp_value), df3$abp_value, df3$nibp_value)
> df3
nibp_time nibp_value id.x abp_time abp_value id.y final_value
1 0 80 1 1 68 1 68
2 0 80 1 2 66 1 66
3 0 80 1 3 70 1 70
4 0 80 1 4 73 1 73
5 0 88 NA NA NA 3 88
6 0 95 2 NaN NaN 2 95
7 1 65 1 1 68 NA 68
8 1 65 1 2 66 NA 66
9 1 65 1 3 70 NA 70
10 1 65 1 4 73 NA 73
11 1 68 NA NA NA 1 68
12 1 84 NA NA NA 3 84
13 1 90 2 NaN NaN 2 90
14 2 66 NA NA NA 1 66
15 2 83 2 NaN NaN 2 83
16 3 70 NA NA NA 1 70
17 3 89 2 NaN NaN 2 89
18 4 73 NA NA NA 1 73