How to create a character column that reads results from two other character strings?

Time:08-20

I have

       measure           estimate.difference
1   pT8 vs pT7  0.33 (95% CI: -1.39 to 2.06)
2 No pT vs pT7 -2.31 (95% CI: -4.95 to 0.33)
3 No pT vs pT8 -2.64 (95% CI: -5.67 to 0.38)
4   pT8 vs pT7  0.14 (95% CI: -0.26 to 0.53)
5 No pT vs pT7  0.16 (95% CI: -0.37 to 0.69)

The df$estimate.difference compares the outcome between two independent prediction models, which are listed in df$measure.

If df$estimate.difference is positive, the left side of the "... vs ..." pair is the better model; if it is negative, the right side is better. I need a column that names the better model for each row (my dataset is large).

Is it possible to create a character column that first checks whether df$estimate.difference is positive or negative, and then returns the model name from the left or right side of df$measure accordingly?

Expected output

       measure       best.model               estimate.difference
1   pT8 vs pT7              pT8       0.33 (95% CI: -1.39 to 2.06)
2 No pT vs pT7              pT7      -2.31 (95% CI: -4.95 to 0.33)
3 No pT vs pT8              pT8      -2.64 (95% CI: -5.67 to 0.38)
4   pT8 vs pT7              pT8       0.14 (95% CI: -0.26 to 0.53)
5 No pT vs pT7            No pT       0.16 (95% CI: -0.37 to 0.69)

Data

df <- structure(list(measure = c("pT8 vs pT7", "No pT vs pT7", "No pT vs pT8", 
                           "pT8 vs pT7", "No pT vs pT7"), estimate.difference = c("0.33 (95% CI: -1.39 to 2.06)", 
                                                                                  "-2.31 (95% CI: -4.95 to 0.33)", "-2.64 (95% CI: -5.67 to 0.38)", 
                                                                                  "0.14 (95% CI: -0.26 to 0.53)", "0.16 (95% CI: -0.37 to 0.69)"
                           )), row.names = c(NA, 5L), class = "data.frame")

CodePudding user response:

Using strsplit and mapply.

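# Split each measure into c(left, right); pick element 1 when the
# leading estimate is >= 0 and element 2 when it is negative.
transform(df, 
          best.model=mapply(`[`, strsplit(df$measure, ' vs '),
                            (as.numeric(sapply(strsplit(df$estimate.difference, ' '), `[`, 1)) < 0) + 1)
          )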
#        measure           estimate.difference best.model
# 1   pT8 vs pT7  0.33 (95% CI: -1.39 to 2.06)        pT8
# 2 No pT vs pT7 -2.31 (95% CI: -4.95 to 0.33)        pT7
# 3 No pT vs pT8 -2.64 (95% CI: -5.67 to 0.38)        pT8
# 4   pT8 vs pT7  0.14 (95% CI: -0.26 to 0.53)        pT8
# 5 No pT vs pT7  0.16 (95% CI: -0.37 to 0.69)      No pT
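
An alternative that may be easier to read on a large dataset is to pull out the three pieces with sub() and choose between them with ifelse(). This is only a sketch: it assumes every estimate.difference starts with the point estimate followed by a space, and every measure contains exactly one " vs ". The intermediate names est, left and right are made up for illustration. As in the code above, an estimate of exactly 0 falls to the left side.

```r
# Small stand-in for the question's data (first three rows only)
df <- data.frame(
  measure = c("pT8 vs pT7", "No pT vs pT7", "No pT vs pT8"),
  estimate.difference = c("0.33 (95% CI: -1.39 to 2.06)",
                          "-2.31 (95% CI: -4.95 to 0.33)",
                          "-2.64 (95% CI: -5.67 to 0.38)"),
  stringsAsFactors = FALSE
)

# Point estimate: drop everything from the first space onwards
est <- as.numeric(sub(" .*$", "", df$estimate.difference))

# Model names on either side of " vs "
left  <- sub(" vs .*$", "", df$measure)
right <- sub("^.* vs ", "", df$measure)

# Negative estimate -> right-hand model, otherwise left-hand model
df$best.model <- ifelse(est < 0, right, left)
df$best.model
```

All three steps are vectorised, so no per-row mapply call is needed.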