Create a column that list all column names that contain negative values in that specific row-CodePudding

I have an extremely large dataframe.

Let's say I have something like:

> p
    fu2_difftime_mridate_d_date fu2_difftime_premri fu2_difftime_prevfu fu2_difftime_fudate_d_date
1                          30.4                12.7               -14.2                        6.4
2                          12.9               -48.1               -47.6                       13.3

I want to have an overlook of all the columns which contain 0 and negative values in each row. I seek a solution in dplyr that creates a new covariate which contain all column names that has 0 or negative values in each row:

> p
                negvalues fu2_difftime_premri fu2_difftime_prevfu fu2_difftime_fudate_d_date
1    "fu2_difftime_prevfu"               12.7               -14.2                        6.4

Data

p <- structure(list(fu2_date = structure(c(15428, 13746), class = "Date"), 
    fu2_mri_date = structure(c(16156, 13732), class = "Date"), 
    fu2_difftime_mridate_d_date = c(30.4, 12.9), fu2_difftime_premri = c(12.7, 
    -48.1), fu2_difftime_prevfu = c(-14.2, -47.6), fu2_difftime_fudate_d_date = c(6.4, 
    13.3)), row.names = 1:2, class = "data.frame")

CodePudding user response：

A base R solution can be,

apply(p[-c(1, 2)] <= 0, 1, \(i)toString(names(which(i))))
#                    1                                          2 
#"fu2_difftime_prevfu" "fu2_difftime_premri, fu2_difftime_prevfu" 

#Add to the dataframe

p$new <- apply(p[-c(1, 2)] <= 0, 1, \(i)toString(names(which(i))))

p
    fu2_date fu2_mri_date fu2_difftime_mridate_d_date fu2_difftime_premri fu2_difftime_prevfu fu2_difftime_fudate_d_date                                      new
1 2012-03-29   2014-03-27                        30.4                12.7               -14.2                        6.4                      fu2_difftime_prevfu
2 2007-08-21   2007-08-07                        12.9               -48.1               -47.6                       13.3 fu2_difftime_premri, fu2_difftime_prevfu

CodePudding user response：

Here is a tidyverse solution. Basically we test across all columns (note: across(,.. is the same as across(everything(), ...) if a value is negative with . < 0 if true give back current column name with: cur_column(). Thereafter we have to use unite:

library(dplyr)
library(tidyr)

p %>% 
  mutate(across(, ~case_when(. < 0 ~ cur_column()), .names = 'new_{col}')) %>%
  unite(Neg_cols, starts_with('new'), na.rm = TRUE, sep = ', ')

    fu2_date fu2_mri_date fu2_difftime_mridate_d_date fu2_difftime_premri fu2_difftime_prevfu
1 2012-03-29   2014-03-27                        30.4                12.7               -14.2
2 2007-08-21   2007-08-07                        12.9               -48.1               -47.6
  fu2_difftime_fudate_d_date                                Neg_cols
1                        6.4                      fu2_difftime_prevfu
2                       13.3 fu2_difftime_premri, fu2_difftime_prevfu

CodePudding user response：

A tidyverse solution:

library(tidyverse)

p %>%
  mutate(across(where(is.numeric), ~ ifelse(.x <= 0, cur_column(), NA)), .keep = "used") %>%
  unite("negative", sep = ", ", na.rm = TRUE) %>%
  bind_cols(p)

#                                   negative   fu2_date fu2_mri_date fu2_difftime_mridate_d_date fu2_difftime_premri fu2_difftime_prevfu fu2_difftime_fudate_d_date
# 1                      fu2_difftime_prevfu 2012-03-29   2014-03-27                        30.4                12.7               -14.2                        6.4
# 2 fu2_difftime_premri, fu2_difftime_prevfu 2007-08-21   2007-08-07                        12.9               -48.1               -47.6                       13.3

CodePudding user response：

You could just do it with the base package instead of the dplyr package, using the function sapply.

# Load data
p <- structure(
  list(
      fu2_date = structure(c(15428, 13746), class = "Date"),
      fu2_mri_date = structure(c(16156, 13732), class = "Date"),
      fu2_difftime_mridate_d_date = c(30.4, 12.9),
      fu2_difftime_premri = c(12.7, -48.1),
      fu2_difftime_prevfu = c(-14.2, -47.6),
      fu2_difftime_fudate_d_date = c(6.4, 13.3)
    ),
    row.names = 1:2,
    class = "data.frame"
  )

# Look all the columns which contain 0 and negative values in each row
clmns <- sapply(p, function(x) {
  any(x <= 0)
})
clmns

fu2_date                fu2_mri_date fu2_difftime_mridate_d_date         fu2_difftime_premri         fu2_difftime_prevfu  fu2_difftime_fudate_d_date
FALSE                       FALSE                       FALSE                        TRUE                        TRUE                       FALSE

And then, you can see the column names that satisfy the previous expression:

names(p)[clmns]
[1] "fu2_difftime_premri" "fu2_difftime_prevfu"