I have an extremely large dataframe.
Let's say I have something like:
> p
fu2_difftime_mridate_d_date fu2_difftime_premri fu2_difftime_prevfu fu2_difftime_fudate_d_date
1 30.4 12.7 -14.2 6.4
2 12.9 -48.1 -47.6 13.3
I want to have an overlook of all the columns which contain 0 and negative values in each row. I seek a solution in dplyr
that creates a new covariate which contain all column names that has 0 or negative values in each row:
> p
negvalues fu2_difftime_premri fu2_difftime_prevfu fu2_difftime_fudate_d_date
1 "fu2_difftime_prevfu" 12.7 -14.2 6.4
Data
p <- structure(list(fu2_date = structure(c(15428, 13746), class = "Date"),
fu2_mri_date = structure(c(16156, 13732), class = "Date"),
fu2_difftime_mridate_d_date = c(30.4, 12.9), fu2_difftime_premri = c(12.7,
-48.1), fu2_difftime_prevfu = c(-14.2, -47.6), fu2_difftime_fudate_d_date = c(6.4,
13.3)), row.names = 1:2, class = "data.frame")
CodePudding user response:
A base R solution can be,
apply(p[-c(1, 2)] <= 0, 1, \(i)toString(names(which(i))))
# 1 2
#"fu2_difftime_prevfu" "fu2_difftime_premri, fu2_difftime_prevfu"
#Add to the dataframe
p$new <- apply(p[-c(1, 2)] <= 0, 1, \(i)toString(names(which(i))))
p
fu2_date fu2_mri_date fu2_difftime_mridate_d_date fu2_difftime_premri fu2_difftime_prevfu fu2_difftime_fudate_d_date new
1 2012-03-29 2014-03-27 30.4 12.7 -14.2 6.4 fu2_difftime_prevfu
2 2007-08-21 2007-08-07 12.9 -48.1 -47.6 13.3 fu2_difftime_premri, fu2_difftime_prevfu
CodePudding user response:
Here is a tidyverse solution. Basically we test across all columns (note: across(,.. is the same as across(everything(), ...) if a value is negative with . < 0
if true give back current column name with: cur_column()
.
Thereafter we have to use unite
:
library(dplyr)
library(tidyr)
p %>%
mutate(across(, ~case_when(. < 0 ~ cur_column()), .names = 'new_{col}')) %>%
unite(Neg_cols, starts_with('new'), na.rm = TRUE, sep = ', ')
fu2_date fu2_mri_date fu2_difftime_mridate_d_date fu2_difftime_premri fu2_difftime_prevfu
1 2012-03-29 2014-03-27 30.4 12.7 -14.2
2 2007-08-21 2007-08-07 12.9 -48.1 -47.6
fu2_difftime_fudate_d_date Neg_cols
1 6.4 fu2_difftime_prevfu
2 13.3 fu2_difftime_premri, fu2_difftime_prevfu
CodePudding user response:
A tidyverse
solution:
library(tidyverse)
p %>%
mutate(across(where(is.numeric), ~ ifelse(.x <= 0, cur_column(), NA)), .keep = "used") %>%
unite("negative", sep = ", ", na.rm = TRUE) %>%
bind_cols(p)
# negative fu2_date fu2_mri_date fu2_difftime_mridate_d_date fu2_difftime_premri fu2_difftime_prevfu fu2_difftime_fudate_d_date
# 1 fu2_difftime_prevfu 2012-03-29 2014-03-27 30.4 12.7 -14.2 6.4
# 2 fu2_difftime_premri, fu2_difftime_prevfu 2007-08-21 2007-08-07 12.9 -48.1 -47.6 13.3
CodePudding user response:
You could just do it with the base
package instead of the dplyr
package, using the function sapply.
# Load data
p <- structure(
list(
fu2_date = structure(c(15428, 13746), class = "Date"),
fu2_mri_date = structure(c(16156, 13732), class = "Date"),
fu2_difftime_mridate_d_date = c(30.4, 12.9),
fu2_difftime_premri = c(12.7, -48.1),
fu2_difftime_prevfu = c(-14.2, -47.6),
fu2_difftime_fudate_d_date = c(6.4, 13.3)
),
row.names = 1:2,
class = "data.frame"
)
# Look all the columns which contain 0 and negative values in each row
clmns <- sapply(p, function(x) {
any(x <= 0)
})
clmns
fu2_date fu2_mri_date fu2_difftime_mridate_d_date fu2_difftime_premri fu2_difftime_prevfu fu2_difftime_fudate_d_date
FALSE FALSE FALSE TRUE TRUE FALSE
And then, you can see the column names that satisfy the previous expression:
names(p)[clmns]
[1] "fu2_difftime_premri" "fu2_difftime_prevfu"