I have a large dataframe with one categorical and many numerical columns.
big_df <- data.frame(category = LETTERS[1:10]
,Q_COL1 = c(0,1,0,2,0,0,17,0,12,19)
,Q_COL2 = c(0,2,3,0,1,12,0,1,0,12)
)
It looks like this:
category Q_COL1 q_COL2
1 A 0 0
2 B 1 2
3 C 0 3
4 D 2 0
5 E 0 1
6 F 0 12
7 G 17 0
8 H 0 1
9 I 12 0
10 J 19 12
For each numerical column, I would like to create a character vector as follows:
col1_char <- big_df %>%
select(category, Q_COL1) %>%
filter(Q_COL1 > 0) %>%
select(category) %>%
deframe()
col2_char <- big_df %>%
select(category, Q_COL2) %>%
filter(Q_COL2 > 0) %>%
select(category) %>%
deframe()
This code allows to list categories (from the category
column) for which there were no 0
values in each of the numerical column.
The output vectors look like this:
> col1_char
[1] "B" "D" "G" "I" "J"
> col2_char
[1] "B" "C" "E" "F" "H" "J"
Each of these will be of a different length. If possible, the optimal output would store them in a dataframe, with NAs to account for different lengths. A list would also be good.
I can produce these objects one-by-one, but it's not very elegant. I could probably write a loop to do this, but I wonder is there a neater, perhaps tidyverse
, way?
CodePudding user response:
This can be a one-liner with good ol' base R,
sapply(big_df[-1], \(i) big_df$category[i > 0])
$Q_COL1
[1] "B" "D" "G" "I" "J"
$Q_COL2
[1] "B" "C" "E" "F" "H" "J"
To put the in a data frame, lot of options can be found here
stringi::stri_list2matrix(sapply(big_df[-1], \(i)big_df$category[i > 0]))
[,1] [,2]
[1,] "B" "B"
[2,] "D" "C"
[3,] "G" "E"
[4,] "I" "F"
[5,] "J" "H"
[6,] NA "J"
CodePudding user response:
Using tidyverse
:
big_df %>%
mutate(across(where(is.numeric), ~ifelse(.x > 0, category, NA)))
category Q_COL1 Q_COL2
A NA NA
B B B
C NA C
D D NA
E NA E
F NA F
G G NA
H NA H
I I NA
J J J