How to pull column name into a new column based on column contents in R-CodePudding

I need to do three actions:

1. Count the rowwise non NA's values in a table and sum them (in a single column "check_na")

[I put my solution to this below, interested if someone can figure out how to do this with map though. I have already checked https://stackoverflow.com/questions/50680413/count-na-in-given-columns-by-rows for answers for this]

2. For those values that are not NA, create a column that concatenates these the unique values in a new column "block detail".

[I don't know how to do this]

3. If "check_na" has a value then pull in the column name(s) and concatenate them in a new column ("block type")

[I don't know how to do this]

This is what the final product should look like. note that in line 2 even though "b" appears twice, it only shows up once in "block detail" yet the columns that have it are individually listed "y|z"

      w x     y     z     na_check block_detail block_type
  <dbl> <chr> <chr> <chr>    <int> <chr>        <chr>     
1    NA a     NA    NA           1 a            x         
2    NA NA    b     b            2 b            y|z       
3    NA NA    b     c            2 b|c          y|z       
4    NA NA    NA    NA           0 NA           NA        
5    NA NA    NA    b            1 b            z

below is sample data and my solution to part 1:


#sample data
df <- tibble(w=rep(NA_real_,5),
       x=c(1,rep(NA_real_,4)),
       y=c(NA_real_,1,rep(NA_real_,3)),
       z=c(NA_real_,1,rep(NA_real_,2),1)
       )

#my solution to the first part, interested if someone can do this more efficiently or can do this with map as I have 100s columns that I need to do this with

df_na_check <- df %>% 
  mutate(across(everything(),
                list(na_check=~!is.na(.)),
                .names="{.col}_{.fn}")) %>% 
  rowwise() %>% 
mutate(na_check=sum(c_across(contains("na_check")))) %>% 
  select(w:z,na_check)

I appreciate any help. Ideally if the solution can use tidyverse but open to other methods (data.table or base r)

CodePudding user response：

We can first use rowSums to get the number of columns that are not NA. Then, we can use purrr to collapse the unique characters that are no NA into block_detail. Then, we can use apply to go over each row to get the column names where there is not an NA for block_type.

library(tidyverse)

df %>% 
  mutate(na_check = rowSums(!is.na(.), na.rm = T),
         block_detail = pmap_chr(select(., -na_check), ~paste0(unique(na.omit(c(...))), collapse = "|")),
         block_type = apply(df, 1, \(x) paste0(names(df)[which(!is.na(x))], collapse = "|")))

Output

   w    x    y    z na_check block_detail block_type
1 NA    a <NA> <NA>        1            a          x
2 NA <NA>    b    b        2            b        y|z
3 NA <NA>    b    c        2          b|c        y|z
4 NA <NA> <NA> <NA>        0                        
5 NA <NA> <NA>    b        1            b          z

Or using purrr instead of apply:

df %>% 
  mutate(na_check = rowSums(!is.na(.), na.rm = T),
         block_detail = pmap_chr(select(., -na_check), ~str_c(unique(na.omit(c(...))), collapse = "|"))) %>% 
  mutate(block_type = pmap_chr(select(., -c(na_check, block_detail)), ~str_c(names(c(...))[!is.na(c(...))], collapse="|")))

Data

df <- structure(list(w = c(NA, NA, NA, NA, NA), x = c("a", NA, NA, 
NA, NA), y = c(NA, "b", "b", NA, NA), z = c(NA, "b", "c", NA, 
"b")), class = "data.frame", row.names = c(NA, -5L))

CodePudding user response：

How about the following:

library(tidyverse)

df <- tibble(w=rep(NA_real_,5),
             x=c("a",rep(NA_character_,4)),
             y=c(NA_character_, "b", "b", NA_character_, NA_character_),
             z=c(NA_character_, "b", "c", NA_character_, "b"))

paste_no_na <- function(x){
    x <- unique(x)
    paste0(x[!is.na(x)], collapse = "|")
}

df_summary <- df %>% 
    mutate(row = row_number(),
           w = as.character(w)) %>% 
    pivot_longer(cols = -row) %>% 
    mutate(name = if_else(is.na(value), NA_character_, name)) %>% 
    group_by(row) %>% 
    summarize(na_check = sum(!is.na(value)),
              block_detail = paste_no_na(value),
              block_type = paste_no_na(name))

cbind(df, df_summary %>% select(-row))

The basic idea is to pivot longer so that we can then analyze each group (ie original row) and then summarize. If you have many rows, the performance might become poor. Check if this is the case for you.

CodePudding user response：

This answer mainly uses apply to go through the dataframe row-wise, and finally use unite to combine colnames that is not NA.

library(tidyverse)

df %>% 
  mutate(na_check = apply(df, 1, function(x) sum(!is.na(x))),
         across(-na_check, ~ifelse(is.na(.x), NA, cur_column()), .names = "{.col}_colname"),
         block_detail = apply(df, 1, function(x) paste(unique(na.omit(x)), collapse = '|'))) %>% 
  unite(col = "block_type", ends_with("_colname"), na.rm = T, sep = "|")

   w    x    y    z na_check block_type block_detail
1 NA    a <NA> <NA>        1          x            a
2 NA <NA>    b    b        2        y|z            b
3 NA <NA>    b    c        2        y|z          b|c
4 NA <NA> <NA> <NA>        0                        
5 NA <NA> <NA>    b        1          z            b

Data

Credit to @AndrewGB for the dput 1!

df <- structure(list(w = c(NA, NA, NA, NA, NA), x = c("a", NA, NA, 
NA, NA), y = c(NA, "b", "b", NA, NA), z = c(NA, "b", "c", NA, 
"b")), class = "data.frame", row.names = c(NA, -5L))

CodePudding user response：

Here is a combination of mutate and unite:

library(dplyr)
library(tidyr)

df %>% 
  mutate(na_check = rowSums(!is.na(.), na.rm = T)) %>% 
  unite("block_detail", w:z, na.rm = TRUE, sep = "|", remove = FALSE) %>% 
  mutate(across(-c(block_detail, na_check), ~case_when(!is.na(.) ~ cur_column()), .names = 'new_{col}')) %>%
  unite(block_type, starts_with('new'), na.rm = TRUE, sep = '|') %>% 
  relocate(block_detail, .after = na_check)

     w    x    y    z na_check block_detail block_type
1 <NA>    a <NA> <NA>        1            a          x
2 <NA> <NA>    b    b        2          b|b        y|z
3 <NA> <NA>    b    c        2          b|c        y|z
4 <NA> <NA> <NA> <NA>        0                        
5 <NA> <NA> <NA>    b        1            b          z

data:

df <- structure(list(w = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), x = c("a", NA, NA, NA, NA), y = c(NA, 
"b", "b", NA, NA), z = c(NA, "b", "c", NA, "b")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))