Home > OS >  Pivot R df using colnames as values (concatenated in some cases) and column values as new colnames
Pivot R df using colnames as values (concatenated in some cases) and column values as new colnames

Time:08-10

I have a df structured as below

month=c('jan','feb','jan','feb','feb','feb'),
therapyA=c(NA,'in person',NA,'teleheatlh',NA,'in person'),
therapyB=c('in person','in person',NA,'teleheatlh',NA,'in person'),
therapyC=c(NA,'in person','telehealth','teleheatlh',NA,'in person'),
therapyD=c(NA,'in person',NA,'teleheatlh','telehealth','in person'))

organization month   therapyA   therapyB   therapyC   therapyD
1            A   jan       <NA>  in person       <NA>       <NA>
2            A   feb  in person  in person  in person  in person
3            B   jan       <NA>       <NA> telehealth       <NA>
4            B   feb teleheatlh teleheatlh teleheatlh teleheatlh
5            C   feb       <NA>       <NA>       <NA> telehealth
6            D   feb  in person  in person  in person  in person

I would like to pivot the df in such as way as to get the results: meaning I'd like to use the current values as column names and concatenate all current column names that correalte. The results also need to stay grouped organization and month.

1            A   jan                            TherapyB                                <NA>
2            A   feb TherapyA,TherapyB,TherapyC,TherapyD                                <NA>
3            B   jan                                <NA>                            TherapyC
4            B   feb                                <NA> TherapyA,TherapyB,TherapyC,TherapyD
5            C   feb                                <NA>                            TherapyD
6            D   feb TherapyA,TherapyB,TherapyC,TherapyD                                <NA>

I have tried using dplyr::pivot_longer unsuccessfully. I have also tried using various base R operations, but was unable to get far without manually rewriting the whole df. Thank you for any input.

CodePudding user response:

We reshape first to 'long' format with pivot_longer, then group by the columns, paste (toString) the column names and reshape back to 'wide' format (pivot_wider)

library(dplyr)
library(tidyr)
library(snakecase)
df %>% 
 mutate(rn = row_number()) %>% 
 pivot_longer(cols = starts_with('therapy'), values_drop_na = TRUE) %>% 
 group_by(rn, organization, month, value) %>%
 summarise(name = toString(to_upper_camel_case(name)), .groups = 'drop') %>% 
 pivot_wider(names_from = value, values_from = name) %>% 
 select(-rn)

-output

# A tibble: 6 × 4
  organization month `in person`                            telehealth                            
  <chr>        <chr> <chr>                                  <chr>                                 
1 A            jan   TherapyB                               <NA>                                  
2 A            feb   TherapyA, TherapyB, TherapyC, TherapyD <NA>                                  
3 B            jan   <NA>                                   TherapyC                              
4 B            feb   <NA>                                   TherapyA, TherapyB, TherapyC, TherapyD
5 C            feb   <NA>                                   TherapyD                              
6 D            feb   TherapyA, TherapyB, TherapyC, TherapyD <NA>           

data

df <- structure(list(organization = c("A", "A", "B", "B", "C", "D"), 
    month = c("jan", "feb", "jan", "feb", "feb", "feb"), therapyA = c(NA, 
    "in person", NA, "telehealth", NA, "in person"), therapyB = c("in person", 
    "in person", NA, "telehealth", NA, "in person"), therapyC = c(NA, 
    "in person", "telehealth", "telehealth", NA, "in person"), 
    therapyD = c(NA, "in person", NA, "telehealth", "telehealth", 
    "in person")), row.names = c(NA, -6L), class = "data.frame")
  • Related