I have a df structured as below
month=c('jan','feb','jan','feb','feb','feb'),
therapyA=c(NA,'in person',NA,'teleheatlh',NA,'in person'),
therapyB=c('in person','in person',NA,'teleheatlh',NA,'in person'),
therapyC=c(NA,'in person','telehealth','teleheatlh',NA,'in person'),
therapyD=c(NA,'in person',NA,'teleheatlh','telehealth','in person'))
organization month therapyA therapyB therapyC therapyD
1 A jan <NA> in person <NA> <NA>
2 A feb in person in person in person in person
3 B jan <NA> <NA> telehealth <NA>
4 B feb teleheatlh teleheatlh teleheatlh teleheatlh
5 C feb <NA> <NA> <NA> telehealth
6 D feb in person in person in person in person
I would like to pivot the df in such as way as to get the results: meaning I'd like to use the current values as column names and concatenate all current column names that correalte. The results also need to stay grouped organization and month.
1 A jan TherapyB <NA>
2 A feb TherapyA,TherapyB,TherapyC,TherapyD <NA>
3 B jan <NA> TherapyC
4 B feb <NA> TherapyA,TherapyB,TherapyC,TherapyD
5 C feb <NA> TherapyD
6 D feb TherapyA,TherapyB,TherapyC,TherapyD <NA>
I have tried using dplyr::pivot_longer unsuccessfully. I have also tried using various base R operations, but was unable to get far without manually rewriting the whole df. Thank you for any input.
CodePudding user response:
We reshape first to 'long' format with pivot_longer
, then group by the columns, paste
(toString
) the column names and reshape back to 'wide' format (pivot_wider
)
library(dplyr)
library(tidyr)
library(snakecase)
df %>%
mutate(rn = row_number()) %>%
pivot_longer(cols = starts_with('therapy'), values_drop_na = TRUE) %>%
group_by(rn, organization, month, value) %>%
summarise(name = toString(to_upper_camel_case(name)), .groups = 'drop') %>%
pivot_wider(names_from = value, values_from = name) %>%
select(-rn)
-output
# A tibble: 6 × 4
organization month `in person` telehealth
<chr> <chr> <chr> <chr>
1 A jan TherapyB <NA>
2 A feb TherapyA, TherapyB, TherapyC, TherapyD <NA>
3 B jan <NA> TherapyC
4 B feb <NA> TherapyA, TherapyB, TherapyC, TherapyD
5 C feb <NA> TherapyD
6 D feb TherapyA, TherapyB, TherapyC, TherapyD <NA>
data
df <- structure(list(organization = c("A", "A", "B", "B", "C", "D"),
month = c("jan", "feb", "jan", "feb", "feb", "feb"), therapyA = c(NA,
"in person", NA, "telehealth", NA, "in person"), therapyB = c("in person",
"in person", NA, "telehealth", NA, "in person"), therapyC = c(NA,
"in person", "telehealth", "telehealth", NA, "in person"),
therapyD = c(NA, "in person", NA, "telehealth", "telehealth",
"in person")), row.names = c(NA, -6L), class = "data.frame")