Home > Mobile >  Widen R dataframe
Widen R dataframe

Time:10-22

I am trying to widen a dataframe in R but it seems like in a way that is unusual as I haven't been able find a simple way to do so after lots of searching here.

Say I have a dataframe like this, where the ID is a number that identifies a unique person and this unique person might have several codes associated with them:

ID<-c(1, 2, 2, 2, 3, 3,4)
CODE<-c(123, 938, 293, 456, 203, 203, 91)
df <- data.frame(ID, CODE)

I want to widen it so that there is only one row per ID and the codes are added to additional columns. This should result in as many new columns being generated as there are codes for the ID that has the most number of codes associated with it. If an ID only has one code, for example, then all newly generated columns after the first should be filled with NAs. The resulting dataframe should look like this:

ID CODE1 CODE2 CODE3
1 123 NA NA
2 938 293 456
3 203 203 NA
4 91 NA NA

It would be even better if duplicate codes could be removed as well so that the result would actually be like this (such that the second occurrence of 203 for ID 3 becomes an NA):

ID CODE1 CODE2 CODE3
1 123 NA NA
2 938 293 456
3 203 NA NA
4 91 NA NA

I can do this by concatenating the codes (via group_by and summarise) and then separating the codes into individual columns, but I imagine there's a more direct way to do this.

Thanks for any suggestions!

CodePudding user response:

library(tidyverse)
df %>%
  distinct(ID, CODE) %>%
  group_by(ID) %>%
  mutate(col = paste0("CODE", row_number())) %>%
  ungroup() %>%
  pivot_wider(names_from = col, values_from = CODE)

result

# A tibble: 4 x 4
     ID CODE1 CODE2 CODE3
  <dbl> <dbl> <dbl> <dbl>
1     1   123    NA    NA
2     2   938   293   456
3     3   203    NA    NA
4     4    91    NA    NA
  • Related