Home > Enterprise >  R pivot_wider x multiple columns with names as ColName row number
R pivot_wider x multiple columns with names as ColName row number

Time:07-13

Thank you for looking, I struggled to think of a good name for the question.

I have looked at many different versions of my question here and elsewhere but haven't found the exact answer I need.

I have data consisting of ID numbers, codes, and dates. There are often more than 1 row per ID, there is no set number of times an ID can appear. The data now are in long format, I need to pivot them to wide format. I need the result to be 1 row per ID, with the Code, and Date fields widened. I want to keep them the same names but add a N to the end of the field name where the highest N is the max number of times a single ID repeats...If an ID repeats 9 times, the pivot will produce 9 new columns for Code and 9 for Date, as Code1, Code2...Code9.

I can do it, see below, but the names are a total mess. I am unable to get the names to be neat the way I describe.

EG Data:

eg_data <- data.frame(
  ID = c('1','1','1','2', '2', '3', '4', '4') , 
  FName = c('John','John','John','Gina', 'Gina', 'Tom', 'Bobby', 'Bobby') , 
  LName = c('Smith','Smith','Smith','Jones', 'Jones', 'Anderson', 'Kennedy', 'Kennedy') , 
  Code = c('ECV','EDC','EER','ECV', 'ECV', 'EER', 'EDC', 'EER') , 
  Date = c('2022-04-23','2021-12-21','2020-01-25','2022-05-18', '2020-05-26', '2021-01-21', '2020-05-14', '2020-06-25')) 

What I've done - yes, it works, but it requires renaming every_single_column that gets widened. There must be a better way:

eg_data %>%
  group_by(ID) %>%
  mutate(
    Count = row_number(), 
    CodeName = paste0('ContactCode', Count),
    DateName = paste0('ContactDate', Count)) %>%
  ungroup() %>%
  select(-Count) %>%
  pivot_wider(
    names_from = c(CodeName, DateName), values_from = c(Code, Date)) -> LongNamesYuck

Desired Output (using two of the IDs above for brevity)

desired_format <- data.frame(
  ID = c('1','2'),
  FName = c('John', 'Gina'),
  LName = c('Smith', 'Jones'),
  Code1 = c('ECV', 'ECV'),
  Code2 = c('EDC', 'ECV'),
  Code3 = c('EER', NA),
  Date1 = c('2022-04-23', '2022-05-18'),
  Date2 = c('2021-12-21', '2020-05-26'),
  Date3 = c('2020-01-25', NA))

The example below gets close, but it uses the values from one of the fields as new field names, and I don't want that.

How to `pivot_wider` multiple columns without combining the names?

Any help is appreciated, thank you.

CodePudding user response:

We can make it compact i.e. using rowid (from data.table instead of two lines group_by/mutate), and then use pivot_wider with names_from on the sequence ('rn') column and the values_from as a vector of column names (quoted/unquoted)

library(dplyr)
library(tidyr)
library(data.table)
out <- eg_data %>%
  mutate(rn = rowid(ID, FName, LName)) %>% 
  pivot_wider(names_from = rn, values_from = c(Code, Date), names_sep = "")

-checking the output with desired_format

> desired_format
  ID FName LName Code1 Code2 Code3      Date1      Date2      Date3
1  1  John Smith   ECV   EDC   EER 2022-04-23 2021-12-21 2020-01-25
2  2  Gina Jones   ECV   ECV  <NA> 2022-05-18 2020-05-26       <NA>
> out[1:2,]
# A tibble: 2 × 9
  ID    FName LName Code1 Code2 Code3 Date1      Date2      Date3     
  <chr> <chr> <chr> <chr> <chr> <chr> <chr>      <chr>      <chr>     
1 1     John  Smith ECV   EDC   EER   2022-04-23 2021-12-21 2020-01-25
2 2     Gina  Jones ECV   ECV   <NA>  2022-05-18 2020-05-26 <NA>      
  • Related