Thank you for looking, I struggled to think of a good name for the question.
I have looked at many different versions of my question here and elsewhere but haven't found the exact answer I need.
I have data consisting of ID numbers, codes, and dates. There are often more than 1 row per ID, there is no set number of times an ID can appear. The data now are in long format, I need to pivot them to wide format. I need the result to be 1 row per ID, with the Code, and Date fields widened. I want to keep them the same names but add a N to the end of the field name where the highest N is the max number of times a single ID repeats...If an ID repeats 9 times, the pivot will produce 9 new columns for Code and 9 for Date, as Code1, Code2...Code9.
I can do it, see below, but the names are a total mess. I am unable to get the names to be neat the way I describe.
EG Data:
eg_data <- data.frame(
ID = c('1','1','1','2', '2', '3', '4', '4') ,
FName = c('John','John','John','Gina', 'Gina', 'Tom', 'Bobby', 'Bobby') ,
LName = c('Smith','Smith','Smith','Jones', 'Jones', 'Anderson', 'Kennedy', 'Kennedy') ,
Code = c('ECV','EDC','EER','ECV', 'ECV', 'EER', 'EDC', 'EER') ,
Date = c('2022-04-23','2021-12-21','2020-01-25','2022-05-18', '2020-05-26', '2021-01-21', '2020-05-14', '2020-06-25'))
What I've done - yes, it works, but it requires renaming every_single_column that gets widened. There must be a better way:
eg_data %>%
group_by(ID) %>%
mutate(
Count = row_number(),
CodeName = paste0('ContactCode', Count),
DateName = paste0('ContactDate', Count)) %>%
ungroup() %>%
select(-Count) %>%
pivot_wider(
names_from = c(CodeName, DateName), values_from = c(Code, Date)) -> LongNamesYuck
Desired Output (using two of the IDs above for brevity)
desired_format <- data.frame(
ID = c('1','2'),
FName = c('John', 'Gina'),
LName = c('Smith', 'Jones'),
Code1 = c('ECV', 'ECV'),
Code2 = c('EDC', 'ECV'),
Code3 = c('EER', NA),
Date1 = c('2022-04-23', '2022-05-18'),
Date2 = c('2021-12-21', '2020-05-26'),
Date3 = c('2020-01-25', NA))
The example below gets close, but it uses the values from one of the fields as new field names, and I don't want that.
How to `pivot_wider` multiple columns without combining the names?
Any help is appreciated, thank you.
CodePudding user response:
We can make it compact i.e. using rowid
(from data.table
instead of two lines group_by/mutate
), and then use pivot_wider
with names_from
on the sequence ('rn') column and the values_from
as a vector of column names (quoted/unquoted)
library(dplyr)
library(tidyr)
library(data.table)
out <- eg_data %>%
mutate(rn = rowid(ID, FName, LName)) %>%
pivot_wider(names_from = rn, values_from = c(Code, Date), names_sep = "")
-checking the output with desired_format
> desired_format
ID FName LName Code1 Code2 Code3 Date1 Date2 Date3
1 1 John Smith ECV EDC EER 2022-04-23 2021-12-21 2020-01-25
2 2 Gina Jones ECV ECV <NA> 2022-05-18 2020-05-26 <NA>
> out[1:2,]
# A tibble: 2 × 9
ID FName LName Code1 Code2 Code3 Date1 Date2 Date3
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 John Smith ECV EDC EER 2022-04-23 2021-12-21 2020-01-25
2 2 Gina Jones ECV ECV <NA> 2022-05-18 2020-05-26 <NA>