I am working with health data and have written code to group operation codes with their laterality code into code groups by the day on which they were performed.
I have written code which worked with tidyr(1.1.4)
and now fails on tidyr(1.2.0)
.
reprex below
library(tidyverse)
df <- tribble(
~row, ~opdate_grp, ~side, ~OPERTN, ~OPDATE, ~opgrp,
1,1,"bilateral","W373","2018-03-23",1,
1,1,"bilateral","Z941","2018-03-23",1,
2,1,"bilateral","W374","2018-10-12",1,
2,1,"bilateral","Z941","2018-10-12",1,
3,1,"left","W383","2020-12-08",1,
3,1,"left","Z943","2020-12-08",1,
4,2,"none","O325","2013-03-31",2,
5,1,"none","G47","2012-01-22",1,
5,2,"none","C488","2012-01-25",2,
6,1,"none","X90","2018-12-25",1,
6,2,"none","T36","2018-12-28",2
)
The next chunk was written to reshape the data such that the resulting df included a single column for OPERTN
& OPDATE
and opdate_grp
was an index indicating sequence of operations and dates so they could be filtered later on in the work flow.
The previous code (below) worked with tidyr(version 1.2.0)
.
df %>%
# Make operation groups into a row for string searching
pivot_wider(values_from= c(OPERTN, OPDATE),
names_sep = "",
values_fill=NA,
values_fn = list(OPERTN= ~unlist(.x)%>% toString(),
OPDATE = ~.x[1]))
to produce a table where each row number has operation code groups which are indexed using opgrp
:
# A tibble: 8 × 6
# row opdate_grp side OPERTN OPDATE opgrp
# <dbl> <dbl> <chr> <chr> <chr> <dbl>
# 1 1 bilateral W373, Z941 2018-03-23 1
# 2 1 bilateral W374, Z941 2018-10-12 1
# 3 1 left W383, Z943 2020-12-08 1
# 4 2 none O325 2013-03-31 1
# 5 1 none G47 2012-01-22 1
# 5 2 none C488 2012-01-25 2
# 6 1 none X90 2018-12-25 1
# 6 2 none T36 2018-12-28 2
However, when specifying names_from = opdate_grp
, which is required in tidyr(1.2.0)
, the resulting df contains seperate columns for each operation code gorup and date, which is annoying and stops my code from working. The reult is below:
df %>%
# Make operation groups into a row for string searching
pivot_wider(values_from= c(OPERTN, OPDATE),
names_from= opgrp,
names_sep = "_",
#values_fill=NA,
values_fn = list(OPERTN= ~unlist(.x)%>% toString(),
OPDATE = ~.x[1]))
# A tibble: 8 × 7
# row opdate_grp side OPERTN1 OPERTN2 OPDATE1 OPDATE2
# <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
# 1 1 bilateral W373, Z941 NA 2018-03-23 NA
# 2 1 bilateral W374, Z941 NA 2018-10-12 NA
# 3 1 left W383, Z943 NA 2020-12-08 NA
# 4 2 none NA O325 NA 2013-03-31
# 5 1 none G47 NA 2012-01-22 NA
# 5 2 none NA C488 NA 2012-01-25
# 6 1 none X90 NA 2018-12-25 NA
# 6 2 none NA T36 NA 2018-12-28
What I have tried
I have tried repeating reshaping functions to bring the multiple columns into one but this is clunky and does not scale well.
I have tried rewriting pivot_wider()
code but this fails as I cannot alter the invisible processes.
Question:
Is there a work around to prevent this new behaviour or to fix this?
My real data contains up to 15 operation groups for each row
and over 10 million rows, so the solution should scale and be stable (i hope) with future updates.
CodePudding user response:
Below code works in tidyr 1.2.0
library(tidyr)
library(dplyr)
df %>%
pivot_wider(values_from = c(OPERTN, OPDATE),
names_from = opgrp, values_fn = list(OPERTN= ~unlist(.x)%>%
toString(),
OPDATE = ~.x[1]), names_sep = "")
-output
# A tibble: 8 × 7
row opdate_grp side OPERTN1 OPERTN2 OPDATE1 OPDATE2
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 1 1 bilateral W373, Z941 <NA> 2018-03-23 <NA>
2 2 1 bilateral W374, Z941 <NA> 2018-10-12 <NA>
3 3 1 left W383, Z943 <NA> 2020-12-08 <NA>
4 4 2 none <NA> O325 <NA> 2013-03-31
5 5 1 none G47 <NA> 2012-01-22 <NA>
6 5 2 none <NA> C488 <NA> 2012-01-25
7 6 1 none X90 <NA> 2018-12-25 <NA>
8 6 2 none <NA> T36 <NA> 2018-12-28
If the output required is a single column, use group_by
with summarise
df %>%
group_by(row, opdate_grp, side, OPDATE) %>%
summarise(OPERTN = toString(OPERTN), .groups = 'drop')
# A tibble: 8 × 5
row opdate_grp side OPERTN OPDATE
<dbl> <dbl> <chr> <chr> <chr>
1 1 1 bilateral W373, Z941 2018-03-23
2 2 1 bilateral W374, Z941 2018-10-12
3 3 1 left W383, Z943 2020-12-08
4 4 2 none O325 2013-03-31
5 5 1 none G47 2012-01-22
6 5 2 none C488 2012-01-25
7 6 1 none X90 2018-12-25
8 6 2 none T36 2018-12-28