My initial dataset
df1 <- structure(list(id = c(1, 1, 2, 3, 3, 3),
name = c("james", "james", "peter", "anne", "anne", "anne"),
trip_id = c(10,11,10,30,11,32),
date = c("2021/01/01", "2021/06/01","2021/08/01","2021/10/01","2021/10/21","2021/12/01"),
cost = c(100,150,3000,1200,1100,5000)
),
row.names = c(NA,-6L),
class = c("tbl_df", "tbl", "data.frame"))
I require to pivot wider the date and the cost of each trip so they are both together in pairs. I think im closer but would appreciate your feedback.
My current code
df2= df1 %>% pivot_wider(names_from = trip_id,
values_from = c(date, cost))
My desired result
df2 <- structure(list(id = c(1, 2, 3),
name = c("james", "peter", "anne"),
date_10 = c("2021/01/01","2021/08/01",NA),
cost_10 = c(100,3000,NA),
date_11 = c("2021/06/01",NA,"2021/10/21"),
cost_11 = c(150,NA,1100),
date_30 = c(NA,NA,"2021/10/01"),
cost_30 = c(NA,NA,1200),
date_32 = c(NA,NA,"2021/12/01"),
cost_32 = c(NA,NA,5000)
),
row.names = c(NA,-3L),
class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
It looks like you were quite close. We take the trip_id
before pivot_wider
to help reorder the columns. You may or may not need the sort
depending on your desired result. If you just want pairs, there's no need to sort.
library(tidyverse)
nums <- sort(unique(df1$trip_id))
nums <- as.character(nums)
df2 <-
df1 %>%
pivot_wider(names_from = trip_id,
values_from = c(date, cost)) %>%
select(id, name, ends_with(nums))
df2
#> # A tibble: 3 x 10
#> id name date_10 cost_10 date_11 cost_11 date_30 cost_30 date_32 cost_32
#> <dbl> <chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 1 james 2021/01/01 100 2021/0~ 150 <NA> NA <NA> NA
#> 2 2 peter 2021/08/01 3000 <NA> NA <NA> NA <NA> NA
#> 3 3 anne <NA> NA 2021/1~ 1100 2021/1~ 1200 2021/1~ 5000
CodePudding user response:
For this purpose you do need an additional pivot_longer
step before changing your data shape into wide format. Note that I used values_transform
argument in pivot_longer
to change the class of cost
to character so that I could combine it with date
in the intermediate val
variable:
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(c(date, cost), names_to = "var",
values_to = "val",
values_transform = list(val = as.character)) %>%
pivot_wider(names_from = c(var, trip_id), values_from = val) %>%
mutate(across(starts_with("cost"), as.double))
# A tibble: 3 x 10
id name date_10 cost_10 date_11 cost_11 date_30 cost_30 date_32 cost_32
<dbl> <chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 1 james 2021/01/01 100 2021/06/01 150 NA NA NA NA
2 2 peter 2021/08/01 3000 NA NA NA NA NA NA
3 3 anne NA NA 2021/10/21 1100 2021/10/01 1200 2021/12/01 5000
CodePudding user response:
The way you have it is correct. Column numbering/Row numbering should not impact the data unless you have time series/panel data and some few exceptions.
Otherwise, you could accomplish the same using the reshape
function which will give you what you want. Mark you this is Base R:
reshape(data.frame(df1), timevar = 'trip_id', idvar = c('id', 'name'), dir='wide', sep = '_')
id name date_10 cost_10 date_11 cost_11 date_30 cost_30 date_32 cost_32
1 1 james 2021/01/01 100 2021/06/01 150 <NA> NA <NA> NA
3 2 peter 2021/08/01 3000 <NA> NA <NA> NA <NA> NA
4 3 anne <NA> NA 2021/10/21 1100 2021/10/01 1200 2021/12/01 5000