I have a dataset like this (This issue has been solved, I have to removed the datasets since it is kinda of sensitive)
My current code is :
ea2<-ea%>%
select("ea_no","ea_actual","incidence_cases2012","incidence_cases2013","incidence_cases2014","incidence_cases2016","cumulative_incidence_2014","cumulative_incidence_2016")%>%
pivot_longer(
cols = c("incidence_cases2012","incidence_cases2013","incidence_cases2014","incidence_cases2016","cumulative_incidence_2014"),
names_to = "year",
values_to ="incidence_cases"
)%>%
mutate(year=str_sub(year, 16,19)) %>%
pivot_longer(
cols = c("cumulative_incidence_2014","cumulative_incidence_2016"),
names_to = "year2",
values_to = "cumulative_incidence"
) %>% mutate(year2=str_sub(year2, 22,25))
But the problem is that I cannot create one col: "year", and put the same year value from two different vars within the same ea_no. Now the output has two year cols (year and year2) since I just simply run twice pivot long code. The idea output like this:
ea_no. year. cumulative_incidence. incidence
1. 2012. xxx xxxx
2. 2014. na xxxx
3. 2016 xxx xxxx
And my current code's output is like:
ea_no. year. cumulative_incidence. year2 incidence
1. 2012. xxx 2012 xxxx
1. na 2014. xxxx
2. 2016 xxx na na
2. 2012 xxxx
3. 2014. na 2012 xxxx
3. 2016 xxx 2014 xxxx
Does someone help to figure out a solution? and btw, thanks for helping how to smartly subset the number of year from the var name (Now I just use str_sub with fixed distance, but it cannot work for vars with different distance)? Thanks a lot!
CodePudding user response:
Not quite sure if this what you need given that the desired output seems to be incomplete. The important point is using the names_pattern
attribute; here you can define a regex pattern for the new columns, namely two capturing groups:
(incidence_cases|cumulative_incidence)
, which will match and split into columns all columns withincidence_cases
OR withcumulative_incidence
and(\\d )
, which will match and create a new column foryear
Solution:
ea %>%
select(matches("ea|incidence")) %>%
pivot_longer(
cols = matches("incidence"),
names_to = c(".value", "year"),
names_pattern = "(incidence_cases|cumulative_incidence)_?(\\d )"
)
# A tibble: 408 × 5
ea_no ea_actual year incidence_cases cumulative_incidence
<dbl> <chr> <chr> <dbl> <dbl>
1 10499003 "" 2012 NA NA
2 10499003 "" 2013 NA NA
3 10499003 "" 2014 NA NA
4 10499003 "" 2016 NA NA
5 10499004 "" 2012 NA NA
6 10499004 "" 2013 NA NA
7 10499004 "" 2014 NA NA
8 10499004 "" 2016 NA NA
9 10499005 "RV01" 2012 0 NA
10 10499005 "RV01" 2013 0.00726 NA