Home > front end >  Pivot multiple values while keep same name column but different value columns
Pivot multiple values while keep same name column but different value columns

Time:10-19

I have a dataset like this (This issue has been solved, I have to removed the datasets since it is kinda of sensitive)

My current code is :

ea2<-ea%>%
  select("ea_no","ea_actual","incidence_cases2012","incidence_cases2013","incidence_cases2014","incidence_cases2016","cumulative_incidence_2014","cumulative_incidence_2016")%>%
    pivot_longer(
    cols = c("incidence_cases2012","incidence_cases2013","incidence_cases2014","incidence_cases2016","cumulative_incidence_2014"),
    names_to = "year",
    values_to ="incidence_cases"
  )%>%
  mutate(year=str_sub(year, 16,19)) %>%
    pivot_longer(
    cols = c("cumulative_incidence_2014","cumulative_incidence_2016"),
    names_to = "year2",
    values_to = "cumulative_incidence"
  ) %>% mutate(year2=str_sub(year2, 22,25)) 

But the problem is that I cannot create one col: "year", and put the same year value from two different vars within the same ea_no. Now the output has two year cols (year and year2) since I just simply run twice pivot long code. The idea output like this:

ea_no.  year.   cumulative_incidence.  incidence
1.       2012.    xxx                    xxxx
2.       2014.     na                    xxxx
3.       2016      xxx                   xxxx

And my current code's output is like:

ea_no.  year.   cumulative_incidence.   year2  incidence
1.       2012.    xxx                    2012      xxxx
1.                 na                    2014.     xxxx
2.       2016      xxx                   na         na     
2.                                       2012      xxxx
3.       2014.     na                    2012      xxxx
3.       2016      xxx                   2014      xxxx

Does someone help to figure out a solution? and btw, thanks for helping how to smartly subset the number of year from the var name (Now I just use str_sub with fixed distance, but it cannot work for vars with different distance)? Thanks a lot!

CodePudding user response:

Not quite sure if this what you need given that the desired output seems to be incomplete. The important point is using the names_pattern attribute; here you can define a regex pattern for the new columns, namely two capturing groups:

  • (incidence_cases|cumulative_incidence), which will match and split into columns all columns with incidence_cases OR with cumulative_incidenceand
  • (\\d ), which will match and create a new column for year

Solution:

ea %>%
  select(matches("ea|incidence")) %>%
  pivot_longer(
    cols = matches("incidence"),
    names_to = c(".value", "year"),
    names_pattern = "(incidence_cases|cumulative_incidence)_?(\\d )"
  )
# A tibble: 408 × 5
      ea_no ea_actual year  incidence_cases cumulative_incidence
      <dbl> <chr>     <chr>           <dbl>                <dbl>
 1 10499003 ""        2012         NA                         NA
 2 10499003 ""        2013         NA                         NA
 3 10499003 ""        2014         NA                         NA
 4 10499003 ""        2016         NA                         NA
 5 10499004 ""        2012         NA                         NA
 6 10499004 ""        2013         NA                         NA
 7 10499004 ""        2014         NA                         NA
 8 10499004 ""        2016         NA                         NA
 9 10499005 "RV01"    2012          0                         NA
10 10499005 "RV01"    2013          0.00726                   NA
  •  Tags:  
  • r
  • Related