Home > OS >  Pivot_wider without removing duplicates
Pivot_wider without removing duplicates

Time:05-22

I want to use pivot_wider, the goal being that the number of resulting columns are equal to the number of rows pivoted, by keeping duplicate values separate.

My example dataset:

data <- data.frame(Person = c("Peter", "Peter", "Peter", "Peter", "Peter", "Peter",
                              "Carol", "Carol", "Carol", "Carol", "Carol", "Carol"),
                  GroupID = c(1, 1, 2, 2, 3, 3, 1, 1, 4, 4, 5, 5),
                  GroupTheme = c(1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2),
                  Committee = c("Transport", "State", "Transport", "State", "Transport", "State",
                                "Technology", "Nature", "Technology", "Nature", "Technology", "Nature"))

I want to have one row per Person. For that, i need to widen the dataset by GroupID and groupTheme. I want to have one row per Person. Note that the observations for "Committee" of a Person repeat for each Group. This is intended and is the case for every "Name" in the original dataset.

The code I have used so far:

widened = function(col, pre){
  data %>%
    select(Person, {{col}}) %>% 
    distinct() %>%
    with_groups(Person, ~mutate(.x, n = row_number())) %>% 
    pivot_wider(names_from = n, values_from = {{col}}, names_prefix = pre)
}

data <- reduce(list(widened(GroupID, "GroupID_"),
            widened(GroupTheme, "GroupTheme_"),
            widened(Committee, "Committee_")), 
       left_join, by = "Person")

results in the following dataset:

Person GroupID_1 GroupID_2 GroupID_3 GroupTheme_1 GroupTheme_2 Committee_1 Committee_2
  <chr>      <dbl>     <dbl>     <dbl>        <dbl>        <dbl> <chr>       <chr>      
1 Peter          1         2         3            1            2 Transport   State      
2 Carol          1         4         5            1            2 Technology  Nature 

As you can see, there are 3 columns with GroupID_ but only 2 columns with GroupThemes_. This is because the maximum number of unique values for GroupTheme_ is 2 over all rows.

However, I want to be able to match each GroupID_ to its correspondent GroupTheme_. So, GroupTheme_1 should correspond to GroupID_1 and etc. The dataset should look like this:

Person GroupID_1 GroupID_2 GroupID_3 GroupTheme_1 GroupTheme_2 GroupTheme_3 Committee_1
1  Peter         1         2         3            1            1            2   Transport
2  Carol         1         4         5            1            2            2  Technology
  Committee_2
1       State
2      Nature

In my opinion, this is done by not removing the duplicate values between the GroupTheme_ columns. This allows me to match each GroupID_ to each GroupTheme_ by number, just like it was the case in the original longer dataset.

I played around with the options of pivot_wider but did not figure out a way to do this.

If you have an alternative (maybe more straightforward) way to solve the problem of being able to match each ID to Theme after pivoting wider, that is very much appreciated as well.

Thank you in advance

CodePudding user response:

data %>%
  group_by(Person) %>%
  mutate(name = as.integer(factor(Committee, unique(Committee))))%>%
  pivot_wider(c(Person, GroupID, GroupTheme), values_from = Committee,
              names_prefix = 'Committee_') %>%
  mutate(name = row_number()) %>%
  pivot_wider(c(Person, starts_with('Committee')), 
              values_from = c(GroupID, GroupTheme))


# A tibble: 2 x 9
# Groups:   Person [2]
  Person Committee_1 Committee_2 GroupID_1 GroupID_2 GroupID_3 GroupTheme_1 GroupTheme_2 GroupTheme_3
  <chr>  <chr>       <chr>           <dbl>     <dbl>     <dbl>        <dbl>        <dbl>        <dbl>
1 Peter  Transport   State               1         2         3            1            1            2
2 Carol  Technology  Nature              1         4         5            1            2            2
  • Related