Home > Enterprise >  How to group by data in one column into multiple columns keeping rows
How to group by data in one column into multiple columns keeping rows

Time:04-30

Identifier Value
511016 75.72911
511016 79.01783
511016 74.87570
511029 72.75873
511029 74.41798
511029 78.56112

The dataset consists of two columns (as above) but 77,000 rows. The first column is the 'name' vector and the second the value. I need to transform the data so that the first column has just one value for the identifier and after this, the columns take all the values that the respective identifier has. Like this:

Identifier Value 1 Value 2 etc...
511016 75.72911 79.01783
511029 72.75873 74.41798

I have been able to use group_keys to produce a single column with the identifiers but cannot get the values to match in this way. Any help is appreciated.

CodePudding user response:

library(dplyr)
library(tidyr)

df <- tribble(~Identifier,  ~Value,
511016,     75.72911,
511016,     79.01783,
511016,     74.87570,
511029,     72.75873,
511029,     74.41798,
511029,     78.56112
) 


df %>% 
  group_by(Identifier) %>% 
  mutate(row_id = row_number()) %>% 
  pivot_wider(id_cols = "Identifier",  names_from = row_id,
              values_from = Value, 
              names_glue = "{.value}{row_id}") %>% 
  ungroup()

CodePudding user response:

This should do it:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
dat <- structure(list(Identifier = c(511016L, 511016L, 511016L, 511029L, 
                              511029L, 511029L), 
               Value = c(75.72911, 79.01783, 74.8757, 72.75873, 
                         74.41798, 78.56112)), row.names = c(NA, 6L), class = "data.frame")
dat %>% 
  group_by(Identifier) %>% 
  mutate(obs = row_number()) %>% 
  pivot_wider(names_from = "obs", 
                    values_from = "Value", 
                    names_prefix="value")
#> # A tibble: 2 × 4
#> # Groups:   Identifier [2]
#>   Identifier value1 value2 value3
#>        <int>  <dbl>  <dbl>  <dbl>
#> 1     511016   75.7   79.0   74.9
#> 2     511029   72.8   74.4   78.6

Created on 2022-04-29 by the reprex package (v2.0.1)

CodePudding user response:

Assuming your df has two columns, One similar Option is:

library(dplyr)
library(tidyr)

df %>% 
  group_by(Identifier) %>% 
  mutate(name = paste(colnames(df[2]), row_number())) %>% 
  pivot_wider(names_from = name, values_from = Value)

  Identifier `Value 1` `Value 2` `Value 3`
       <int>     <dbl>     <dbl>     <dbl>
1     511016      75.7      79.0      74.9
2     511029      72.8      74.4      78.6
  •  Tags:  
  • r
  • Related