Home > OS >  R: Double Pivots Using DPLYR?
R: Double Pivots Using DPLYR?

Time:12-09

I am working with the R programming language.

I have a dataset that looks something like this:

x = c("GROUP", "A", "B", "C")
date_1 = c("CLASS 1", 20, 60, 82)
date_1_1 = c("CLASS 2", 37, 22, 8)
date_2 = c("CLASS 1", 15,100,76)
date_2_1 = c("CLASS 2", 84, 18,88)

my_data = data.frame(x,  date_1, date_1_1, date_2, date_2_1)

      x  date_1 date_1_1  date_2 date_2_1
1 GROUP CLASS 1  CLASS 2 CLASS 1  CLASS 2
2     A      20       37      15       84
3     B      60       22     100       18
4     C      82        8      76       88
    

I am trying to restructure the data so it looks like this:

  • note : in the real excel data, date_1 is the same date as date_1_1 and date_2 is the same as date_2_1 ... R wont accept the same names, so I called them differently

enter image description here

Currently, I am manually doing this in Excel using different "tranpose" functions - but I am wondering if there is a way to do this in R (possibly using the DPLYR library).

I have been trying to read different tutorial websites online (Pivoting), but so far nothing seems to match the problem I am trying to work on.

Can someone please show me how to do this?

Thanks!

CodePudding user response:

Made assumptions about your data because of the duplicate column names. For example, if the Column header pattern is CLASS_ClassNum_Date

df<-data.frame(GROUP = c("A", "B", "C"),
               CLASS_1_1 = c(20, 60, 82),
               CLASS_2_1 = c(37, 22, 8),
               CLASS_1_2 = c(15,100,76),
               CLASS_2_2 = c(84, 18,88))
library(tidyr)
pivot_longer(df, -GROUP, 
             names_pattern = "(CLASS_.*)_(.*)", 
             names_to = c(".value", "Date"))
  GROUP Date  CLASS_1 CLASS_2
  <chr> <chr>   <dbl>   <dbl>
1 A     1          20      37
2 A     2          15      84
3 B     1          60      22
4 B     2         100      18
5 C     1          82       8
6 C     2          76      88

Edit: Substantially improved pivot_longer by using names_pattern= correctly

CodePudding user response:

There are lots of ways to achieve your desired outcome, but I don't believe there is an 'easy'/'simple' way. Here is one potential solution:

library(tidyverse)
library(vctrs)

x = c("GROUP", "A", "B", "C")
date_1 = c("CLASS 1", 20, 60, 82)
date_1_1 = c("CLASS 2", 37, 22, 8)
date_2 = c("CLASS 1", 15,100,76)
date_2_1 = c("CLASS 2", 84, 18,88)

my_data = data.frame(x,  date_1, date_1_1, date_2, date_2_1)

# Combine column names with the names in the first row
colnames(my_data) <- paste(my_data[1,], colnames(my_data), sep = "-")

my_data %>%
  filter(`GROUP-x` != "GROUP") %>% # remove first row (info now in column names)
  pivot_longer(everything(), # pivot the data
               names_to = c(".value", "Date"),
               names_sep = "-") %>%
  mutate(GROUP = vec_fill_missing(GROUP, # fill NAs in GROUP introduced by pivoting
                                  direction = "downup")) %>%
  filter(Date != "x") %>% # remove "unneeded" rows
  mutate(`CLASS 2` = vec_fill_missing(`CLASS 2`, # fill NAs again 
                                      direction = "downup")) %>%
  na.omit() %>% # remove any remaining NAs
  mutate(across(starts_with("CLASS"), ~as.numeric(.x)),
         Date = str_extract(Date, "\\d ")) %>%
  rename("date" = "Date", # rename the columns
         "group" = "GROUP",
         "count_class_1" = `CLASS 1`,
         "count_class_2" = `CLASS 2`) %>%
  arrange(date) # arrange by "date" to get your desired output
#> # A tibble: 6 × 4
#>   date  group count_class_1 count_class_2
#>   <chr> <chr>         <dbl>         <dbl>
#> 1 1     A                20            37
#> 2 1     B                60            84
#> 3 1     C                82            18
#> 4 2     A                15            37
#> 5 2     B               100            22
#> 6 2     C                76             8

Created on 2022-12-09 with reprex v2.0.2

  • Related