I am working with the R programming language.
I have a dataset that looks something like this:
x = c("GROUP", "A", "B", "C")
date_1 = c("CLASS 1", 20, 60, 82)
date_1_1 = c("CLASS 2", 37, 22, 8)
date_2 = c("CLASS 1", 15,100,76)
date_2_1 = c("CLASS 2", 84, 18,88)
my_data = data.frame(x, date_1, date_1_1, date_2, date_2_1)
x date_1 date_1_1 date_2 date_2_1
1 GROUP CLASS 1 CLASS 2 CLASS 1 CLASS 2
2 A 20 37 15 84
3 B 60 22 100 18
4 C 82 8 76 88
I am trying to restructure the data so it looks like this:
- note : in the real excel data, date_1 is the same date as date_1_1 and date_2 is the same as date_2_1 ... R wont accept the same names, so I called them differently
Currently, I am manually doing this in Excel using different "tranpose" functions - but I am wondering if there is a way to do this in R (possibly using the DPLYR library).
I have been trying to read different tutorial websites online (Pivoting), but so far nothing seems to match the problem I am trying to work on.
Can someone please show me how to do this?
Thanks!
CodePudding user response:
Made assumptions about your data because of the duplicate column names. For example, if the Column header pattern is CLASS_ClassNum_Date
df<-data.frame(GROUP = c("A", "B", "C"),
CLASS_1_1 = c(20, 60, 82),
CLASS_2_1 = c(37, 22, 8),
CLASS_1_2 = c(15,100,76),
CLASS_2_2 = c(84, 18,88))
library(tidyr)
pivot_longer(df, -GROUP,
names_pattern = "(CLASS_.*)_(.*)",
names_to = c(".value", "Date"))
GROUP Date CLASS_1 CLASS_2 <chr> <chr> <dbl> <dbl> 1 A 1 20 37 2 A 2 15 84 3 B 1 60 22 4 B 2 100 18 5 C 1 82 8 6 C 2 76 88
Edit: Substantially improved pivot_longer by using names_pattern=
correctly
CodePudding user response:
There are lots of ways to achieve your desired outcome, but I don't believe there is an 'easy'/'simple' way. Here is one potential solution:
library(tidyverse)
library(vctrs)
x = c("GROUP", "A", "B", "C")
date_1 = c("CLASS 1", 20, 60, 82)
date_1_1 = c("CLASS 2", 37, 22, 8)
date_2 = c("CLASS 1", 15,100,76)
date_2_1 = c("CLASS 2", 84, 18,88)
my_data = data.frame(x, date_1, date_1_1, date_2, date_2_1)
# Combine column names with the names in the first row
colnames(my_data) <- paste(my_data[1,], colnames(my_data), sep = "-")
my_data %>%
filter(`GROUP-x` != "GROUP") %>% # remove first row (info now in column names)
pivot_longer(everything(), # pivot the data
names_to = c(".value", "Date"),
names_sep = "-") %>%
mutate(GROUP = vec_fill_missing(GROUP, # fill NAs in GROUP introduced by pivoting
direction = "downup")) %>%
filter(Date != "x") %>% # remove "unneeded" rows
mutate(`CLASS 2` = vec_fill_missing(`CLASS 2`, # fill NAs again
direction = "downup")) %>%
na.omit() %>% # remove any remaining NAs
mutate(across(starts_with("CLASS"), ~as.numeric(.x)),
Date = str_extract(Date, "\\d ")) %>%
rename("date" = "Date", # rename the columns
"group" = "GROUP",
"count_class_1" = `CLASS 1`,
"count_class_2" = `CLASS 2`) %>%
arrange(date) # arrange by "date" to get your desired output
#> # A tibble: 6 × 4
#> date group count_class_1 count_class_2
#> <chr> <chr> <dbl> <dbl>
#> 1 1 A 20 37
#> 2 1 B 60 84
#> 3 1 C 82 18
#> 4 2 A 15 37
#> 5 2 B 100 22
#> 6 2 C 76 8
Created on 2022-12-09 with reprex v2.0.2