I have this data in R:
df<-data.frame(GROUP = c("A", "B", "C"),
CLASS_1_1 = c(20, 60, 82),
CLASS_2_1 = c(37, 22, 8),
CLASS_1_2 = c(15,100,76),
CLASS_2_2 = c(84, 18,88))
GROUP CLASS_1_1 CLASS_2_1 CLASS_1_2 CLASS_2_2
1 A 20 37 15 84
2 B 60 22 100 18
3 C 82 8 76 88
I am using this code (from an answer provided here: https://stackoverflow.com/users/13203841/stats-noob) to pivot the data:
library(tidyr)
pivot_longer(df, -GROUP,
names_pattern = "(CLASS_.*)_(.*)",
names_to = c(".value", "Date"))
# A tibble: 6 x 4
GROUP Date CLASS_1 CLASS_2
<chr> <chr> <dbl> <dbl>
1 A 1 20 37
2 A 2 15 84
3 B 1 60 22
4 B 2 100 18
5 C 1 82 8
6 C 2 76 88
Is there some way I can do this without using "names_pattern" and just manually specify the names?
The reason that I am bringing this up is because in instances where the columns do not have a similar naming structure, I feel it would be easier to manually specify columns (e.g. colnames(df)
and then manually copy and paste the results of this statement, or to just reference these columns by d[,c(2,3,4,5)
)
As an example:
#pseudocode - does not work : OPTION 1
pivot_longer(df, -GROUP,
names_pattern = "c(CLASS_1_1, CLASS_2_1, CLASS_1_2, CLASS_2_2)",
names_to = c(".value", "Date"))
#pseudocode - does not work : OPTION 2
pivot_longer(df, -GROUP,
names_pattern = "df[,c(2,3,4,5)]",
names_to = c(".value", "Date"))
Anyone know?
Thanks!!
CodePudding user response:
If you want to type out the names, use a separator that names_sep
can pick out easily and rename the wide columns before the pivot, something like this:
first_part = c("Class_1", "Class_2", "Class_1", "Class_2")
second_part = c(1, 1, 2, 2)
first_part = c("Class_1", "Class_2", "Class_1", "Class_2")
second_part = c(1, 1, 2, 2)
names(df)[2:5] = paste(first_part, second_part, sep = "|")
pivot_longer(df, -GROUP,
names_sep = "\\|", # | is a special regex character so we escape it
names_to = c(".value", "Date"))
# # A tibble: 6 × 4
# GROUP Date Class_1 Class_2
# <chr> <chr> <dbl> <dbl>
# 1 A 1 20 37
# 2 A 2 15 84
# 3 B 1 60 22
# 4 B 2 100 18
# 5 C 1 82 8
# 6 C 2 76 88
Or in your original case you could give name_sep
an integer that will be interpreted as a position, in your example the split is always at the 8th character, but that leaves us with a trailing _
to clean up:
pivot_longer(df, -GROUP,
names_sep = 8,
names_to = c(".value", "Date")) %>%
rename_with(.fn = sub, pattern = "_$", replacement = "")
Or you could write a regex pattern to separate at the last _
, though this is a more complicated regex pattern:
pivot_longer(df, -GROUP,
names_sep = "_(?=[^_]*$)",
names_to = c(".value", "Date"))