Home > Net >  Manually Specifying Columns While Pivoting Tables?
Manually Specifying Columns While Pivoting Tables?

Time:12-10

I have this data in R:

df<-data.frame(GROUP = c("A", "B", "C"),
               CLASS_1_1 = c(20, 60, 82),
               CLASS_2_1 = c(37, 22, 8),
               CLASS_1_2 = c(15,100,76),
               CLASS_2_2 = c(84, 18,88))

  GROUP CLASS_1_1 CLASS_2_1 CLASS_1_2 CLASS_2_2
1     A        20        37        15        84
2     B        60        22       100        18
3     C        82         8        76        88

I am using this code (from an answer provided here: https://stackoverflow.com/users/13203841/stats-noob) to pivot the data:

library(tidyr)
pivot_longer(df, -GROUP, 
             names_pattern = "(CLASS_.*)_(.*)", 
             names_to = c(".value", "Date"))

# A tibble: 6 x 4
  GROUP Date  CLASS_1 CLASS_2
  <chr> <chr>   <dbl>   <dbl>
1 A     1          20      37
2 A     2          15      84
3 B     1          60      22
4 B     2         100      18
5 C     1          82       8
6 C     2          76      88

Is there some way I can do this without using "names_pattern" and just manually specify the names?

The reason that I am bringing this up is because in instances where the columns do not have a similar naming structure, I feel it would be easier to manually specify columns (e.g. colnames(df) and then manually copy and paste the results of this statement, or to just reference these columns by d[,c(2,3,4,5) )

As an example:

#pseudocode - does not work : OPTION 1

pivot_longer(df, -GROUP, 
             names_pattern = "c(CLASS_1_1, CLASS_2_1, CLASS_1_2, CLASS_2_2)", 
             names_to = c(".value", "Date"))

#pseudocode - does not work : OPTION 2

pivot_longer(df, -GROUP, 
             names_pattern = "df[,c(2,3,4,5)]", 
             names_to = c(".value", "Date"))

Anyone know?

Thanks!!

CodePudding user response:

If you want to type out the names, use a separator that names_sep can pick out easily and rename the wide columns before the pivot, something like this:

first_part = c("Class_1", "Class_2", "Class_1", "Class_2")
second_part = c(1, 1, 2, 2)

first_part = c("Class_1", "Class_2", "Class_1", "Class_2")
second_part = c(1, 1, 2, 2)
names(df)[2:5] = paste(first_part, second_part, sep = "|")
pivot_longer(df, -GROUP, 
             names_sep = "\\|", # | is a special regex character so we escape it
             names_to = c(".value", "Date"))
# # A tibble: 6 × 4
#   GROUP Date  Class_1 Class_2
#   <chr> <chr>   <dbl>   <dbl>
# 1 A     1          20      37
# 2 A     2          15      84
# 3 B     1          60      22
# 4 B     2         100      18
# 5 C     1          82       8
# 6 C     2          76      88

Or in your original case you could give name_sep an integer that will be interpreted as a position, in your example the split is always at the 8th character, but that leaves us with a trailing _ to clean up:

pivot_longer(df, -GROUP, 
             names_sep = 8,
             names_to = c(".value", "Date")) %>%
  rename_with(.fn = sub, pattern = "_$", replacement = "")

Or you could write a regex pattern to separate at the last _, though this is a more complicated regex pattern:

pivot_longer(df, -GROUP, 
             names_sep = "_(?=[^_]*$)",
             names_to = c(".value", "Date"))
  • Related