Home > OS >  R: Trouble With Pivot_Wider
R: Trouble With Pivot_Wider

Time:12-10

I am working with the R programming language.

I have a dataset that looks something like this:

# imported as a CSV with read.csv(file, "check.names = FALSE") to avoid duplicate column names

   col 2000-01-01 2000-01-01 2000-01-02 2000-01-02
1 type     type 1     type 2     type 1     type 2
2    A         30         11         74         45
3    B         57         14         63          1
4    C         35          7         14         33

my_data = structure(list(col = c("type", "A", "B", "C"), `2000-01-01` = c("type 1", 
"30", "57", "35"), `2000-01-01` = c("type 2", "11", "14", "7"
), `2000-01-02` = c("type 1", "74", "63", "14"), `2000-01-02` = c("type 2", 
"45", "1", "33")), class = "data.frame", row.names = c(NA, -4L
))

I am trying to transform the above file into the following format:

  col       date type_1_count type_2_count
1   A 2000-01-01           30           11
2   B 2000-01-01           57           14
3   C 2000-01-01           35            7
4   A 2000-01-02           74           45
5   B 2000-01-02           63            1
6   C 2000-01-02           14           33

I am trying to use an answer I received in a previous question (Manually Specifying Columns While Pivoting Tables?) and adapt it for this problem:

library(tidyr)

first_part = colnames(my_data)
second_part = "|"
names(my_data)[2:5] = paste(first_part, second_part)

pivot_longer(df, -col, 
             names_sep = "\\|", 
             names_to = c(".value", "Date"))

The code seems to have run - but the output is not in the correct format:

# A tibble: 8 x 5
  col   Date  `col ` `2000-01-01 ` `2000-01-02 `
  <chr> <chr> <chr>  <chr>         <chr>        
1 type  ""    type 1 type 2        type 2       
2 type  ""    NA     type 1        NA           
3 A     ""    30     11            45           
4 A     ""    NA     74            NA           
5 B     ""    57     14            1            
6 B     ""    NA     63            NA           
7 C     ""    35     7             33           
8 C     ""    NA     14            NA    

Can someone please show me what I can do this fix this?

Thanks!

Note: Suppose if there was no "type" column in the original data

my_data = structure(list(col = c("A", "B", "C"), `2000-01-01` = c(86L, 
43L, 73L), `2000-01-02` = c(99L, 77L, 12L)), class = "data.frame", row.names = c(NA, 
-3L))

  col 2000-01-01 2000-01-02
1   A         86         99
2   B         43         77
3   C         73         12

And the objective was to transform the above data to

        date col count
1 2000-01-01   A    86
2 2000-01-01   B    43
3 2000-01-01   C    73
4 2000-01-02   A    99
5 2000-01-02   B    77
6 2000-01-02   C    12

Would this be application of "pivot_longer" this time? Is this correct?

# how come this seems to works for all columns even though I only specified "2001-01-01"?
my_data %>%
  pivot_longer(!col, names_to = "2001-01-01", values_to = "count")

# A tibble: 6 x 3
  col   `2001-01-01` count
  <chr> <chr>        <int>
1 A     2000-01-01      86
2 A     2000-01-02      99
3 B     2000-01-01      43
4 B     2000-01-02      77
5 C     2000-01-01      73
6 C     2000-01-02      12

CodePudding user response:

Your first column breaks the whole data structure, so a little bit pre-processing is necessary:

library(dplyr)
library(tidyr)

colnames(my_data) <- paste(colnames(my_data), my_data[1,], sep = "_")

my_data %>% 
  tibble() %>% 
  slice(-1) %>% 
  mutate(across(-col_type, as.integer)) %>% 
  pivot_longer(-col_type, names_sep = "_", names_to = c("Date", ".value")) %>% 
  rename_with(~gsub("\\s(\\d)", "_\\1_count", .x), starts_with("type"))

This returns

# A tibble: 6 × 4
  col_type Date       type_1_count type_2_count
  <chr>    <chr>             <int>        <int>
1 A        2000-01-01           30           11
2 A        2000-01-02           74           45
3 B        2000-01-01           57           14
4 B        2000-01-02           63            1
5 C        2000-01-01           35            7
6 C        2000-01-02           14           33
  • At first we extract the first row and construc new column names for your data.frame.
  • Next we remove the first row and convert the remaining numerical data into integer values.
  • Then we use pivot_longer to create your desired data.frame
  • Related