R: Trouble With Pivot_Wider-CodePudding

I am working with the R programming language.

I have a dataset that looks something like this:

# imported as a CSV with read.csv(file, "check.names = FALSE") to avoid duplicate column names

   col 2000-01-01 2000-01-01 2000-01-02 2000-01-02
1 type     type 1     type 2     type 1     type 2
2    A         30         11         74         45
3    B         57         14         63          1
4    C         35          7         14         33

my_data = structure(list(col = c("type", "A", "B", "C"), `2000-01-01` = c("type 1", 
"30", "57", "35"), `2000-01-01` = c("type 2", "11", "14", "7"
), `2000-01-02` = c("type 1", "74", "63", "14"), `2000-01-02` = c("type 2", 
"45", "1", "33")), class = "data.frame", row.names = c(NA, -4L
))

I am trying to transform the above file into the following format:

  col       date type_1_count type_2_count
1   A 2000-01-01           30           11
2   B 2000-01-01           57           14
3   C 2000-01-01           35            7
4   A 2000-01-02           74           45
5   B 2000-01-02           63            1
6   C 2000-01-02           14           33

I am trying to use an answer I received in a previous question (Manually Specifying Columns While Pivoting Tables?) and adapt it for this problem:

library(tidyr)

first_part = colnames(my_data)
second_part = "|"
names(my_data)[2:5] = paste(first_part, second_part)

pivot_longer(df, -col, 
             names_sep = "\\|", 
             names_to = c(".value", "Date"))

The code seems to have run - but the output is not in the correct format:

# A tibble: 8 x 5
  col   Date  `col ` `2000-01-01 ` `2000-01-02 `
  <chr> <chr> <chr>  <chr>         <chr>        
1 type  ""    type 1 type 2        type 2       
2 type  ""    NA     type 1        NA           
3 A     ""    30     11            45           
4 A     ""    NA     74            NA           
5 B     ""    57     14            1            
6 B     ""    NA     63            NA           
7 C     ""    35     7             33           
8 C     ""    NA     14            NA

Can someone please show me what I can do this fix this?

Thanks!

Note: Suppose if there was no "type" column in the original data

my_data = structure(list(col = c("A", "B", "C"), `2000-01-01` = c(86L, 
43L, 73L), `2000-01-02` = c(99L, 77L, 12L)), class = "data.frame", row.names = c(NA, 
-3L))

  col 2000-01-01 2000-01-02
1   A         86         99
2   B         43         77
3   C         73         12

And the objective was to transform the above data to

        date col count
1 2000-01-01   A    86
2 2000-01-01   B    43
3 2000-01-01   C    73
4 2000-01-02   A    99
5 2000-01-02   B    77
6 2000-01-02   C    12

Would this be application of "pivot_longer" this time? Is this correct?

# how come this seems to works for all columns even though I only specified "2001-01-01"?
my_data %>%
  pivot_longer(!col, names_to = "2001-01-01", values_to = "count")

# A tibble: 6 x 3
  col   `2001-01-01` count
  <chr> <chr>        <int>
1 A     2000-01-01      86
2 A     2000-01-02      99
3 B     2000-01-01      43
4 B     2000-01-02      77
5 C     2000-01-01      73
6 C     2000-01-02      12

CodePudding user response：

Your first column breaks the whole data structure, so a little bit pre-processing is necessary:

library(dplyr)
library(tidyr)

colnames(my_data) <- paste(colnames(my_data), my_data[1,], sep = "_")

my_data %>% 
  tibble() %>% 
  slice(-1) %>% 
  mutate(across(-col_type, as.integer)) %>% 
  pivot_longer(-col_type, names_sep = "_", names_to = c("Date", ".value")) %>% 
  rename_with(~gsub("\\s(\\d)", "_\\1_count", .x), starts_with("type"))

This returns

# A tibble: 6 × 4
  col_type Date       type_1_count type_2_count
  <chr>    <chr>             <int>        <int>
1 A        2000-01-01           30           11
2 A        2000-01-02           74           45
3 B        2000-01-01           57           14
4 B        2000-01-02           63            1
5 C        2000-01-01           35            7
6 C        2000-01-02           14           33

At first we extract the first row and construc new column names for your data.frame.
Next we remove the first row and convert the remaining numerical data into integer values.
Then we use pivot_longer to create your desired data.frame