I am working with the R programming language.
I have a dataset that looks something like this:
# imported as a CSV with read.csv(file, "check.names = FALSE") to avoid duplicate column names
col 2000-01-01 2000-01-01 2000-01-02 2000-01-02
1 type type 1 type 2 type 1 type 2
2 A 30 11 74 45
3 B 57 14 63 1
4 C 35 7 14 33
my_data = structure(list(col = c("type", "A", "B", "C"), `2000-01-01` = c("type 1",
"30", "57", "35"), `2000-01-01` = c("type 2", "11", "14", "7"
), `2000-01-02` = c("type 1", "74", "63", "14"), `2000-01-02` = c("type 2",
"45", "1", "33")), class = "data.frame", row.names = c(NA, -4L
))
I am trying to transform the above file into the following format:
col date type_1_count type_2_count
1 A 2000-01-01 30 11
2 B 2000-01-01 57 14
3 C 2000-01-01 35 7
4 A 2000-01-02 74 45
5 B 2000-01-02 63 1
6 C 2000-01-02 14 33
I am trying to use an answer I received in a previous question (Manually Specifying Columns While Pivoting Tables?) and adapt it for this problem:
library(tidyr)
first_part = colnames(my_data)
second_part = "|"
names(my_data)[2:5] = paste(first_part, second_part)
pivot_longer(df, -col,
names_sep = "\\|",
names_to = c(".value", "Date"))
The code seems to have run - but the output is not in the correct format:
# A tibble: 8 x 5
col Date `col ` `2000-01-01 ` `2000-01-02 `
<chr> <chr> <chr> <chr> <chr>
1 type "" type 1 type 2 type 2
2 type "" NA type 1 NA
3 A "" 30 11 45
4 A "" NA 74 NA
5 B "" 57 14 1
6 B "" NA 63 NA
7 C "" 35 7 33
8 C "" NA 14 NA
Can someone please show me what I can do this fix this?
Thanks!
Note: Suppose if there was no "type" column in the original data
my_data = structure(list(col = c("A", "B", "C"), `2000-01-01` = c(86L,
43L, 73L), `2000-01-02` = c(99L, 77L, 12L)), class = "data.frame", row.names = c(NA,
-3L))
col 2000-01-01 2000-01-02
1 A 86 99
2 B 43 77
3 C 73 12
And the objective was to transform the above data to
date col count
1 2000-01-01 A 86
2 2000-01-01 B 43
3 2000-01-01 C 73
4 2000-01-02 A 99
5 2000-01-02 B 77
6 2000-01-02 C 12
Would this be application of "pivot_longer" this time? Is this correct?
# how come this seems to works for all columns even though I only specified "2001-01-01"?
my_data %>%
pivot_longer(!col, names_to = "2001-01-01", values_to = "count")
# A tibble: 6 x 3
col `2001-01-01` count
<chr> <chr> <int>
1 A 2000-01-01 86
2 A 2000-01-02 99
3 B 2000-01-01 43
4 B 2000-01-02 77
5 C 2000-01-01 73
6 C 2000-01-02 12
CodePudding user response:
Your first column breaks the whole data structure, so a little bit pre-processing is necessary:
library(dplyr)
library(tidyr)
colnames(my_data) <- paste(colnames(my_data), my_data[1,], sep = "_")
my_data %>%
tibble() %>%
slice(-1) %>%
mutate(across(-col_type, as.integer)) %>%
pivot_longer(-col_type, names_sep = "_", names_to = c("Date", ".value")) %>%
rename_with(~gsub("\\s(\\d)", "_\\1_count", .x), starts_with("type"))
This returns
# A tibble: 6 × 4
col_type Date type_1_count type_2_count
<chr> <chr> <int> <int>
1 A 2000-01-01 30 11
2 A 2000-01-02 74 45
3 B 2000-01-01 57 14
4 B 2000-01-02 63 1
5 C 2000-01-01 35 7
6 C 2000-01-02 14 33
- At first we extract the first row and construc new column names for your data.frame.
- Next we remove the first row and convert the remaining numerical data into integer values.
- Then we use
pivot_longer
to create your desired data.frame