I have a df similar to the following:
df <- data.frame("name" = c("sue","joe","tom"),
"x0_date" = c("2020-1-20", "2020-2-20", "2020-3-20"),
"x0_class" = c("biology", "physics", "chemistry"),
"x1_date" = c("2021-4-18", NA, "2021-6-18"),
"x1_class" = c("english", NA, "drama"))
df
name x0_date x0_class x1_date x1_class
1 sue 2020-1-20 biology 2021-4-18 english
2 joe 2020-2-20 physics <NA> <NA>
3 tom 2020-3-20 chemistry 2021-6-18 drama
I am trying to pivot longer such that my outcome is as follows:
name date class
1 sue 2020-1-20 biology
2 sue 2021-4-18 english
3 joe 2020-2-20 physics
4 tom 2020-3-20 chemistry
5 tom 2021-6-18 drama
I guess I've only ever pivoted when some of the data is numeric, so I'm having trouble. I've tried something like:
test <- df %>%
pivot_longer(matches("date|class"),
names_to = c("date", "class"),
names_pattern = "(\\d )_(. )",
values_drop_na = TRUE)
But the output isn't quite right. If it's relevant, my wide data is extremely wide ("x0" to "x_400"). Any help is appreciated!
CodePudding user response:
We need names_to
as .value
. According to ?pivot_longer
names_to - If length >1, multiple columns will be created. In this case, one of names_sep or names_pattern must be supplied to specify how the column names should be split. There are also two additional character values you can take advantage of:
NA will discard the corresponding component of the column name.
".value" indicates that the corresponding component of the column name defines the name of the output column containing the cell values, overriding values_to entirely.
Thus, specify that as .value
and use names_pattern
to capture only the characters that succeeds the _
i.e. (_(.*)
)
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -name, names_to = c(".value"),
names_pattern = ".*_(.*)", values_drop_na = TRUE)
-output
# A tibble: 5 × 3
name date class
<chr> <chr> <chr>
1 sue 2020-1-20 biology
2 sue 2021-4-18 english
3 joe 2020-2-20 physics
4 tom 2020-3-20 chemistry
5 tom 2021-6-18 drama
If we need the x0
, x1
prefix as another column, then we need to specify the column name for the prefix and also to capture the prefix part
df %>%
pivot_longer(cols = -name, names_to = c("categ", ".value"),
names_pattern = "(.*)_(.*)", values_drop_na = TRUE)
-output
# A tibble: 5 × 4
name categ date class
<chr> <chr> <chr> <chr>
1 sue x0 2020-1-20 biology
2 sue x1 2021-4-18 english
3 joe x0 2020-2-20 physics
4 tom x0 2020-3-20 chemistry
5 tom x1 2021-6-18 drama
The reason the OP's code didn't work
df %>%
pivot_longer(matches("date|class"),
names_to = c("date", "class"),
names_pattern = "(\\d )_(. )",
values_drop_na = TRUE)
# A tibble: 10 × 4
name date class value
<chr> <chr> <chr> <chr>
1 sue 0 date 2020-1-20
2 sue 0 class biology
3 sue 1 date 2021-4-18
4 sue 1 class english
5 joe 0 date 2020-2-20
6 joe 0 class physics
7 tom 0 date 2020-3-20
8 tom 0 class chemistry
9 tom 1 date 2021-6-18
10 tom 1 class drama
is because it is capturing one or more digits (\\d
) before the _
and then the rest (.
) as the column prefix, suffix, thus we get the 0, 1, from 'x0', 'x1' in 'date', the suffix part in 'class' and as we didn't specify the .value
, the default column value
will return with the values of the columns
CodePudding user response:
- We can use
Base R
rbind
function by splittingdf
to list ofdata.frame
s with column namesname , data , class
lapply(1:((ncol(df)-1)/2) , \(i) df[c(1,2*i,2*i 1)]) |>
lapply(\(x) {colnames(x) <- c("name" , "data" , "class") ; x}) |>
do.call(rbind , args = _) |> na.omit()
- Output
name data class
1 sue 2020-1-20 biology
2 joe 2020-2-20 physics
3 tom 2020-3-20 chemistry
4 sue 2021-4-18 english
6 tom 2021-6-18 drama