I have a large excel spreadsheet where rows have an unequal number of colunmns. The name of columns repeat itself and these columns store data either in various formats (numeric, character, date etc). How can I reshapee this data into a long, tidy format?
Here what my dataframe looks like
df <- tibble(id = c("T1", "T2", "T3"), x = c(4:6), y = c("A", "B", "C"), x = c(7, 8, NA), y = c("A", "B", NA), x = c(NA, 4, NA), y= c(NA, "F", NA), .name_repair = "minimal")
df
I would like this type of ouput
ID | X | Y |
---|---|---|
T1 | 4 | A |
T1 | 7 | A |
T2 | 5 | B |
T2 | 8 | B |
T2 | 4 | F |
T3 | 6 | C |
Thank you very much for your help!
CodePudding user response:
You don't need to pivot here, just bind rows for each set of columns separately. You could manually do it just doing:
library(tidyverse)
bind_rows(
df[,1:3],
df[,c(1,4:5)],
df[,c(1,6:7)]
)
Then just filter out the rows with NA
values. If you have additional columns to do it, you can instead use purrr::map_dfr
on a numeric vector for column indexing to automatically select the correct columns and then bind them together. Then just use dplyr::filter(across(...)
to drop the rows with all NA
.
map_dfr(
seq(2,6,2),
~df[, c(1, .x, .x 1)]
) %>%
filter(across(c(x,y), ~ !is.na(.x))) %>%
arrange(id, y, x)
#> # A tibble: 6 × 3
#> id x y
#> <chr> <dbl> <chr>
#> 1 T1 4 A
#> 2 T1 7 A
#> 3 T2 5 B
#> 4 T2 8 B
#> 5 T2 4 F
#> 6 T3 6 C
I added the final dplyr::arrange()
call to match your output, you can adjust to how you actually want to order your data.