I have a data frame arranged as shown below. The columns are grouped by month (enero, febrero, marzo, etc.), and each row holds a value that I need to extract from the time series. Each Month/Caudal pair has a different number of rows, depending on the number of days in the month.
Also, as in the original dataset, each Month/Caudal pair is separated from the next by an empty column of NAs.
           enero Caudal  X        febrero Caudal.1 X.1          marzo Caudal.2 X.2
1 1/1/2003 00:15      - NA 1/2/2003 00:15        -  NA 1/3/2003 00:15     1.68  NA
2 1/1/2003 00:30      - NA 1/2/2003 00:30        -  NA 1/3/2003 00:30     1.69  NA
3 1/1/2003 00:45      - NA 1/2/2003 00:45        -  NA 1/3/2003 00:45     1.68  NA
4 1/1/2003 01:00      - NA 1/2/2003 01:00        -  NA 1/3/2003 01:00     1.68  NA
5 1/1/2003 01:15      - NA 1/2/2003 01:15        -  NA 1/3/2003 01:15     1.68  NA
6 1/1/2003 01:30      - NA 1/2/2003 01:30        -  NA 1/3/2003 01:30     1.68  NA
My desired result is a time series with only two columns: Date and Caudal.
Date Caudal
1 1/1/2003 00:15 -
2 1/1/2003 00:30 -
3 1/1/2003 00:45 -
4 1/1/2003 01:00 -
5 1/1/2003 01:15 -
6 1/1/2003 01:30 -
7 1/2/2003 00:15 -
8 1/2/2003 00:30 -
9 1/2/2003 00:45 -
10 1/2/2003 01:00 -
11 1/2/2003 01:15 -
12 1/2/2003 01:30 -
13 1/3/2003 00:15 1.68
14 1/3/2003 00:30 1.69
15 1/3/2003 00:45 1.68
16 1/3/2003 01:00 1.68
17 1/3/2003 01:15 1.68
18 1/3/2003 01:30 1.68
I need to do this for 40 .txt files that share exactly the same format. How can I reshape each file this way and then concatenate them all into one continuous data frame?
Sample data:
structure(list(enero = c("1/1/2003 00:15", "1/1/2003 00:30",
"1/1/2003 00:45", "1/1/2003 01:00", "1/1/2003 01:15", "1/1/2003 01:30"
), Caudal = c(" - ", " - ", " - ", " - ", " - ", " - "
), X = c(NA, NA, NA, NA, NA, NA), febrero = c("1/2/2003 00:15",
"1/2/2003 00:30", "1/2/2003 00:45", "1/2/2003 01:00", "1/2/2003 01:15",
"1/2/2003 01:30"), Caudal.1 = c(" - ", " - ", " - ", " - ",
" - ", " - "), X.1 = c(NA, NA, NA, NA, NA, NA), marzo = c("1/3/2003 00:15",
"1/3/2003 00:30", "1/3/2003 00:45", "1/3/2003 01:00", "1/3/2003 01:15",
"1/3/2003 01:30"), Caudal.2 = c(" 1.68 ", " 1.69 ", " 1.68 ",
" 1.68 ", " 1.68 ", " 1.68 "), X.2 = c(NA, NA, NA, NA, NA, NA
)), row.names = c(NA, 6L), class = "data.frame")
Answer:
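We can first remove the empty columns, and then rename each pair of columns to a common scheme (Date_1/Caudal_1, Date_2/Caudal_2, and so on). After that, we can pivot into long form, using _ as the names separator so that the part before it becomes the output column name (Date or Caudal).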
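library(tidyverse)

df %>%
  # Drop the empty separator columns (X, X.1, X.2, ...)
  select(-starts_with("X")) %>%
  # Rename the month columns to Date_1, Date_2, ...
  rename_with(~ paste0("Date_", seq_along(.)),
              -starts_with("Caudal")) %>%
  # Rename the Caudal columns to Caudal_1, Caudal_2, ...
  rename_with(~ paste0("Caudal_", seq_along(.)),
              starts_with("Caudal")) %>%
  # Stack each Date_n/Caudal_n pair; ".value" keeps Date and Caudal as columns
  pivot_longer(everything(),
               names_to = c(".value", "time"),
               names_sep = "_",
               values_drop_na = TRUE) %>%
  select(-time) %>%
  # Date is still a character column here, so this sort is alphabetical
  arrange(Date)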
Output
Date Caudal
<chr> <chr>
1 1/1/2003 00:15 " - "
2 1/1/2003 00:30 " - "
3 1/1/2003 00:45 " - "
4 1/1/2003 01:00 " - "
5 1/1/2003 01:15 " - "
6 1/1/2003 01:30 " - "
7 1/2/2003 00:15 " - "
8 1/2/2003 00:30 " - "
9 1/2/2003 00:45 " - "
10 1/2/2003 01:00 " - "
11 1/2/2003 01:15 " - "
12 1/2/2003 01:30 " - "
13 1/3/2003 00:15 " 1.68 "
14 1/3/2003 00:30 " 1.69 "
15 1/3/2003 00:45 " 1.68 "
16 1/3/2003 01:00 " 1.68 "
17 1/3/2003 01:15 " 1.68 "
18 1/3/2003 01:30 " 1.68 "
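The question also asks about 40 .txt files, and the output above still holds Date and Caudal as character columns. A minimal sketch of one way to handle both, under some assumptions: the helper names tidy_caudal() and read_all_caudal() are hypothetical, the dates are taken to be day/month/year (since 1/2/2003 sits under febrero), " - " is treated as a missing value, and the files are assumed to be tab-separated with a header row. Adjust sep and the read.table() arguments to match the real files.

```r
library(tidyverse)  # dplyr, tidyr, purrr
library(lubridate)  # dmy_hm()

# Hypothetical helper: reshape one wide data frame into Date/Caudal
tidy_caudal <- function(df) {
  df %>%
    select(-starts_with("X")) %>%
    rename_with(~ paste0("Date_", seq_along(.)), -starts_with("Caudal")) %>%
    rename_with(~ paste0("Caudal_", seq_along(.)), starts_with("Caudal")) %>%
    pivot_longer(everything(),
                 names_to = c(".value", "time"),
                 names_sep = "_",
                 values_drop_na = TRUE) %>%
    select(-time) %>%
    mutate(
      # Assumption: day/month/year followed by hours:minutes
      Date = dmy_hm(Date),
      # Assumption: "-" marks a missing measurement
      Caudal = as.numeric(na_if(trimws(Caudal), "-"))
    ) %>%
    arrange(Date)  # now a chronological sort, not an alphabetical one
}

# Hypothetical helper: read every .txt file in `dir` and stack the results.
# Reading all columns as character keeps the column types consistent
# across months before pivoting.
read_all_caudal <- function(dir, sep = "\t") {
  list.files(dir, pattern = "\\.txt$", full.names = TRUE) %>%
    map(~ read.table(.x, header = TRUE, sep = sep, fill = TRUE,
                     colClasses = "character",
                     na.strings = c("", "NA"))) %>%
    map(tidy_caudal) %>%
    bind_rows() %>%
    arrange(Date)
}

# Small demonstration on a cut-down version of the sample data
df <- data.frame(
  enero    = c("1/1/2003 00:15", "1/1/2003 00:30"),
  Caudal   = c(" - ", " - "),
  X        = c(NA, NA),
  marzo    = c("1/3/2003 00:15", "1/3/2003 00:30"),
  Caudal.1 = c(" 1.68 ", " 1.69 "),
  X.1      = c(NA, NA)
)
result <- tidy_caudal(df)
```

With the files in a folder, say a hypothetical "data" directory, read_all_caudal("data") would return one continuous data frame. Converting Date before sorting matters on the full data, because character dates such as "10/1/2003" sort before "2/1/2003".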