Merge multiple columns with X repeating attributes into X columns

I have a data frame arranged as shown below. The columns are grouped by month (enero, febrero, marzo, etc., Spanish for January, February, March), and each row holds one value of the time series that I need to extract. Each Month/Caudal (flow) pair varies in length depending on the number of days in the month.

Also, in the original dataset each Month/Caudal pair is separated from the next by an empty column of NAs (X, X.1, X.2, ...).

           enero Caudal  X        febrero Caudal.1 X.1          marzo Caudal.2 X.2
1 1/1/2003 00:15   -    NA 1/2/2003 00:15     -     NA 1/3/2003 00:15    1.68   NA
2 1/1/2003 00:30   -    NA 1/2/2003 00:30     -     NA 1/3/2003 00:30    1.69   NA
3 1/1/2003 00:45   -    NA 1/2/2003 00:45     -     NA 1/3/2003 00:45    1.68   NA
4 1/1/2003 01:00   -    NA 1/2/2003 01:00     -     NA 1/3/2003 01:00    1.68   NA
5 1/1/2003 01:15   -    NA 1/2/2003 01:15     -     NA 1/3/2003 01:15    1.68   NA
6 1/1/2003 01:30   -    NA 1/2/2003 01:30     -     NA 1/3/2003 01:30    1.68   NA

My desired result is a time series with only two columns: Date and Caudal.

   Date           Caudal
1  1/1/2003 00:15  -
2  1/1/2003 00:30  -
3  1/1/2003 00:45  -
4  1/1/2003 01:00  -
5  1/1/2003 01:15  -
6  1/1/2003 01:30  -
7  1/2/2003 00:15  -
8  1/2/2003 00:30  -
9  1/2/2003 00:45  -
10 1/2/2003 01:00  -
11 1/2/2003 01:15  -
12 1/2/2003 01:30  -
13 1/3/2003 00:15  1.68
14 1/3/2003 00:30  1.69
15 1/3/2003 00:45  1.68
16 1/3/2003 01:00  1.68
17 1/3/2003 01:15  1.68
18 1/3/2003 01:30  1.68

I need to do this for 40 .txt files with exactly the same format. How can I apply this rearrangement to every file and concatenate them all into one continuous data frame?

Sample data:

df <- structure(list(enero = c("1/1/2003 00:15", "1/1/2003 00:30", 
"1/1/2003 00:45", "1/1/2003 01:00", "1/1/2003 01:15", "1/1/2003 01:30"
), Caudal = c(" -   ", " -   ", " -   ", " -   ", " -   ", " -   "
), X = c(NA, NA, NA, NA, NA, NA), febrero = c("1/2/2003 00:15", 
"1/2/2003 00:30", "1/2/2003 00:45", "1/2/2003 01:00", "1/2/2003 01:15", 
"1/2/2003 01:30"), Caudal.1 = c(" -   ", " -   ", " -   ", " -   ", 
" -   ", " -   "), X.1 = c(NA, NA, NA, NA, NA, NA), marzo = c("1/3/2003 00:15", 
"1/3/2003 00:30", "1/3/2003 00:45", "1/3/2003 01:00", "1/3/2003 01:15", 
"1/3/2003 01:30"), Caudal.2 = c(" 1.68 ", " 1.69 ", " 1.68 ", 
" 1.68 ", " 1.68 ", " 1.68 "), X.2 = c(NA, NA, NA, NA, NA, NA
)), row.names = c(NA, 6L), class = "data.frame")

Answer:

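We can first remove the empty X columns and then rename each set of columns to a common pattern (i.e., Date_1, Date_2, ... and Caudal_1, Caudal_2, ...). Then we can pivot into long form, using _ as the names separator. Because months have different numbers of days, the shorter columns are padded with NA, so values_drop_na = TRUE discards those filler rows.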
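To make the `.value` / `names_sep` step concrete, here is a minimal toy example (made-up values, not the real data) whose names already follow the `Date_<n>` / `Caudal_<n>` pattern produced by the renaming step. The second "month" is one row shorter, padded with NA, to mimic months with fewer days.

```r
library(tidyr)

# Toy data in the renamed layout; Date_2/Caudal_2 is a shorter "month"
# padded with NA, as happens when months have different numbers of days.
toy <- data.frame(
  Date_1   = c("1/1/2003 00:15", "1/1/2003 00:30"),
  Caudal_1 = c("-", "-"),
  Date_2   = c("1/2/2003 00:15", NA),
  Caudal_2 = c("1.68", NA)
)

# ".value": the part of each name before "_" becomes an output column
# (Date or Caudal); the part after "_" is stored in a new "time" column.
# values_drop_na = TRUE drops the all-NA filler row from the short month.
long <- pivot_longer(toy, everything(),
                     names_to = c(".value", "time"),
                     names_sep = "_",
                     values_drop_na = TRUE)
print(long)
```

Each original row yields one output row per month group, which is why a final sort by date is still needed afterwards.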

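library(tidyverse)
library(lubridate)  # dmy_hm(); only attached by tidyverse >= 2.0.0

df %>%
  # drop the empty separator columns (X, X.1, X.2, ...)
  select(-starts_with("X")) %>%
  # month columns (enero, febrero, ...) -> Date_1, Date_2, ...
  rename_with(~paste0("Date_", seq_along(.)),
              -starts_with("Caudal")) %>%
  # Caudal, Caudal.1, ... -> Caudal_1, Caudal_2, ...
  rename_with(~paste0("Caudal_", seq_along(.)),
              starts_with("Caudal")) %>%
  # split names at "_": prefix -> output column, suffix -> "time"
  pivot_longer(everything(),
               names_to = c(".value", "time"),
               names_sep = "_",
               values_drop_na = TRUE) %>% 
  select(-time) %>% 
  # sort chronologically; sorting the raw strings would put 10/1 before 2/1
  arrange(dmy_hm(Date))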

Output

   Date           Caudal  
   <chr>          <chr>   
 1 1/1/2003 00:15 " -   " 
 2 1/1/2003 00:30 " -   " 
 3 1/1/2003 00:45 " -   " 
 4 1/1/2003 01:00 " -   " 
 5 1/1/2003 01:15 " -   " 
 6 1/1/2003 01:30 " -   " 
 7 1/2/2003 00:15 " -   " 
 8 1/2/2003 00:30 " -   " 
 9 1/2/2003 00:45 " -   " 
10 1/2/2003 01:00 " -   " 
11 1/2/2003 01:15 " -   " 
12 1/2/2003 01:30 " -   " 
13 1/3/2003 00:15 " 1.68 "
14 1/3/2003 00:30 " 1.69 "
15 1/3/2003 00:45 " 1.68 "
16 1/3/2003 01:00 " 1.68 "
17 1/3/2003 01:15 " 1.68 "
18 1/3/2003 01:30 " 1.68 "
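The answer handles one data frame; for the 40 .txt files in the question, one option is to wrap the same steps in a function and bind the results. This is a sketch under assumptions: the folder path is hypothetical, the files are assumed to be tab-delimited with a header row (read with `read.delim`, which names empty header columns X, X.1, ... as in the sample), and "-" is assumed to mean "no reading". The helper names `reshape_caudal`, `clean_caudal` and `read_all_caudal` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(lubridate)

# Reshape one month-wide data frame into two columns, Date and Caudal.
reshape_caudal <- function(df) {
  df %>%
    select(-starts_with("X")) %>%
    # a file may read Caudal as numeric in one month and as character ("-")
    # in another; make everything character so the months can be stacked
    mutate(across(everything(), as.character)) %>%
    rename_with(~ paste0("Date_", seq_along(.x)), -starts_with("Caudal")) %>%
    rename_with(~ paste0("Caudal_", seq_along(.x)), starts_with("Caudal")) %>%
    pivot_longer(everything(),
                 names_to = c(".value", "time"),
                 names_sep = "_",
                 values_drop_na = TRUE) %>%
    select(-time)
}

# Parse dates (day/month/year hh:mm) and turn Caudal into numbers,
# assuming "-" marks a missing reading.
clean_caudal <- function(long) {
  long %>%
    mutate(Date   = dmy_hm(Date),
           Caudal = as.numeric(na_if(trimws(Caudal), "-"))) %>%
    arrange(Date)
}

# Read every .txt file in a folder and stack them into one data frame.
# The default reader is an assumption; adjust it to match the real files.
read_all_caudal <- function(dir,
                            reader = function(f) read.delim(f, stringsAsFactors = FALSE)) {
  list.files(dir, pattern = "\\.txt$", full.names = TRUE) %>%
    map(reader) %>%
    map(reshape_caudal) %>%
    bind_rows() %>%
    clean_caudal()
}
# e.g. all_caudal <- read_all_caudal("path/to/txt_folder")  # hypothetical path

# In-memory demo standing in for two files with the same layout
sample1 <- data.frame(enero = c("1/1/2003 00:15", "1/1/2003 00:30"),
                      Caudal = c(" -   ", " -   "), X = c(NA, NA),
                      febrero = c("1/2/2003 00:15", "1/2/2003 00:30"),
                      Caudal.1 = c(" 1.68 ", " 1.69 "), X.1 = c(NA, NA))
sample2 <- data.frame(marzo = "1/3/2003 00:15", Caudal = 1.7, X = NA)

combined <- list(sample1, sample2) %>%
  map(reshape_caudal) %>%
  bind_rows() %>%
  clean_caudal()
print(combined)
```

Converting Date to a real date-time before sorting keeps the combined series in chronological order even when it spans several years and files.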