Merge multiple columns with X repeating attributes into X columns

Time:04-07

I have a data frame arranged as shown below. The columns are grouped by month (enero, febrero, marzo, etc.), and each row holds a value that I need to extract into a single time series. Each Month/Caudal pair varies in length depending on the number of days in the month.

Also, as in the original dataset, each Month/Caudal pair is separated by an empty column of NAs.

           enero Caudal  X        febrero Caudal.1 X.1          marzo Caudal.2 X.2
1 1/1/2003 00:15   -    NA 1/2/2003 00:15     -     NA 1/3/2003 00:15    1.68   NA
2 1/1/2003 00:30   -    NA 1/2/2003 00:30     -     NA 1/3/2003 00:30    1.69   NA
3 1/1/2003 00:45   -    NA 1/2/2003 00:45     -     NA 1/3/2003 00:45    1.68   NA
4 1/1/2003 01:00   -    NA 1/2/2003 01:00     -     NA 1/3/2003 01:00    1.68   NA
5 1/1/2003 01:15   -    NA 1/2/2003 01:15     -     NA 1/3/2003 01:15    1.68   NA
6 1/1/2003 01:30   -    NA 1/2/2003 01:30     -     NA 1/3/2003 01:30    1.68   NA

My desired result is a time series with only two columns: Date and Caudal.

       Date         Caudal
1 1/1/2003  00:15     -   
2 1/1/2003  00:30     -   
3 1/1/2003  00:45     -   
4 1/1/2003  01:00     -   
5 1/1/2003  01:15     -   
6 1/1/2003  01:30     - 
7 1/2/2003  00:15     -   
8 1/2/2003  00:30     -   
9 1/2/2003  00:45     -   
10 1/2/2003 01:00     -   
11 1/2/2003 01:15     -   
12 1/2/2003 01:30     -   
13 1/3/2003 00:15    1.68 
14 1/3/2003 00:30    1.69 
15 1/3/2003 00:45    1.68 
16 1/3/2003 01:00    1.68 
17 1/3/2003 01:15    1.68 
18 1/3/2003 01:30    1.68 

I need to do this for 40 .txt files with exactly the same format. How can I reshape each file this way and concatenate all of them into one continuous data frame?

Sample data:

structure(list(enero = c("1/1/2003 00:15", "1/1/2003 00:30", 
"1/1/2003 00:45", "1/1/2003 01:00", "1/1/2003 01:15", "1/1/2003 01:30"
), Caudal = c(" -   ", " -   ", " -   ", " -   ", " -   ", " -   "
), X = c(NA, NA, NA, NA, NA, NA), febrero = c("1/2/2003 00:15", 
"1/2/2003 00:30", "1/2/2003 00:45", "1/2/2003 01:00", "1/2/2003 01:15", 
"1/2/2003 01:30"), Caudal.1 = c(" -   ", " -   ", " -   ", " -   ", 
" -   ", " -   "), X.1 = c(NA, NA, NA, NA, NA, NA), marzo = c("1/3/2003 00:15", 
"1/3/2003 00:30", "1/3/2003 00:45", "1/3/2003 01:00", "1/3/2003 01:15", 
"1/3/2003 01:30"), Caudal.2 = c(" 1.68 ", " 1.69 ", " 1.68 ", 
" 1.68 ", " 1.68 ", " 1.68 "), X.2 = c(NA, NA, NA, NA, NA, NA
)), row.names = c(NA, 6L), class = "data.frame")

Answer:

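We can first remove the empty X columns. Then it is easiest to rename each set of columns so that they share a common stem plus a numeric suffix (Date_1, Date_2, ... and Caudal_1, Caudal_2, ...). After that, we can pivot into long form using _ as the names separator, drop the helper time column, and sort by date.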
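To see what the ".value" entry in names_to does on its own, here is a small toy example (the objects wide and long are made up purely for illustration). The part of each column name before _ becomes an output column, and the part after it goes into the time column. Note that the rows come out interleaved (row 1 of every column set, then row 2), which is why the pipeline needs a sort at the end.

```r
library(tidyr)
library(tibble)

# Toy wide table: two sets of columns sharing the stems Date and Caudal,
# told apart only by the suffix after "_"
wide <- tibble(
  Date_1   = c("1/1/2003 00:15", "1/1/2003 00:30"),
  Caudal_1 = c("-", "-"),
  Date_2   = c("1/2/2003 00:15", "1/2/2003 00:30"),
  Caudal_2 = c("1.68", "1.69")
)

long <- pivot_longer(
  wide,
  everything(),
  names_to  = c(".value", "time"),  # stem -> output column, suffix -> time
  names_sep = "_"
)
print(long)
```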

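library(tidyverse)
library(lubridate)  # for dmy_hm(); tidyverse >= 2.0 attaches it anyway

df %>%
  # drop the empty separator columns X, X.1, X.2, ...
  select(-starts_with("X")) %>%
  # date columns (enero, febrero, ...) become Date_1, Date_2, ...
  rename_with(~paste0("Date_", seq_along(.)),
              -starts_with("Caudal")) %>%
  # flow columns (Caudal, Caudal.1, ...) become Caudal_1, Caudal_2, ...
  rename_with(~paste0("Caudal_", seq_along(.)),
              starts_with("Caudal")) %>%
  # split names at "_": the stem names the output column, the suffix goes to time
  pivot_longer(everything(),
               names_to = c(".value", "time"),
               names_sep = "_",
               values_drop_na = TRUE) %>%
  select(-time) %>%
  # dates are day/month/year text, so parse them to get a chronological sort;
  # as plain text "10/1/2003" (10 January) would sort before "2/1/2003" (2 January)
  arrange(dmy_hm(Date))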

Output

   Date           Caudal  
   <chr>          <chr>   
 1 1/1/2003 00:15 " -   " 
 2 1/1/2003 00:30 " -   " 
 3 1/1/2003 00:45 " -   " 
 4 1/1/2003 01:00 " -   " 
 5 1/1/2003 01:15 " -   " 
 6 1/1/2003 01:30 " -   " 
 7 1/2/2003 00:15 " -   " 
 8 1/2/2003 00:30 " -   " 
 9 1/2/2003 00:45 " -   " 
10 1/2/2003 01:00 " -   " 
11 1/2/2003 01:15 " -   " 
12 1/2/2003 01:30 " -   " 
13 1/3/2003 00:15 " 1.68 "
14 1/3/2003 00:30 " 1.69 "
15 1/3/2003 00:45 " 1.68 "
16 1/3/2003 01:00 " 1.68 "
17 1/3/2003 01:15 " 1.68 "
18 1/3/2003 01:30 " 1.68 "
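The question also asks how to apply this to all 40 .txt files. The following is a minimal sketch, assuming the files are tab-delimited with a header row like the sample (enero, Caudal, then an empty header that R turns into X, and so on) and all sit in one folder. It wraps the same pipeline in a function, reads and reshapes every file, binds the results, and sorts once at the end. The function names tidy_caudal and read_caudal_folder are hypothetical, and the read.delim arguments are assumptions to adapt to your files. Reading everything as character keeps a file whose Caudal column is fully numeric from clashing with one that contains "-" placeholders when the columns are stacked.

```r
library(tidyverse)
library(lubridate)

# Hypothetical helper: reshape ONE wide data frame (same pipeline as above,
# minus the final sort, which is done once after all files are combined)
tidy_caudal <- function(df) {
  df %>%
    select(-starts_with("X")) %>%
    rename_with(~paste0("Date_", seq_along(.)), -starts_with("Caudal")) %>%
    rename_with(~paste0("Caudal_", seq_along(.)), starts_with("Caudal")) %>%
    pivot_longer(everything(),
                 names_to = c(".value", "time"),
                 names_sep = "_",
                 values_drop_na = TRUE) %>%
    select(-time)
}

# Hypothetical helper: read every .txt file in `dir` and stack the results.
# ASSUMPTION: tab-delimited files with a header row; blank cells (shorter
# months) are read as NA so that values_drop_na can discard them.
read_caudal_folder <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  files %>%
    map(~ read.delim(.x, colClasses = "character", na.strings = c("", "NA"))) %>%
    map(tidy_caudal) %>%
    bind_rows() %>%
    arrange(dmy_hm(Date))  # day/month/year text -> chronological order
}
```

For example, read_caudal_folder("datos") would return a single Date/Caudal table covering every .txt file in a (hypothetical) folder named datos.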