Home > database >  R Combine two rows into one for multiple columns in a dataframe
R Combine two rows into one for multiple columns in a dataframe

Time:04-26

I have a dataset that I am trying to tidy up using different approaches. For step one I want to merge every two rows in each of the columns to a single row as shown in the desired output.

How can I do this in R by the tidy way?

Sample Data

Date = c("SB",
         "1/4/2021", 
         "HC/SB",
         "1/5/2021",
         "NC",
         "1/6/2021",
         "HC",
         "1/13/2021")

Date_Approved = c(" ",
                  "1/4/2021",
                  " ",
                  "1/8/2021",
                  " ",
                  "1/12/2021",
                  " ",
                  "1/15/2021")

SR = c(" ",
       "1A",
       " ",
       "1B",
       " ",
       "1C",
       " ",
       "1D")

Permit = c(" ",
       "AAA",
       " ",
       "BBB",
       " ",
       "CCC",
       " ",
       "DDD") 

Owner_Agent = c("Joe",
                "Joey",
                "Ross",
                "Chandler",
                "Monica",
                "Rachel",
                "Ed",
                "Edd",
                "Eddy")

Address = c("1111 W. Broward Boulevard",
            "Plantation, 33333",
            "2222 N 23 Avenue",
            "Hollywood, FL 33322",
            "3333 Taylor Street",
            "Hollywood, 33311",
            "44444 NW 19th St",
            "5555 Oak St",
            "Pembroke Pines, 33300") 

The original data looks like this:

enter image description here

Desired Output

Date                 Date_Approved  SR       Permit     Owner_Agent
 SB    1/4/2021      1/4/2021      1A        AAA        Joe, Joey
 HC/SB 1/5/2021      1/8/2021      1B        BBB        Chandler, Monica
 NC    1/6/2021      1/12/2021     1C        CCC        Rachel, Ed
 HC    1/13/2021     1/15/2021     1D        DDD        Edd, Eddy
 Address
 1111 W. Broward Boulevard Plantation, 33333
 2222 N 23 Avenue Hollywood, FL 33322
 3333 Taylor Street Hollywood, 33311
 44444 NW 19th St Pembroke Pines, 33300

I have looked up this and this, but using group_by messes up the df.

Code

library(tidyverse)

df = data.frame(Date,
                Date_Approved,
                SR,
                Permit,
                Owner_Agent,
                Address)

# Tidy up the df

df = df %>% 

CodePudding user response:

You can try to create a row identifier, group by that id, and use summarize(across()) as below:

df %>% 
  mutate(id=rep(1:(n()/2), each=2)) %>% 
  group_by(id) %>% 
  summarize(across(Date:Address, ~trimws(paste0(.x, collapse=" "))))

Output:

# A tibble: 4 × 7
     id Date           Date_Approved SR    Permit Owner_Agent   Address                                    
  <int> <chr>          <chr>         <chr> <chr>  <chr>         <chr>                                      
1     1 SB 1/4/2021    1/4/2021      1A    AAA    Joe Joey      1111 W. Broward Boulevard Plantation, 33333
2     2 HC/SB 1/5/2021 1/8/2021      1B    BBB    Ross Chandler 2222 N 23 Avenue Hollywood, FL 33322       
3     3 NC 1/6/2021    1/12/2021     1C    CCC    Monica Rachel 3333 Taylor Street Hollywood, 33311        
4     4 HC 1/13/2021   1/15/2021     1D    DDD    Ed Eddy       44444 NW 19th St Pembroke Pines, 33300  

Input:

structure(list(Date = c("SB", "1/4/2021", "HC/SB", "1/5/2021", 
"NC", "1/6/2021", "HC", "1/13/2021"), Date_Approved = c(" ", 
"1/4/2021", " ", "1/8/2021", " ", "1/12/2021", " ", "1/15/2021"
), SR = c(" ", "1A", " ", "1B", " ", "1C", " ", "1D"), Permit = c(" ", 
"AAA", " ", "BBB", " ", "CCC", " ", "DDD"), Owner_Agent = c("Joe", 
"Joey", "Ross", "Chandler", "Monica", "Rachel", "Ed", "Eddy"), 
    Address = c("1111 W. Broward Boulevard", "Plantation, 33333", 
    "2222 N 23 Avenue", "Hollywood, FL 33322", "3333 Taylor Street", 
    "Hollywood, 33311", "44444 NW 19th St", "Pembroke Pines, 33300"
    )), class = "data.frame", row.names = c(NA, -8L))
  • Related