Remove unnecessary rows in four columns and make a new column from one of them in R-CodePudding

I have a CSV file that has multiple columns. Each ID in the date column has two rows. One with an acronym and the second row with the date. I want to remove all the even rows that have the acronyms in the date column and make another column out of these rows. Moreover, I would also like to remove the blank rows in columns Date_Approved, SR and Permit.

How can I do this in R using the tidy ways?

Sample Data

    Date = c("SB",
         "1/4/2021", 
         "HC/SB",
         "1/5/2021",
         "NC",
         "1/6/2021",
         "HC",
         "1/13/2021")

Date_Approved = c(" ",
                  "1/4/2021",
                  " ",
                  "1/8/2021",
                  " ",
                  "1/12/2021",
                  " ",
                  "1/15/2021")

SR = c(" ",
       "1A",
       " ",
       "1B",
       " ",
       "1C",
       " ",
       "1D")

Permit = c(" ",
       "AAA",
       " ",
       "BBB",
       " ",
       "CCC",
       " ",
       "DDD") 

Owner_Agent = c("Joe",
                "Joey",
                "Ross",
                "Chandler",
                "Monica",
                "Rachel",
                "Ed",
                "Edd")

Address = c("1111 W. Broward Boulevard",
            "Plantation, 33317",
            "2222 N 23 Avenue",
            "Hollywood, FL 33020",
            "3333 Taylor Street",
            "Hollywood, 33021",
            "44444 NW 19th St",
            "5555 Oak St")     

df = data.frame(Date,
                Date_Approved,
                SR,
                Permit,
                Owner_Agent,
                Address)

Snapshot

Code

library(tidyverse)

df = data.frame(Date,
                Date_Approved,
                SR,
                Permit,
                Owner_Agent,
                Address)

# Remove even rows with Acronyms and make a column out of them   
# Stuck

CodePudding user response：

This is very simple using the lag() function, assuming that there is only one space between every "correct" row.

See below:

library(tidyverse)

data.frame(Date, Date_Approved, SR, Permit, `Owner/Agent`, Address) %>% 
  mutate(Acronym = lag(Date)) %>% 
  filter(Date_Approved != " ")

This is also assuming that the empty value in the other columns is a " " and not NA

If it is NA replace the last line with filter(!is.na(Date_Approved))

Resulting data frame below:

       Date Date_Approved Acronym .....
1  1/4/2021      1/4/2021      SB
2  1/5/2021      1/8/2021   HC/SB
3  1/6/2021     1/12/2021      NC
4 1/13/2021     1/15/2021      HC