I have a CSV
file that has multiple columns. Each ID
in the date
column has two rows. One with an acronym
and the second row with the date
. I want to remove all the even rows that have the acronyms
in the date
column and make another
column
out of these rows
. Moreover, I would also like to remove the blank rows in columns Date_Approved
, SR
and Permit
.
How can I do this in R
using the tidy
ways?
Sample Data
Date = c("SB",
"1/4/2021",
"HC/SB",
"1/5/2021",
"NC",
"1/6/2021",
"HC",
"1/13/2021")
Date_Approved = c(" ",
"1/4/2021",
" ",
"1/8/2021",
" ",
"1/12/2021",
" ",
"1/15/2021")
SR = c(" ",
"1A",
" ",
"1B",
" ",
"1C",
" ",
"1D")
Permit = c(" ",
"AAA",
" ",
"BBB",
" ",
"CCC",
" ",
"DDD")
Owner_Agent = c("Joe",
"Joey",
"Ross",
"Chandler",
"Monica",
"Rachel",
"Ed",
"Edd")
Address = c("1111 W. Broward Boulevard",
"Plantation, 33317",
"2222 N 23 Avenue",
"Hollywood, FL 33020",
"3333 Taylor Street",
"Hollywood, 33021",
"44444 NW 19th St",
"5555 Oak St")
df = data.frame(Date,
Date_Approved,
SR,
Permit,
Owner_Agent,
Address)
Snapshot
Code
library(tidyverse)
df = data.frame(Date,
Date_Approved,
SR,
Permit,
Owner_Agent,
Address)
# Remove even rows with Acronyms and make a column out of them
# Stuck
CodePudding user response:
This is very simple using the lag()
function, assuming that there is only one space between every "correct" row.
See below:
library(tidyverse)
data.frame(Date, Date_Approved, SR, Permit, `Owner/Agent`, Address) %>%
mutate(Acronym = lag(Date)) %>%
filter(Date_Approved != " ")
This is also assuming that the empty value in the other columns is a " " and not NA
If it is NA
replace the last line with filter(!is.na(Date_Approved))
Resulting data frame below:
Date Date_Approved Acronym .....
1 1/4/2021 1/4/2021 SB
2 1/5/2021 1/8/2021 HC/SB
3 1/6/2021 1/12/2021 NC
4 1/13/2021 1/15/2021 HC