I have a column in a dataset from which I want to remove the first two characters of each row. The catch is that not all rows have these characters, so I don't want to change those rows, and some rows are empty.
How can I remove the characters from the rows that have them, drop the rows that are empty, and not affect the rows that don't need any modification?
Please note that the original dataset has 305 rows.
Sample Data
Date = c("AA 1/27/2020",
"BB 1/29/2020",
"CC 1/30/2020",
"DD 2/1/2020",
"2/9/2020",
"2/15/2020",
" ",
" ",
"EE 2/16/2020",
"VV 2/17/2020",
"2/18/2020",
"2/22/2020",
"2/25/2020",
"2/28/2020")
Date_Approved = c("1/28/2020",
"1/30/2020",
"1/31/2020",
"2/2/2020",
"2/10/2020",
"2/16/2020",
"2/17/2020",
"2/18/2020",
"2/17/2020",
"2/19/2020",
"2/20/2020",
"2/23/2020",
"2/26/2020",
"2/29/2020")
Code
library(tidyverse)
df = data.frame(Date, Date_Approved)
# Normally I would use this to remove the acronyms from the Date column,
# but it strips the leading characters from every row, including rows without an acronym
df = df %>%
  mutate(Date_New = str_sub(Date, 3, -1))
User response:
If we want to strip the prefix and filter, one option is trimws. By default it trims whitespace from both ends of a string; to trim only the left or the right end, set the which argument (the default is 'both'). Its whitespace argument (available since R 3.6.0) accepts a regex, so here we pass a pattern that matches zero or more upper case letters followed by zero or more spaces ([A-Z]*\\s*). Then filter to keep only the rows where the elements are not blank.
library(dplyr)
df %>%
mutate(Date = trimws(Date, whitespace = "[A-Z]*\\s*")) %>%
filter(nzchar(Date))
-output
        Date Date_Approved
1  1/27/2020     1/28/2020
2  1/29/2020     1/30/2020
3  1/30/2020     1/31/2020
4   2/1/2020      2/2/2020
5   2/9/2020     2/10/2020
6  2/15/2020     2/16/2020
7  2/16/2020     2/17/2020
8  2/17/2020     2/19/2020
9  2/18/2020     2/20/2020
10 2/22/2020     2/23/2020
11 2/25/2020     2/26/2020
12 2/28/2020     2/29/2020
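If the prefix should be matched more strictly, a base R variant might look like the sketch below. It assumes the prefix is always exactly two upper case letters followed by at least one space. That assumption comes from the sample data only and may not hold for all 305 rows. The shortened sample and the name cleaned are made up for illustration.

```r
# Shortened, hypothetical sample mirroring the question's columns
Date <- c("AA 1/27/2020", "2/9/2020", " ", "EE 2/16/2020")
Date_Approved <- c("1/28/2020", "2/10/2020", "2/17/2020", "2/17/2020")
df <- data.frame(Date, Date_Approved, stringsAsFactors = FALSE)

# Remove a leading two-letter upper case code and the spaces after it;
# strings without that prefix do not match and are returned unchanged
df$Date <- sub("^[A-Z]{2}\\s+", "", df$Date)

# Keep only the rows whose Date is not empty or whitespace-only
cleaned <- df[nzchar(trimws(df$Date)), , drop = FALSE]
rownames(cleaned) <- NULL

print(cleaned)
```

Because the pattern is anchored with ^ and requires two letters, rows that already start with a date are left alone. The blank rows are removed in the separate filtering step rather than by the substitution.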