R Remove string characters from a range of rows in a column

Time:04-26

I have a column in a dataset from which I want to remove the first two characters. Not all rows have these characters, so those rows should stay unchanged, and some rows are empty.

How can I strip the characters from the rows that have them, remove the rows that are empty, and not affect the rows that don't need any modification?

Please note that the original dataset has 305 rows.

Sample Data

    Date = c("AA 1/27/2020",
             "BB 1/29/2020",
             "CC 1/30/2020",
             "DD 2/1/2020",
             "2/9/2020",
             "2/15/2020",
             " ",
             " ",
             "EE 2/16/2020",
             "VV 2/17/2020",
             "2/18/2020",
             "2/22/2020",
             "2/25/2020",
             "2/28/2020") 

    Date_Approved = c("1/28/2020",
                      "1/30/2020",
                      "1/31/2020",
                      "2/2/2020",
                      "2/10/2020",
                      "2/16/2020",
                      "2/17/2020",
                      "2/18/2020",
                      "2/17/2020",
                      "2/19/2020",
                      "2/20/2020",
                      "2/23/2020",
                      "2/26/2020",
                      "2/29/2020")

Code

    library(tidyverse)
    
    df = data.frame(Date, Date_Approved)

    # Normally I would use str_sub() to remove
    # the acronyms from the Date column
    df = df %>%
        mutate(Date_New = str_sub(Date, 3, -1))

CodePudding user response:

To strip the prefix and drop the blank rows, one option is trimws. By default it trims whitespace from both ends of a string (set which to 'left' or 'right' to trim one side only), but its whitespace argument accepts a regular expression. Here the pattern [A-Z]*\\s* matches zero or more upper-case letters followed by zero or more whitespace characters, so the acronym and the space after it are removed and the blank entries become empty strings. Then filter with nzchar to keep only the rows where Date is not empty.

    library(dplyr)
    df %>%
      mutate(Date = trimws(Date, whitespace = "[A-Z]*\\s*")) %>%
      filter(nzchar(Date))

-output

        Date Date_Approved
1  1/27/2020     1/28/2020
2  1/29/2020     1/30/2020
3  1/30/2020     1/31/2020
4   2/1/2020      2/2/2020
5   2/9/2020     2/10/2020
6  2/15/2020     2/16/2020
7  2/16/2020     2/17/2020
8  2/17/2020     2/19/2020
9  2/18/2020     2/20/2020
10 2/22/2020     2/23/2020
11 2/25/2020     2/26/2020
12 2/28/2020     2/29/2020
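
As a rough cross-check, the same cleanup can be sketched in base R without dplyr. This is an alternative, not the method above: it uses sub() with an anchored pattern that removes only a prefix of exactly two upper-case letters plus the whitespace after it, which assumes every acronym looks like the ones in the sample data. The small data frame reuses four rows from the question.

```r
# Four rows taken from the question's sample data
df <- data.frame(
  Date = c("AA 1/27/2020", "2/9/2020", " ", "EE 2/16/2020"),
  Date_Approved = c("1/28/2020", "2/10/2020", "2/17/2020", "2/17/2020"),
  stringsAsFactors = FALSE
)

# The answer's trimws() call applied to the bare vector, for comparison
trimmed <- trimws(df$Date, whitespace = "[A-Z]*\\s*")

# Stricter variant (assumption: acronyms are exactly two upper-case letters):
# remove the prefix only at the start of the string, then trim plain spaces
df$Date <- trimws(sub("^[A-Z]{2}\\s+", "", df$Date))

# Keep only the rows whose Date is not an empty string
df <- df[nzchar(df$Date), ]
```

Both versions give the same dates on this sample. The stricter pattern only matters if a value could legitimately start or end with other upper-case letters that should be kept.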