Home > Software design >  Replacing values based on conditions
Replacing values based on conditions

Time:12-17

I have a dataframe one of the cols is id and some of the values have been messed up during the recording of the data.

here's an example of the type of data

dput(df)
structure(list(Id = c("'110171786'", "'1103fbfd5'", "'0700edf6dc'", 
"'1103fad09'", "'01103fc9bb'", "''", "''", "0000fba2b'", "'01103fb169'", 
"'01103fd723'", "'01103f9c34'", "''", "''", "''", "'01103fc088'", 
"'01103fa6d8'", "'01103fb374'", "'01103fce8c'", "'01103f955d'", 
"'011016e633'", "'01103fa0da'", "''", "''", "''", "'01103fa4bd'", 
"'01103fb5c4'", "'01103fd0d7'", "'01103f9e2e'", "'01103fc657'", 
"'01103fd4d1'", "'011016e78e'", "'01103fbda2'", "'01103fbae7'", 
"'011016ee23'", "'01103fc847'", "'01103fbfbb'", "''", "'01103fb8bb'", 
"'01103fc853'", "''", "'01103fbcd5'", "'011016e690'", "'01103fb253'", 
"'01103fcb19'", "'01103fb446'", "'01103fa4fa'", "'011016cfbd'", 
"'01103fd250'", "'01103fac7d'", "'011016a86e'"), Weight = c(11.5, 
11.3, 11.3, 10.6, 10.6, 8.9, 18.7, 10.9, 11.3, 18.9, 18.9, 8.6, 
8.8, 8.4, 11, 10.4, 10.4, 10.8, 11.2, 11, 10.3, 9.5, 8.1, 9.3, 
10.2, 10.5, 11.2, 21.9, 18, 17.8, 11.3, 11.5, 10.8, 10.5, 12.8, 
10.9, 8.9, 10.3, 10.8, 8.9, 10.9, 9.9, 19, 11.6, 11.3, 11.7, 
10.9, 12.1, 11.3, 10.6)), class = "data.frame", row.names = c(NA, 
-50L))
> 

What I would like to do is search through the id column and replace the following mistakes

  • some of the values have a zero missing off the front, all of these would start with a 1 now instead which makes finding them easily. So basically anything that has a character length of 9 and starts with a 1 needs a 0 as the first character.
  • some of the values are less than 10 characters long, these need to be removed.
  • some have more than one leading 0 and these need to be removed.

CodePudding user response:

df$Id <- gsub("^('?)(1.{8}')$", "\\10\\2", df$Id)
df[ !grepl("^'?(00|'$)", df$Id),]
#              Id Weight
# 1  '0110171786'   11.5
# 2  '01103fbfd5'   11.3
# 3  '0700edf6dc'   11.3
# 4  '01103fad09'   10.6
# 5  '01103fc9bb'   10.6
# 9  '01103fb169'   11.3
# 10 '01103fd723'   18.9
# 11 '01103f9c34'   18.9
# 15 '01103fc088'   11.0
# 16 '01103fa6d8'   10.4
# 17 '01103fb374'   10.4
# 18 '01103fce8c'   10.8
# 19 '01103f955d'   11.2
# 20 '011016e633'   11.0
# 21 '01103fa0da'   10.3
# 25 '01103fa4bd'   10.2
# 26 '01103fb5c4'   10.5
# 27 '01103fd0d7'   11.2
# 28 '01103f9e2e'   21.9
# 29 '01103fc657'   18.0
# 30 '01103fd4d1'   17.8
# 31 '011016e78e'   11.3
# 32 '01103fbda2'   11.5
# 33 '01103fbae7'   10.8
# 34 '011016ee23'   10.5
# 35 '01103fc847'   12.8
# 36 '01103fbfbb'   10.9
# 38 '01103fb8bb'   10.3
# 39 '01103fc853'   10.8
# 41 '01103fbcd5'   10.9
# 42 '011016e690'    9.9
# 43 '01103fb253'   19.0
# 44 '01103fcb19'   11.6
# 45 '01103fb446'   11.3
# 46 '01103fa4fa'   11.7
# 47 '011016cfbd'   10.9
# 48 '01103fd250'   12.1
# 49 '01103fac7d'   11.3
# 50 '011016a86e'   10.6
  • Related