Sub function in R not working as expected

Time:08-31

> mysentence <- "UK is Beautiful. UK is not the part of EU since 2016"
> gsub("[0-9]*", "", mysentence)
[1] "UK is Beautiful. UK is not the part of EU since "
> mysentence <- "UK is Beautiful. UK is not the part of EU since 2016"
> sub("[0-9]*", "", mysentence)
[1] "UK is Beautiful. UK is not the part of EU since 2016"
> mysentence <- "UK is Beautiful. UK is not the part of EU since 2016"
> sub("[0-9]+", "", mysentence)
[1] "UK is Beautiful. UK is not the part of EU since "

Here, gsub gives the expected output, but with sub the output still contains 2016, which should have been removed. Running the same command with + instead of * gives the expected output. Why is the second example, i.e.

sub("[0-9]*", "", mysentence)

not giving the same output as the other examples?

CodePudding user response:

The issue is that the * quantifier is 0 or more. So [0-9]* will match Nothing, as well as 1 or more digits. sub only replaces the first match, so sub("[0-9]*", "", mysentence) matches 1 Nothing, right at the beginning, replaces it with "" (also nothing), and is done.

We can see this more easily if we put a non-nothing replacement:

sub("[0-9]*", "HI", mysentence)
# [1] "HIUK is Beautiful. UK is not the part of EU since 2016"
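
Another way to check this, as a small sketch using base R's regexpr (which reports where the first match starts and how long it is), is to compare the two quantifiers directly. The variable names m_star and m_plus are only illustrative:

```r
mysentence <- "UK is Beautiful. UK is not the part of EU since 2016"

# First match of each pattern: start position plus a "match.length" attribute
m_star <- regexpr("[0-9]*", mysentence)
m_plus <- regexpr("[0-9]+", mysentence)

as.integer(m_star)             # 1: the match starts at the first character
attr(m_star, "match.length")   # 0: and it is empty

attr(m_plus, "match.length")   # 4: one or more digits cannot match emptily
regmatches(mysentence, m_plus) # "2016"
```

An empty first match at position 1 is exactly what sub replaces, which is why the digits at the end are left alone.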

gsub replaces every occurrence, and if we had a non-nothing replacement it gets pretty absurd, as it matches Nothing at every position:

gsub("[0-9]*", "HI", mysentence)
# [1] "HIUHIKHI HIiHIsHI HIBHIeHIaHIuHItHIiHIfHIuHIlHI.HI HIUHIKHI
#  HIiHIsHI HInHIoHItHI HItHIhHIeHI HIpHIaHIrHItHI HIoHIfHI HIEHIUHI
#  HIsHIiHInHIcHIeHI HI"

Using the + quantifier, which is 1 or more, means that Nothing is not matched, and in this 1-match case sub and gsub behave identically:

gsub("[0-9]+", "HI", mysentence)
# [1] "UK is Beautiful. UK is not the part of EU since HI"

sub("[0-9]+", "HI", mysentence)
# [1] "UK is Beautiful. UK is not the part of EU since HI"
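
The two functions only agree here because the sentence contains a single run of digits. As a quick illustration with a made-up string (x is not from the question) that has two numbers, sub removes only the first run while gsub removes both:

```r
# Hypothetical example string with two separate runs of digits
x <- "Joined in 1973, left in 2020"

sub("[0-9]+", "", x)
# [1] "Joined in , left in 2020"

gsub("[0-9]+", "", x)
# [1] "Joined in , left in "
```

So for stripping every number from a string, gsub with + is likely the safer choice; sub with + is enough only when at most one number is expected.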
Tags: r