In R, how to match a string that contains an irregular space

We are loading data from an Excel file and have run into the following issue:

> dput(names_col[54])
" Calvin Ridley SUS"
> dput(substr(names_col[54], 15, 18))
" SUS"
> substr(names_col[54], 15, 18) == " SUS"
[1] FALSE

> zed = " Calvin Ridley SUS"
> substr(zed, 15, 18) == " SUS"
[1] TRUE

Our hypothesis is that the space in the first code block is not an ordinary space but some irregular space character introduced when the data was loaded from Excel. How can we fix this so that we can match the substring in the first code block?

Answer:

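It seems your string contains a non-breaking space (U+00A0). It prints like an ordinary space (U+0020), but it is a different character, so the comparison with " SUS" returns FALSE.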
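One way to check this is to look at the code points of the string rather than its printed form. The sketch below uses a hypothetical stand-in `x` for `names_col[54]`, built on the assumption that the irregular characters really are non-breaking spaces; with the real data, the same calls would show whatever code points are actually present.

```r
# Hypothetical stand-in for names_col[54]; assumes the odd characters are U+00A0
x <- "\u00a0Calvin Ridley\u00a0SUS"

# utf8ToInt() returns the Unicode code point of each character
cps <- utf8ToInt(x)

utf8ToInt(" ")      # 32, an ordinary space
cps[15]             # 160 (0xA0), a non-breaking space

# Positions of every non-breaking space in the string
nbsp_pos <- which(cps == 160L)
nbsp_pos            # 1 15
```

If position 15 shows 160 rather than 32 on the real data, that would explain why `substr(names_col[54], 15, 18) == " SUS"` is FALSE.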

You can match it using the Unicode escape \u00a0:

target <- "\u00a0Calvin Ridley\u00a0SUS"
grepl("\u00a0SUS",target)
[1] TRUE

As user2554330 mentions in the comments, you can also build the character from its raw UTF-8 bytes (0xc2 0xa0), but it's more convoluted:

grepl(paste0(rawToChar(as.raw(c(0xc2, 0xa0))),"SUS"),target)
[1] TRUE
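If the goal is to make the original comparison work, rather than to match the non-breaking space itself, another option is to normalize the data once after loading. This is a minimal sketch, assuming the only irregular character is U+00A0 and that the strings are UTF-8 encoded; `x` and `cleaned` are illustrative names, not part of the original code.

```r
# Hypothetical stand-in for names_col[54]; assumes the odd characters are U+00A0
x <- "\u00a0Calvin Ridley\u00a0SUS"

# Replace every non-breaking space with an ordinary space
cleaned <- gsub("\u00a0", " ", x, fixed = TRUE)

# The comparison from the question now succeeds
substr(cleaned, 15, 18) == " SUS"   # TRUE

# Optionally drop the leading/trailing whitespace as well
trimmed <- trimws(cleaned)
trimmed                             # "Calvin Ridley SUS"
```

Because `gsub()` is vectorized, the same call could be applied to the whole column, for example `gsub("\u00a0", " ", names_col, fixed = TRUE)`.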