vector<-c("0.78953744969927742", "0.46557689748480685", "0.19740881059705201",
"9.7073839462985714E-2", "4.9051709747422199E-2", "0.1167420589551126",
"0.12679434401288708", "0.51370748568563795", "0.1925345466801483",
"0.48287163643195624", "4.211984449707315E-2", "blablablab",
"0.10553766233766231", "7.8187250996015922E-2", "0.20718689788053954",
"1.6450511945392491E-2", "0.51752961082910309", "0.10978571428571428",
"0.42610062893081763", "0.52208333333333334", "0.27569868995633184",
"7.7189939288811793E-2", "0.53982300884955747", "38.25% (blablabla) blablablablablablablablablablablabla","0.22324159021406728")
I have to transform all observations into numerical values. Observations consisting only of words should become NA. If an observation starts with a number followed by words, keep only the number. If the number is followed by a percentage sign, drop the percentage sign and keep only the number.
Answer:
With readr's parse_number:
library(readr)
vec_num <- parse_number(vector)
Warning: 1 parsing failure.
row col expected actual
12 -- a number blablablab
vec_num
[1] 0.78953745 0.46557690 0.19740881 0.09707384 0.04905171 0.11674206
[7] 0.12679434 0.51370749 0.19253455 0.48287164 0.04211984 NA
[13] 0.10553766 0.07818725 0.20718690 0.01645051 0.51752961 0.10978571
[19] 0.42610063 0.52208333 0.27569869 0.07718994 0.53982301 38.25000000
[25] 0.22324159
attr(,"problems")
# A tibble: 1 × 4
row col expected actual
<int> <int> <chr> <chr>
1 12 NA a number blablablab
vec_num[24]
[1] 38.25
Answer:
Remove every character that is not a digit, a dot, an "E" or a minus sign, then convert:
> as.numeric(gsub("[^0-9.E-]", "", vector))
[1] 0.78953745 0.46557690 0.19740881 0.09707384 0.04905171 0.11674206
[7] 0.12679434 0.51370749 0.19253455 0.48287164 0.04211984 NA
[13] 0.10553766 0.07818725 0.20718690 0.01645051 0.51752961 0.10978571
[19] 0.42610063 0.52208333 0.27569869 0.07718994 0.53982301 38.25000000
[25] 0.22324159
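One caveat with the strip-then-convert idea: every digit left in the string is glued together, so a string that happens to contain two numbers turns into one wrong number. The minimal sketch below uses the character class [^0-9.E-] and two made-up input strings (assumptions for illustration, not from the question's data) to show the effect; strip_to_number is a hypothetical helper name.

```r
# Strip everything except digits, ".", "E" and "-", then convert.
# suppressWarnings() hides the "NAs introduced by coercion" warning
# that word-only strings would trigger.
strip_to_number <- function(x) {
  suppressWarnings(as.numeric(gsub("[^0-9.E-]", "", x)))
}

ok  <- strip_to_number("38.25% (blablabla)")     # one number: works as intended
bad <- strip_to_number("38.25% (12 blablabla)")  # digits "12" are appended: 38.2512

print(ok)
print(bad)
```

This is fine for data shaped like the question's vector, but extracting only the first number is safer when the trailing text may contain digits.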
Answer:
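You can extract the first number in each string with stringr::str_extract (it returns NA when there is no match) and convert it:
as.numeric(stringr::str_extract(vector, '[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?'))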
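If you would rather not depend on stringr, the same extract-the-first-number idea can be sketched in base R with regexpr() and regmatches(). The helper name extract_first and the pattern are illustrative assumptions, not part of the original answer; perl = TRUE is set so that the non-capturing group (?:...) is accepted.

```r
# Base-R counterpart of stringr::str_extract(): the first match of `pattern`
# in each element, or NA when the element has no match.
extract_first <- function(x, pattern) {
  pos <- regexpr(pattern, x, perl = TRUE)   # -1 marks "no match"
  out <- rep(NA_character_, length(x))
  out[pos != -1] <- regmatches(x, pos)      # regmatches() returns matches only
  out
}

# Optional sign, digits with an optional decimal part, optional exponent.
number_pattern <- "[-+]?[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?"

x <- c("0.78953744969927742", "9.7073839462985714E-2", "blablablab",
       "38.25% (blablabla) blablablablablablablablablablablabla")
nums <- as.numeric(extract_first(x, number_pattern))
print(nums)
```

Filling a vector of NA and assigning only the matched positions keeps the result aligned with the input, which regmatches() alone would not do.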
Answer:
In R, you can use the as.numeric() function to convert the elements of the vector to numerical values. It returns NA (with a warning) for any element that cannot be parsed as a number, which is exactly what you want for the elements that consist only of words. However, an element such as "38.25% (blablabla) ..." would also become NA, so the percentage sign and the text after it have to be removed before converting, not after.
Here's some code that demonstrates how to do this:
# remove the % and everything after it, e.g. "38.25% (blablabla) ..." -> "38.25"
vector <- sub("%.*$", "", vector)
# convert elements of vector to numeric values; word-only elements become NA
vector <- as.numeric(vector)
# optionally, drop the NA elements if you do not want to keep them
# vector <- vector[!is.na(vector)]
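A minimal sketch of why the order of the steps matters, using three made-up strings shaped like the question's data (an assumption for illustration). suppressWarnings() only hides the expected "NAs introduced by coercion" warning.

```r
x <- c("0.5", "blablablab", "38.25% (blablabla)")

# Converting first: the "38.25% ..." entry cannot be parsed, so it becomes NA
# just like the word-only entry, and the number is lost.
converted_first <- suppressWarnings(as.numeric(x))

# Cleaning first: drop "%" and everything after it, then convert.
# Only the word-only entry becomes NA.
cleaned_first <- suppressWarnings(as.numeric(sub("%.*$", "", x)))

print(converted_first)
print(cleaned_first)
```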
Answer:
It depends a bit on which regex flavour you use, but this will do:
\d+(\.\d+)?([Ee][+-]?\d+)?
Check and refine it at regex101.com.
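In R the backslashes in such a pattern must be doubled inside a string literal. As a hedged sketch, the pattern below (digits, optional fractional part, optional exponent, an assumption based on the answer's intent) is anchored in two ways with grepl() to sort the question's three cases: already-clean numbers, strings that start with a number followed by other text, and word-only strings.

```r
# Digits, an optional fractional part and an optional exponent such as "E-2".
number_regex <- "\\d+(\\.\\d+)?([Ee][+-]?\\d+)?"

x <- c("0.78953744969927742", "9.7073839462985714E-2", "blablablab",
       "38.25% (blablabla) blablablablablablablablablablablabla")

# Whole string is a number: can go straight to as.numeric().
is_clean <- grepl(paste0("^", number_regex, "$"), x, perl = TRUE)

# String starts with a number: the leading number needs to be extracted.
starts_with_number <- grepl(paste0("^", number_regex), x, perl = TRUE)

print(is_clean)
print(starts_with_number)
```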