I want to retrieve only numeric values

vector<-c("0.78953744969927742", "0.46557689748480685", "0.19740881059705201", 
  "9.7073839462985714E-2", "4.9051709747422199E-2", "0.1167420589551126", 
  "0.12679434401288708", "0.51370748568563795", "0.1925345466801483", 
  "0.48287163643195624", "4.211984449707315E-2", "blablablab", 
  "0.10553766233766231", "7.8187250996015922E-2", "0.20718689788053954", 
  "1.6450511945392491E-2", "0.51752961082910309", "0.10978571428571428", 
  "0.42610062893081763", "0.52208333333333334", "0.27569868995633184", 
  "7.7189939288811793E-2", "0.53982300884955747", "38.25% (blablabla) blablablablablablablablablablablabla","0.22324159021406728")

I need to convert all observations to numeric values. Entries that consist only of words should become NA. If words follow an entry that starts with a number, keep only the number. If a percent sign follows the number, drop it and keep only the number.

CodePudding user response：

With readr's parse_number():

library(readr)

vec_num <- parse_number(vector)
Warning: 1 parsing failure.
row col expected     actual
 12  -- a number blablablab

vec_num
 [1]  0.78953745  0.46557690  0.19740881  0.09707384  0.04905171  0.11674206
 [7]  0.12679434  0.51370749  0.19253455  0.48287164  0.04211984          NA
[13]  0.10553766  0.07818725  0.20718690  0.01645051  0.51752961  0.10978571
[19]  0.42610063  0.52208333  0.27569869  0.07718994  0.53982301 38.25000000
[25]  0.22324159
attr(,"problems")
# A tibble: 1 × 4
    row   col expected actual    
  <int> <int> <chr>    <chr>     
1    12    NA a number blablablab

vec_num[24]
[1] 38.25

CodePudding user response：

Remove every character that cannot be part of a number, then convert:

> as.numeric(gsub("[^0-9\\.\\E\\-]","",vector))
 [1]  0.78953745  0.46557690  0.19740881  0.09707384  0.04905171  0.11674206
 [7]  0.12679434  0.51370749  0.19253455  0.48287164  0.04211984          NA
[13]  0.10553766  0.07818725  0.20718690  0.01645051  0.51752961  0.10978571
[19]  0.42610063  0.52208333  0.27569869  0.07718994  0.53982301 38.25000000
[25]  0.22324159
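
One caveat with deleting characters: any digit, ".", "E" or "-" in the trailing text survives and gets glued onto the number. Below is a minimal sketch of the difference. The entries in `x` are made up for illustration (they are not from the question's data), and the capture-group pattern is only one possible way to keep the leading number.

```r
# hypothetical entries; the last one has a digit in its trailing text
x <- c("9.7E-2", "blablablab", "38.25% (group 2)")

# deleting non-numeric characters appends the stray "2" to 38.25
deleted <- gsub("[^0-9.E-]", "", x)

# capturing the leading number and dropping the rest avoids that;
# entries that do not start with a number are left unchanged
kept <- sub("^([0-9.]+(E[+-]?[0-9]+)?).*$", "\\1", x, perl = TRUE)

# word-only entries cannot be converted and become NA
nums <- suppressWarnings(as.numeric(kept))
```

Here `deleted` holds "38.252" for the last entry, while `kept` holds "38.25".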

CodePudding user response：

You can use

as.numeric(stringr::str_extract(vector, '[\\d+.\\-E]+'))

CodePudding user response：

In R, as.numeric() converts the elements of a character vector to numbers and returns NA (with a warning) for elements that cannot be converted. The trailing percentage text has to be stripped before converting, otherwise an entry such as "38.25% (blablabla) ..." also becomes NA. If you would rather drop the word-only entries than keep them as NA, filter with is.na() afterwards.

Here's some code that demonstrates how to do this:

# remove a percent sign and everything after it,
# e.g. "38.25% (blablabla) ..." becomes "38.25"
cleaned <- sub("%.*$", "", vector)

# convert to numeric; word-only entries become NA (warning suppressed)
vec_num <- suppressWarnings(as.numeric(cleaned))

# optional: drop the NA entries if they are not wanted
vec_num[!is.na(vec_num)]

CodePudding user response：

It depends a bit on the regex flavor, but this pattern will do:

\d+(\.\d+(E[+-]\d+)?)

Test and refine it at regex101.com
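
To try the pattern directly in R, something like the following sketch should work. It assumes the pattern is meant to read `\d+(\.\d+(E[+-]\d+)?)` and uses `perl = TRUE` so that `\d` is interpreted as a digit class. The names `pattern`, `found` and `nums` are only illustrative.

```r
# a few entries taken from the question
x <- c("0.78953744969927742", "9.7073839462985714E-2", "blablablab",
       "38.25% (blablabla) blablablablablablablablablablablabla")

# the pattern from this answer, with backslashes doubled for an R string
pattern <- "\\d+(\\.\\d+(E[+-]\\d+)?)"

# regexpr() locates the first match per element (-1 where there is none)
m <- regexpr(pattern, x, perl = TRUE)

# fill the matches into an NA-initialised vector so word-only entries stay NA
found <- rep(NA_character_, length(x))
found[m != -1] <- regmatches(x, m)

nums <- as.numeric(found)
```

Note that regmatches() silently skips elements without a match, which is why the matches are assigned into a pre-filled NA vector instead of being used directly.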