I have a small data set with one column in character format. The data is shown below.
test<-structure(list(txtVALUE = c("<5", "<5", "8", "<5", "9", "12",
"45", "5", "<5", "<5", "11,478", "117", "1,526", "1,642", "3,920",
"98", "8", "<5", "<5", "<5", "<5")), row.names = c(NA, -21L), class = c("tbl_df",
"tbl", "data.frame"))
Now I want to convert this column from character (chr) to numeric. I tried the command below:
test$txtVALUE<-as.numeric(test$txtVALUE)
Warning message:
NAs introduced by coercion
But this command does not convert the data as I expected. Values such as "1,526", "1,642", and "3,920" are converted to NA, although they are numbers.
So can anybody help me convert this data from character to numeric properly, without NA for the values that are numbers?
CodePudding user response:
Your data appears to be counts, so I have taken the slight liberty of assuming that the values are always whole numbers. If they are not, do not use this approach, as it will delete decimal points as well.
However, if they are, and since you want "<5" to be NA, you can use gsub() to replace every value that contains "<" with an empty string, and also delete anything which is not a digit (e.g. the commas in "11,478").
Of course gsub() produces a character vector, so wrap this in as.integer(). An empty string converts to NA.
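# "<.+" blanks out the censored values (they become NA); "\\D" strips remaining non-digits such as commas
as.integer(gsub("<.+|\\D", "", test$txtVALUE))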
# [1] NA NA 8 NA 9 12 45 5 NA NA 11478 117 1526 1642 3920
# [16] 98 8 NA NA NA NA
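If the column might contain decimals after all, a variant that leaves the decimal point alone could look like the sketch below. It is only an illustration: the helper name parse_censored and the sample vector are made up here, and it assumes that "," is always a thousands separator and that any value containing "<" should become NA.

```r
# Hypothetical helper (not from the question): convert strings such as
# "1,526.5" to numeric while turning censored values like "<5" into NA.
parse_censored <- function(x) {
  x <- as.character(x)
  # Step 1: mark every value containing "<" as missing
  x[grepl("<", x, fixed = TRUE)] <- NA
  # Step 2: remove only the thousands separators, keep the decimal point
  as.numeric(gsub(",", "", x, fixed = TRUE))
}

# Made-up sample values, similar to the question's data plus one decimal
parse_censored(c("<5", "8", "11,478", "1,526.5"))
# gives NA, 8, 11478, 1526.5
```

Because the "<" values are set to NA before the conversion, as.numeric() has nothing left that it cannot parse, so no coercion warning is raised.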