I have a simple problem: I have a very long data frame in which zero is recorded as the character string "nothing" in one column. How can I replace all of these with a numeric 0? A sample data frame is below.
Group | Candy |
---|---|
A | 5 |
B | nothing |
And this is what I want to change it into:
Group | Candy |
---|---|
A | 5 |
B | 0 |
Keep in mind that my actual dataset is hundreds of rows long.
My own attempt was to use is.na, but apparently it only works for NA. It can convert those into zeros with ease, but I wasn't sure whether there is a solution for actual character values.
Thanks
CodePudding user response:
The best way is to read the data in correctly, not with "nothing" for missing values. This can be done with the argument na.strings of the functions read.table or read.csv. Then change the NAs to zero.
The following function is probably slow for large data frames, but it replaces the "nothing" values with zeros.
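As a minimal sketch of the na.strings approach, the example below reads a small CSV supplied through the text argument. The CSV content and the name csv_text are made up for illustration; with a real file you would pass its path instead. "NA" is kept in na.strings so that ordinary missing values are still recognised.

```r
# Hypothetical CSV content standing in for the real file
csv_text <- "Group,Candy
A,5
B,nothing"

# Treat both "NA" and "nothing" as missing values while reading
df1 <- read.csv(text = csv_text, na.strings = c("NA", "nothing"))

# Candy was parsed as a number column; fill the missing entries with 0
df1$Candy[is.na(df1$Candy)] <- 0

str(df1)
```

Because the column is read as numeric from the start, no character-to-number conversion is needed afterwards.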
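nothing_zero <- function(x){
  # local = TRUE keeps the character vector "nothing" inside the function
  # instead of creating it in the global environment
  tc <- textConnection("nothing", "w", local = TRUE)
  sink(tc)     # divert output to tc connection
  print(x)     # print into the character vector "nothing" instead of the console
  sink()       # set the output back to console
  close(tc)    # close connection
  tc <- textConnection(nothing, "r")
  # re-read the printed table, treating "nothing" as missing
  y <- read.table(tc, na.strings = "nothing", header = TRUE)
  close(tc)    # close connection
  y[is.na(y)] <- 0
  y
}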
nothing_zero(df1)
# Group Candy
#1 A 5
#2 B 0
The main advantage is to read numeric data as numeric.
str(nothing_zero(df1))
#'data.frame': 2 obs. of 2 variables:
# $ Group: chr "A" "B"
# $ Candy: num 5 0
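If the data frame is already in memory, a shorter route that should give the same kind of result is type.convert, which also accepts na.strings. This is an alternative sketch rather than the function above, and it assumes a version of R in which type.convert has a data frame method; the object name res is illustrative.

```r
# Same sample data as in this answer, built directly as character columns
df1 <- data.frame(Group = c("A", "B"),
                  Candy = c("5", "nothing"),
                  stringsAsFactors = FALSE)

# Re-guess the column types, treating "nothing" as missing;
# as.is = TRUE keeps Group as character instead of a factor
res <- type.convert(df1, na.strings = c("NA", "nothing"), as.is = TRUE)

# Replace the resulting NAs with zero
res[is.na(res)] <- 0

str(res)
```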
Data
df1 <- read.table(text = "
Group Candy
A 5
B nothing", header = TRUE)
CodePudding user response:
sapply(df, function(x) gsub("nothing", "0", x))
Output
a
[1,] "0"
[2,] "5"
[3,] "6"
[4,] "0"
Data
df <- structure(list(a = c("nothing", "5", "6", "nothing")),
class = "data.frame",
row.names = c(NA,-4L))
Another option, which keeps the result as a data frame:
df[] <- lapply(df, gsub, pattern = "nothing", replacement = "0", fixed = TRUE)
If you only want to apply it to one column:
library(tidyverse)
df$a <- str_replace(df$a,"nothing","0")
Or, applying it to one column in base R:
df$a <- gsub("nothing","0",df$a)
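Note that gsub and str_replace return character vectors, so the column holds the string "0" rather than the number 0. A minimal sketch of the extra step, assuming every remaining value in the column can be parsed as a number:

```r
# Same sample data as in this answer
df <- data.frame(a = c("nothing", "5", "6", "nothing"))

# Replace the literal string, then convert the whole column to numeric
df$a <- as.numeric(gsub("nothing", "0", df$a, fixed = TRUE))

str(df)
```

Any value that is not a valid number would become NA with a warning at the as.numeric step, which is a useful check on a long data set.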