Convert certain columns from char to numeric in R

Time:09-30

I have this data frame in which every column ended up as character. I need to convert the Date column to a Date and the rest of the columns to numeric.

> df <- data.frame(Date = c("1996-01-01", "1996-01-05", "1996-01-29"),
                   SD = c("11", "12", "13"),
                   SF = c("624", "625", "626"),
                   LA = c("1", "2", "3"),
                   IR = c("107", "108", "109"))
> df
        Date SD  SF LA  IR
1 1996-01-01 11 624  1 107
2 1996-01-05 12 625  2 108
3 1996-01-29 13 626  3 109
> str(df)
'data.frame':   3 obs. of  5 variables:
 $ Date: chr  "1996-01-01" "1996-01-05" "1996-01-29"
 $ SD  : chr  "11" "12" "13"
 $ SF  : chr  "624" "625" "626"
 $ LA  : chr  "1" "2" "3"
 $ IR  : chr  "107" "108" "109"

I tried this to convert only columns 2:5, but Date ended up as num with every value coerced to NA.

> df$Date <- as.Date(df$Date)
> df2 <- df
> columns <- c(1, 2:5)
> df2[ , columns] <- apply(df[ , columns], 2, function(x) as.numeric(x))
Warning message:
In FUN(newX[, i], ...) : NAs introduced by coercion
> df2
  Date SD  SF LA  IR
1   NA 11 624  1 107
2   NA 12 625  2 108
3   NA 13 626  3 109
> str(df2)
'data.frame':   3 obs. of  5 variables:
 $ Date: num  NA NA NA
 $ SD  : num  11 12 13
 $ SF  : num  624 625 626
 $ LA  : num  1 2 3
 $ IR  : num  107 108 109

Where did I go wrong, and is there a better way to do this? Thanks in advance.

Answer:

I would suggest running type.convert() on the whole data.frame and then calling as.Date() on the Date column.

Use the as.is = TRUE argument so that strings that cannot be converted to numbers (your dates) stay as character instead of being turned into factors.

df <- data.frame(
  Date = c("1996-01-01", "1996-01-05", "1996-01-29"),
  SD = c("11", "12", "13"),
  SF = c("624", "625", "626"),
  LA = c("1", "2", "3"),
  IR = c("107", "108", "109")
)
str(df)
#> 'data.frame':    3 obs. of  5 variables:
#>  $ Date: chr  "1996-01-01" "1996-01-05" "1996-01-29"
#>  $ SD  : chr  "11" "12" "13"
#>  $ SF  : chr  "624" "625" "626"
#>  $ LA  : chr  "1" "2" "3"
#>  $ IR  : chr  "107" "108" "109"

df2 <- type.convert(df, as.is = TRUE)
str(df2)
#> 'data.frame':    3 obs. of  5 variables:
#>  $ Date: chr  "1996-01-01" "1996-01-05" "1996-01-29"
#>  $ SD  : int  11 12 13
#>  $ SF  : int  624 625 626
#>  $ LA  : int  1 2 3
#>  $ IR  : int  107 108 109

df2$Date <- as.Date(df2$Date)
str(df2)
#> 'data.frame':    3 obs. of  5 variables:
#>  $ Date: Date, format: "1996-01-01" "1996-01-05" ...
#>  $ SD  : int  11 12 13
#>  $ SF  : int  624 625 626
#>  $ LA  : int  1 2 3
#>  $ IR  : int  107 108 109
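If you would rather not let type.convert() touch the Date column at all, it can also be run on just a subset of columns, since it returns a data frame when given one. This is a minimal sketch that assumes, as in the question, that Date is the first column; the `df2[-1]` indexing is an illustrative choice, not part of the original answer.

```r
df <- data.frame(
  Date = c("1996-01-01", "1996-01-05", "1996-01-29"),
  SD = c("11", "12", "13"),
  SF = c("624", "625", "626"),
  LA = c("1", "2", "3"),
  IR = c("107", "108", "109")
)

df2 <- df
# Assumes Date is column 1: convert every other column with type.convert(),
# which works column by column on a data frame and returns a data frame
df2[-1] <- type.convert(df2[-1], as.is = TRUE)
# Convert the remaining character column to Date separately
df2$Date <- as.Date(df2$Date)
str(df2)
```

The numeric columns come out as integer here, just as in the full type.convert() call above.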

Answer:

Your current code includes all five columns:

columns <- c(1, 2:5)  # same as c(1:5)

But you want to exclude the first column (the dates), so use this instead:

columns <- 2:5  # skip column 1 (Date)
df2[ , columns] <- apply(df[ , columns], 2, function(x) as.numeric(x))
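The NA values come from apply() itself: it calls as.matrix() on the data frame first, and because the data frame contains non-numeric columns, the result is a character matrix in which the dates are plain strings such as "1996-01-01" that as.numeric() cannot parse. The sketch below rebuilds the question's data to show that coercion, then converts the columns with lapply(), which works column by column and never touches Date. The `num_cols` vector and the `m` variable are hypothetical names used only for illustration.

```r
df <- data.frame(
  Date = c("1996-01-01", "1996-01-05", "1996-01-29"),
  SD = c("11", "12", "13"),
  SF = c("624", "625", "626"),
  LA = c("1", "2", "3"),
  IR = c("107", "108", "109")
)
df$Date <- as.Date(df$Date)

# What apply() sees: a single-type matrix, here character,
# with the Date column formatted back into strings
m <- as.matrix(df)
print(typeof(m))  # "character"

# Hypothetical list of the columns that should become numeric
num_cols <- c("SD", "SF", "LA", "IR")
df2 <- df
# lapply() converts each selected column separately and keeps the
# data frame structure, so Date keeps its class
df2[num_cols] <- lapply(df2[num_cols], as.numeric)
str(df2)
```

Selecting columns by name rather than by position also keeps the conversion correct if the column order changes later.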