I have this data frame in which every column ended up as character. I need to convert the Date column to a date format and the remaining columns to numeric.
> df <- data.frame(Date = c("1996-01-01", "1996-01-05", "1996-01-29"),
SD = c("11", "12", "13"),
SF = c("624", "625", "626"),
LA = c("1", "2", "3"),
IR = c("107", "108", "109"))
> df
Date SD SF LA IR
1 1996-01-01 11 624 1 107
2 1996-01-05 12 625 2 108
3 1996-01-29 13 626 3 109
> str(df)
'data.frame': 3 obs. of 5 variables:
$ Date: chr "1996-01-01" "1996-01-05" "1996-01-29"
$ SD : chr "11" "12" "13"
$ SF : chr "624" "625" "626"
$ LA : chr "1" "2" "3"
$ IR : chr "107" "108" "109"
I tried this to convert only columns 2:5, but the Date column ended up as num with every value coerced to NA.
> df$Date <- as.Date(df$Date)
> df2 <- df
> columns <- c(1, 2:5)
> df2[ , columns] <- apply(df[ , columns], 2, function(x) as.numeric(x))
Warning message:
In FUN(newX[, i], ...) : NAs introduced by coercion
> df2
Date SD SF LA IR
1 NA 11 624 1 107
2 NA 12 625 2 108
3 NA 13 626 3 109
> str(df2)
'data.frame': 3 obs. of 5 variables:
$ Date: num NA NA NA
$ SD : num 11 12 13
$ SF : num 624 625 626
$ LA : num 1 2 3
$ IR : num 107 108 109
Any ideas where I got it wrong or any ideas how I can do this better? Thanks in advance.
CodePudding user response:
For this I would suggest using type.convert() on the whole data.frame, and then using as.Date() on the Date column.
Use the as.is = TRUE argument to ensure strings (your dates) are not converted to factors.
df <- data.frame(
Date = c("1996-01-01", "1996-01-05", "1996-01-29"),
SD = c("11", "12", "13"),
SF = c("624", "625", "626"),
LA = c("1", "2", "3"),
IR = c("107", "108", "109")
)
str(df)
#> 'data.frame': 3 obs. of 5 variables:
#> $ Date: chr "1996-01-01" "1996-01-05" "1996-01-29"
#> $ SD : chr "11" "12" "13"
#> $ SF : chr "624" "625" "626"
#> $ LA : chr "1" "2" "3"
#> $ IR : chr "107" "108" "109"
df2 <- type.convert(df, as.is = TRUE)
str(df2)
#> 'data.frame': 3 obs. of 5 variables:
#> $ Date: chr "1996-01-01" "1996-01-05" "1996-01-29"
#> $ SD : int 11 12 13
#> $ SF : int 624 625 626
#> $ LA : int 1 2 3
#> $ IR : int 107 108 109
df2$Date <- as.Date(df2$Date)
str(df2)
#> 'data.frame': 3 obs. of 5 variables:
#> $ Date: Date, format: "1996-01-01" "1996-01-05" ...
#> $ SD : int 11 12 13
#> $ SF : int 624 625 626
#> $ LA : int 1 2 3
#> $ IR : int 107 108 109
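Note that type.convert() picks integer for these columns. That is usually fine, but if you specifically need double storage, something like the following should work. This is only a sketch: is_int is a helper name made up here, and it assumes you want every integer column converted.

```r
df <- data.frame(
  Date = c("1996-01-01", "1996-01-05", "1996-01-29"),
  SD = c("11", "12", "13"),
  SF = c("624", "625", "626"),
  LA = c("1", "2", "3"),
  IR = c("107", "108", "109")
)

df2 <- type.convert(df, as.is = TRUE)

# is_int (illustrative name): TRUE for each column that came out as integer
is_int <- vapply(df2, is.integer, logical(1))

# Convert only those columns to double; Date is still character here
df2[is_int] <- lapply(df2[is_int], as.numeric)

df2$Date <- as.Date(df2$Date)
```

After this, typeof(df2$SD) should report "double" rather than "integer".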
CodePudding user response:
Your column selection currently includes every column, Date among them:
columns <- c(1, 2:5) # same as c(1:5)
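apply() coerces the data frame to a character matrix first, so the dates reach as.numeric() as strings and become NA. You want to exclude the first column of dates, so use this version: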
columns <- c(2:5)
df2[ , columns] <- apply(df[ , columns], 2, function(x) as.numeric(x))
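If you would rather avoid the matrix round trip that apply() does, a possible variant is to use lapply() on the selected columns, which works column by column. A minimal sketch, assuming Date is the only non-numeric column (num_cols is just a name chosen for this example):

```r
df <- data.frame(
  Date = c("1996-01-01", "1996-01-05", "1996-01-29"),
  SD = c("11", "12", "13"),
  SF = c("624", "625", "626"),
  LA = c("1", "2", "3"),
  IR = c("107", "108", "109")
)

df2 <- df
df2$Date <- as.Date(df2$Date)

# num_cols (illustrative name): every column except Date
num_cols <- setdiff(names(df2), "Date")

# lapply() handles one column at a time, so Date is never touched
df2[num_cols] <- lapply(df2[num_cols], as.numeric)
```

Selecting by name rather than by position also keeps working if the column order changes.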