Reading in a CSV file changes 2007.1 (2007 Jan) to 2007.10 (2007 Oct), and as.Date does not work

Time:10-30

The first picture shows what the data look like in the CSV file: [image]

The second picture shows what R reads in: [image]

The first date is changed from "2007.1" to "2007.10". Whenever the month is January, it is changed to October when I read in the CSV file. I am trying to convert this variable to Date class. Can someone help?

Then my code looks like this:

df <- read.csv("PS_09_orangutans.csv")
n.time <- paste(df$date, ".01", sep = '')         # add day 
n.time
n.time <- as.Date(n.time)            # make Date class

The error is: Error in charToDate(x) : character string is not in a standard unambiguous format

CodePudding user response:

You can use the anydate() function of the anytime package:

 > anytime::anydate(c("2007.10", "2007.11", "2007.12"))
 [1] "2007-10-01" "2007-11-01" "2007-12-01" 
 > 

You will have to turn your numeric input into character first:

 > anytime::anydate(as.character(c(2007.10, 2007.11, 2007.12)))  
 [1] "2007-10-01" "2007-11-01" "2007-12-01" 
 > 

The anydate() function can also convert from numeric, but it then assumes the number is a date's own numeric representation (i.e. what as.numeric() returns for a Date):

> as.numeric(Sys.Date())
[1] 19294

Generally speaking, your input is of course incomplete, as a date needs a day in addition to a month and a year.
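
If you would rather stay in base R, here is a minimal sketch of supplying the missing day yourself. It assumes the values are already character strings (so "2007.1" and "2007.10" are still distinct); the vector ym_chr is made up for illustration. The key point is passing an explicit format to as.Date(), because the dot-separated layout is not one of the formats it tries by default.

```r
# Year.month strings, as they would look if the column was read as character
# (ym_chr is a hypothetical example vector)
ym_chr <- c("2007.1", "2007.10", "2008.02")

# Append a day of month, then describe the layout explicitly to as.Date()
dates <- as.Date(paste0(ym_chr, ".01"), format = "%Y.%m.%d")
dates
# [1] "2007-01-01" "2007-10-01" "2008-02-01"
```

This only helps once the column is character. If it was read as numeric, January and October can no longer be told apart.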

CodePudding user response:

df_raw <- structure(list(date = c("2007.1", "2007.10")), 
                    row.names = c(NA, -2L), class = "data.frame")
df_raw
#     date
#1  2007.1
#2 2007.10
write.csv(df_raw, "df_raw.csv")

read.csv("df_raw.csv") # uh oh
#  X   date
#1 1 2007.1
#2 2 2007.1

read.csv("df_raw.csv", colClasses = "character") # better
#  X    date
#1 1  2007.1
#2 2 2007.10


# with conversion suggested by @zephryl
df_loaded <- read.csv("df_raw.csv", colClasses = "character") 
df_loaded$date = lubridate::ym(df_loaded$date)
df_loaded
#  X       date
#1 1 2007-01-01
#2 2 2007-10-01
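
To see why the "uh oh" case above loses information, here is a small sketch in plain base R (the variable names are only for illustration). It shows that the two months are one and the same number, and that the trailing zero you may see on screen is just print formatting.

```r
# As numbers, January and October of the same year are the same value
same_number <- 2007.1 == 2007.10
same_number
# [1] TRUE

# Formatting a numeric vector pads every element to a common number of
# decimals, so a stored 2007.1 can be displayed as "2007.10"
shown <- format(c(2007.1, 2007.11))
shown
# [1] "2007.10" "2007.11"

# As character strings the two months stay distinct
"2007.1" == "2007.10"
# [1] FALSE
```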

CodePudding user response:

The root of the issue is that the date column is being read in as numeric, so January (2007.1) and October (2007.10) collapse to the same number, which is then displayed as 2007.10 because some values have two places after the decimal. There is no way to distinguish these values from “true Octobers” (e.g., 2007.10 in the original data file) once the data have been read in. Instead, you have to read them in as character from the get-go. This example does so using the col_types argument to readr::read_csv(). I then convert to date using lubridate::ym(), since the values are in year-month format.

library(readr)
library(lubridate)

# stand in for raw data file

df_raw <- "date,val
2007.1,1
2007.11,2
2007.12,3
2008.01,4
2008.02,5
2008.10,6
"

df1 <- read_csv(
  df_raw,
  col_types = cols(date = col_character())
)

ym(df1$date)
# [1] "2007-01-01" "2007-11-01" "2007-12-01" "2008-01-01" "2008-02-01"
# [6] "2008-10-01"

You can do the same with the colClasses argument to utils::read.csv():

df1 <- read.csv(
  "filename.csv", 
  colClasses = c(date = "character")
)
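
A runnable sketch of the difference between a named and an unnamed colClasses, in case it matters for your other columns. The temporary file and its two rows are made up as a stand-in for the real CSV. With a named vector only the date column is forced to character and the rest are still type-guessed, whereas an unnamed value is recycled across every column.

```r
# Hypothetical stand-in for the real file, written to a temporary path
tmp <- tempfile(fileext = ".csv")
writeLines(c("date,val", "2007.1,1", "2007.10,2"), tmp)

# Named: only `date` is forced to character; `val` is still guessed (integer)
df_named <- read.csv(tmp, colClasses = c(date = "character"))
str(df_named)

# Unnamed: the single class is recycled, so every column becomes character
df_all <- read.csv(tmp, colClasses = "character")
str(df_all)
```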