Home > Net >  Writing R dataframe to an xlsx file is destroying date information
Writing R dataframe to an xlsx file is destroying date information

Time:12-30

Just getting started in R and am working with data from the NHC.

I'm trying to convert a dataframe to an xlsx file and when it converts, all dates before 1800 are turning into #### in excel. Here's my code:

hurricanes_3 <- read.csv("atlantic.csv")

hurricanes_3 <- transform(hurricanes_3, Date = as.Date(as.character(Date), "%Y%m%d"))

install.packages("openxlsx")
library("openxlsx")
hurricanes_cleaned <- write.xlsx(hurricanes_3,"Cleaned Hurricane Data.xlsx")

dput(head(hurricanes_3)) # as requested

structure(list(Name = c("            UNNAMED", "            UNNAMED", 
"            UNNAMED", "            UNNAMED", "            UNNAMED", 
"            UNNAMED"), Date = structure(c(-42861, -42488, -42119, 
-42118, -42118, -41765), class = "Date"), Time = c(600L, 1200L, 
1200L, 1800L, 2000L, 0L), Wind = c(100L, 130L, 110L, 100L, 100L, 
50L), Pressure = c(961L, 924L, 938L, 950L, 950L, 997L)), row.names = c(NA, 
6L), class = "data.frame")

hurricanes_3 contains a Date column that is formatted as a date but I don't know why this code is selectively destroying the dates for everything before 1900. The whole date column in Excel (including the properly formatted data) is of 'Custom' type, not sure if that has anything to do with it. Any help would be greatly appreciated!

Image of Excel Issue

CodePudding user response:

A date is a postive integer number of days from a starting date. In Excel, that start date is Jan 1, 1900 or Dec 31, 1899 (depending on how the bug in Excel is handled, Feb 29, 1900 did not exist, but Excel thinks it did.), thus Excel cannot represent dates earlier than the origin.

The workaround is to convert the date column to a string with the format() function before exporting.

  •  Tags:  
  • r
  • Related