Home > front end >  Formatting Dates in R - Cleaning my Data so That it Only Includes Certain Months of the Year
Formatting Dates in R - Cleaning my Data so That it Only Includes Certain Months of the Year

Time:12-07

Thank you for viewing this post first and foremost. I am struggling with formatting some dates with corresponding values in R. I am using a dataset that includes monthly snowfall (VALUE) with its respective year/month (month).

Currently, I have it to where it includes all months of the year, but I would like to limit it down to just including October-March. I am using library(dplyr) to help with the cleaning. Currently, I have the following code:

snow <- 
  snoworig %>% pivot_longer(cols = c(JUL:JUN)) %>% 
  select(SEASON, name, value) %>% 
  rename(MONTH = "name", VALUE = "value") %>% tidyr::unite("month", 1:2, sep = " ")

When running dput(head(snoworig,10)), I get the following:

structure(list(SEASON = c("1869-70", "1870-71", "1871-72", "1872-73", 
"1873-74", "1874-75", "1875-76", "1876-77", "1877-78", "1878-79"
), JUL = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), AUG = c(0, 0, 0, 0, 
0, 0, 0, 0, 0, 0), SEP = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), OCT = c(0, 
0, 0, 0, 0, 0, 0, 0.5, 0, 0), NOV = c(0, 0, 0.3, 3.5, 2, 0, 0, 
0.1, 0, 0.1), DEC = c(5.3, 3, 3.9, 27, 9.3, 10, 0.5, 12.4, 0, 
5.5), JAN = c(1.1, 15.9, 1.8, 10.6, 6.6, 14.5, 1.5, 20.5, 6.1, 
17.3), FEB = c(9.3, 12.1, 3, 18.8, 19, 4.5, 12.5, 0.4, 2, 11.3
), MAR = c(9.6, 0.1, 5.1, 0.4, 0, 15.3, 3.8, 6.5, 0, 1.5), APR = c(2.5, 
2, 0, 0, 0, 13.5, 0, 0, 0, 0), MAY = c(0, 0, 0, 0, 0, 0, 0, 0, 
0, 0), JUN = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), TOTAL = c(27.8, 
33.1, 14.1, 60.3, 36.9, 57.8, 18.3, 40.4, 8.1, 35.7)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

Like I said, I am only wanting now to consider the months October-March. Is this a simple fix with the code I currently have, or should I be interested in starting from scratch (with the data from the original excel file)?

All help is appreciated, and if you have further questions please let me know. Thanks!

CodePudding user response:

base::grepl() does this nicely (?grepl):

snow %>% 
    filter(grepl("OCT|NOV|DEC|JAN|FEB|MAR", month))

# A tibble: 60 × 2
   month       VALUE
   <chr>       <dbl>
 1 1869-70 OCT   0  
 2 1869-70 NOV   0  
 3 1869-70 DEC   5.3
 4 1869-70 JAN   1.1
 5 1869-70 FEB   9.3
 6 1869-70 MAR   9.6
 7 1870-71 OCT   0  
 8 1870-71 NOV   0  
 9 1870-71 DEC   3  
10 1870-71 JAN  15.9
# … with 50 more rows
  •  Tags:  
  • r
  • Related