I wrote the function below to modify a specific column of a data frame. I want to reuse it for different columns and data frames, but it does not work and I get an error message.
change.date <- function(df_date, col_nb, first.year, second.year) {
  df_date$col_nb <- gsub(first.year, second.year, df_date$col_nb)
  df_date$col_nb <- as.Date(df_date$col_nb)
  df_date$col_nb <- as.numeric(df_date$col_nb)
}
change.date(df_2020,df_2020[1], "2020","2020")
Error in `$<-.data.frame`(`*tmp*`, "col_nb", value = character(0)):
replacement table has 0 rows, replaced table has 7265
My reproducible data are:
df_2020 <- dput(test_qst)
structure(list(Date = structure(c(1588809600, 1588809600, 1588809600,
1588809600, 1588809600, 1588809600, 1588809600, 1588809600, 1588809600,
1588809600, 1588809600, 1588809600, 1588809600, 1588809600), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), Depth = c(1.72, 3.07, 3.65, 4.58,
5.39, 6.31, 7.27, 8.57, 9.73, 10.78, 11.71, 12.81, 13.79, 14.96
), salinity = c(34.7299999999999, 34.79, 34.76, 34.78, 34.77,
34.79, 34.76, 34.71, 34.78, 34.78, 34.7999999999999, 34.86, 34.7999999999999,
34.83)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-14L))
CodePudding user response:
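Inside a function, df_date$col_nb looks for a column literally named col_nb instead of using the value of the argument. Index with [[col_nb]], pass the column name as a string, and return the data frame at the end. You may try
change.date <- function(df_date, col_nb, first.year, second.year) {
  # [[col_nb]] uses the value of col_nb (e.g. "Date") to pick the column
  df_date[[col_nb]] <- gsub(first.year, second.year, df_date[[col_nb]])
  df_date[[col_nb]] <- as.Date(df_date[[col_nb]])
  df_date[[col_nb]] <- as.numeric(df_date[[col_nb]])
  # return the modified data frame
  df_date
}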
change.date(df_2020, "Date", "2020","2020")
Date Depth salinity
<dbl> <dbl> <dbl>
1 18389 1.72 34.7
2 18389 3.07 34.8
3 18389 3.65 34.8
4 18389 4.58 34.8
5 18389 5.39 34.8
6 18389 6.31 34.8
7 18389 7.27 34.8
8 18389 8.57 34.7
9 18389 9.73 34.8
10 18389 10.8 34.8
11 18389 11.7 34.8
12 18389 12.8 34.9
13 18389 13.8 34.8
14 18389 15.0 34.8
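To see why the original version failed, here is a minimal sketch of the difference between `$` and `[[` when the column name is held in a variable. The toy data frame `df` is made up for illustration and is not the asker's data.

```r
# Toy data frame (hypothetical, for illustration only)
df <- data.frame(Date = c("2020-05-07", "2020-05-08"),
                 Depth = c(1.72, 3.07),
                 stringsAsFactors = FALSE)
col_nb <- "Date"

# `$` takes the name literally: there is no column called "col_nb"
by_dollar <- df$col_nb        # NULL

# `[[` evaluates the variable and returns the Date column
by_bracket <- df[[col_nb]]

# gsub() on NULL returns character(0); assigning that to a column
# is what produces the "0 rows" error in the question
empty <- gsub("2020", "2021", by_dollar)
```

Here `by_dollar` is NULL and `empty` has length zero, which matches the `value = character(0)` shown in the error message.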
CodePudding user response:
One issue you may find when using gsub is that you lose the dates, because gsub returns character strings. Unless you need a numerical timescale, it may be better to keep dates for plotting and analysis.
Using dplyr (with lubridate for the date parts), this extracts the years, changes them, and then rebuilds the dates, even if the year stays the same:
library(dplyr)
change.date <- function(df_date, col_nb = "Date", first.year, second.year) {
  # turn the column name into its position in the data frame
  col_nb <- which(colnames(df_date) %in% col_nb)
  df_date %>%
    mutate(year = lubridate::year(.[[col_nb]])) %>%
    mutate(year = ifelse(year == first.year, second.year, year)) %>%
    # note: the rebuilt dates are always written to the column named Date
    mutate(Date = lubridate::make_date(year, lubridate::month(.[[col_nb]]), lubridate::day(.[[col_nb]]))) %>%
    select(-year)
}
change.date(df_2020, "Date", 2020, 2020)
# A tibble: 14 x 3
Date Depth salinity
<date> <dbl> <dbl>
1 2020-05-07 1.72 34.7
2 2020-05-07 3.07 34.8
3 2020-05-07 3.65 34.8
4 2020-05-07 4.58 34.8
5 2020-05-07 5.39 34.8
6 2020-05-07 6.31 34.8
7 2020-05-07 7.27 34.8
8 2020-05-07 8.57 34.7
9 2020-05-07 9.73 34.8
10 2020-05-07 10.8 34.8
11 2020-05-07 11.7 34.8
12 2020-05-07 12.8 34.9
13 2020-05-07 13.8 34.8
14 2020-05-07 15.0 34.8
If you do want numerical dates, use this in place of the second-to-last line of the pipeline (the mutate that creates Date):
mutate(Date = as.numeric(lubridate::make_date(year, lubridate::month(.[[col_nb]]), lubridate::day(.[[col_nb]])))) %>%
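For reference, a small sketch of what the numeric form means: as.numeric() on a Date gives the number of days since 1970-01-01, and the conversion can be reversed. The variable names are only for illustration.

```r
d <- as.Date("2020-05-07")

# Days since 1970-01-01; 18389 for this date, as in the first answer's output
n <- as.numeric(d)

# Convert back to a Date when one is needed for plotting or analysis
back <- as.Date(n, origin = "1970-01-01")
```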
One comment on your function is to be consistent on the case. Camel case, snake case or, less so, dot case are all acceptable, but using a combination makes it harder to keep track of variables, e.g. df_date versus first.year.
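As a rough sketch of that naming advice, here is a variant that uses snake case throughout and writes the result back to whichever column is named, so it can be reused on other columns and data frames. The names change_year, df, col, first_year and second_year are hypothetical, and it uses base R's POSIXlt in place of dplyr and lubridate, so treat it as one possible approach and not a drop-in replacement for the function above. Leap-day dates moved into a non-leap year are not handled specially here.

```r
# Hypothetical helper: replace one year with another in a date column
change_year <- function(df, col, first_year, second_year) {
  # POSIXlt exposes the date parts; its year field counts from 1900
  lt <- as.POSIXlt(as.Date(df[[col]]))
  hit <- (lt$year + 1900) == first_year
  lt$year[hit] <- second_year - 1900
  # write the result back to the column that was asked for
  df[[col]] <- as.Date(lt)
  df
}

# Toy example (made-up data)
toy <- data.frame(Date = as.Date(c("2020-05-07", "2019-01-02")),
                  Depth = c(1.72, 3.07))
out <- change_year(toy, "Date", 2020, 2021)
```

Only rows whose year equals first_year are changed; the other columns and rows are returned as they were.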