Modify R column by creating function, code error

Time:06-15

I wrote this function to modify a specific column of a data frame. I want to reuse it for different columns and data frames, but it does not work; I get an error message.

change.date <-  function(df_date,col_nb,first.year, second.year){
  df_date$col_nb <- gsub(first.year, second.year,  df_date$col_nb)
  df_date$col_nb <- as.Date(df_date$col_nb)
  df_date$col_nb <-  as.numeric(df_date$col_nb)
    
}

change.date(df_2020,df_2020[1], "2020","2020")

Error in `$<-.data.frame`(`*tmp*`, "col_nb", value = character(0)):
replacement table has 0 rows, replaced table has 7265

My reproducible data:

df_2020 <- dput(test_qst)
structure(list(Date = structure(c(1588809600, 1588809600, 1588809600, 
1588809600, 1588809600, 1588809600, 1588809600, 1588809600, 1588809600, 
1588809600, 1588809600, 1588809600, 1588809600, 1588809600), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Depth = c(1.72, 3.07, 3.65, 4.58, 
5.39, 6.31, 7.27, 8.57, 9.73, 10.78, 11.71, 12.81, 13.79, 14.96
), salinity = c(34.7299999999999, 34.79, 34.76, 34.78, 34.77, 
34.79, 34.76, 34.71, 34.78, 34.78, 34.7999999999999, 34.86, 34.7999999999999, 
34.83)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-14L))

CodePudding user response:

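The $ operator looks for a column literally named col_nb, so nothing is found. Index with [[ ]] instead, pass the column name as a string, and return the modified data frame at the end of the function. You may try: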

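change.date <- function(df_date, col_nb, first.year, second.year) {
  # [[col_nb]] uses the value of col_nb (e.g. "Date") as the column name
  df_date[[col_nb]] <- gsub(first.year, second.year, df_date[[col_nb]])
  # Convert the text back to Date, then to days since 1970-01-01
  df_date[[col_nb]] <- as.Date(df_date[[col_nb]])
  df_date[[col_nb]] <- as.numeric(df_date[[col_nb]])
  # Return the modified copy; the caller's data frame is not changed in place
  df_date
}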

change.date(df_2020, "Date", "2020","2020")

    Date Depth salinity
   <dbl> <dbl>    <dbl>
 1 18389  1.72     34.7
 2 18389  3.07     34.8
 3 18389  3.65     34.8
 4 18389  4.58     34.8
 5 18389  5.39     34.8
 6 18389  6.31     34.8
 7 18389  7.27     34.8
 8 18389  8.57     34.7
 9 18389  9.73     34.8
10 18389 10.8      34.8
11 18389 11.7      34.8
12 18389 12.8      34.9
13 18389 13.8      34.8
14 18389 15.0      34.8
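
To see why the original version failed, here is a minimal base R sketch. The small data frame df is a hypothetical stand-in, not the asker's data; it only illustrates the difference between $ and [[ ]] when the column name is held in a variable.

```r
# Hypothetical stand-in data frame (not the data from the question)
df <- data.frame(Date = c("2020-05-07", "2020-05-08"),
                 Depth = c(1.72, 3.07))
col_nb <- "Date"

# `$` takes the name literally: there is no column called "col_nb"
is.null(df$col_nb)                        # TRUE

# `[[` evaluates the variable, so this fetches the Date column
identical(df[[col_nb]], df$Date)          # TRUE

# gsub() on the missing column gives a zero-length result, which
# cannot be assigned back into a data frame that has rows
length(gsub("2020", "2021", df$col_nb))   # 0
```

The zero-length result in the last line corresponds to the character(0) shown in the error message.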

CodePudding user response:

One issue with the gsub approach is that you lose the dates: the column ends up as plain numbers. Unless you need a numerical timescale, it may be better to keep dates for plotting and analysis.
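
As a small illustration of what the numeric column holds, this base R sketch converts one date to a number and back. The variable name n is just for the example.

```r
# as.numeric() on a Date gives the number of days since 1970-01-01
n <- as.numeric(as.Date("2020-05-07"))
n                                   # 18389

# Turning the number back into a Date needs the origin spelled out
as.Date(n, origin = "1970-01-01")   # "2020-05-07"
```

So the information is recoverable, but every later step has to remember to convert back.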

Using dplyr and lubridate, this extracts the years, replaces them, and then rebuilds the dates (even when the year is unchanged):

library(dplyr)

change.date <- function(df_date, col_nb = "Date", first.year, second.year) {

  df_date %>% 
    # Extract the year from the chosen column
    mutate(year = lubridate::year(.[[col_nb]])) %>% 
    # Swap first.year for second.year; other years are left alone
    mutate(year = ifelse(year == first.year, second.year, year)) %>% 
    # Rebuild the dates and write them back to the column named by col_nb
    mutate(!!col_nb := lubridate::make_date(year, lubridate::month(.[[col_nb]]), lubridate::day(.[[col_nb]]))) %>% 
    select(-year)
}

change.date(df_2020, "Date", 2020, 2020)

# A tibble: 14 x 3

   Date       Depth salinity
   <date>     <dbl>    <dbl>
 1 2020-05-07  1.72     34.7
 2 2020-05-07  3.07     34.8
 3 2020-05-07  3.65     34.8
 4 2020-05-07  4.58     34.8
 5 2020-05-07  5.39     34.8
 6 2020-05-07  6.31     34.8
 7 2020-05-07  7.27     34.8
 8 2020-05-07  8.57     34.7
 9 2020-05-07  9.73     34.8
10 2020-05-07 10.8      34.8
11 2020-05-07 11.7      34.8
12 2020-05-07 12.8      34.9
13 2020-05-07 13.8      34.8
14 2020-05-07 15.0      34.8

If you do want numerical dates, use this in place of the second-to-last line of the pipeline:

mutate(Date = as.numeric(lubridate::make_date(year, lubridate::month(.[[col_nb]]), lubridate::day(.[[col_nb]])))) %>% 

One comment on your function: be consistent in your naming style. Camel case, snake case or, less so, dot case are all acceptable, but mixing them makes it harder to keep track of variables, e.g. df_date versus first.year.
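
As a sketch of what consistent naming could look like, here is the base R version of the function with every name in snake case. The names change_date, col_name, first_year and second_year are illustrative choices, and the small data frame is hypothetical.

```r
# Same logic as the base R answer, with all names in snake_case
change_date <- function(df_date, col_name, first_year, second_year) {
  # Replace the year in the text form of the dates
  df_date[[col_name]] <- gsub(first_year, second_year, df_date[[col_name]])
  # Convert back to Date, then to days since 1970-01-01
  df_date[[col_name]] <- as.numeric(as.Date(df_date[[col_name]]))
  df_date
}

# Hypothetical example data
df <- data.frame(Date = as.Date(c("2020-05-07", "2020-05-08")),
                 Depth = c(1.72, 3.07))

# Returns a modified copy; df itself is left unchanged
result <- change_date(df, "Date", "2020", "2021")
```

Because the function returns a copy, assign the result to a name if you want to keep it.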
