can't use mutate with a DATE

Time:03-26

I have this database:

structure(list(id_mujer = c(8528, 8528, 11711, 11711, 11818,
11818), hpv_post = structure(c(1339459200, 1458172800, 1443571200,
1443571200, 1354838400, 1525392000), tzone = "UTC", class = c("POSIXct",
"POSIXt")), hpv_post_res = c("NEG", "NEG", "POS", "POS", "NEG",
"NEG")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

# A tibble: 6 x 3
  id_mujer hpv_post            hpv_post_res
     <dbl> <dttm>              <chr>
1     8528 2012-06-12 00:00:00 NEG
2     8528 2016-03-17 00:00:00 NEG
3    11711 2015-09-30 00:00:00 POS
4    11711 2015-09-30 00:00:00 POS
5    11818 2012-12-07 00:00:00 NEG
6    11818 2018-05-04 00:00:00 NEG

I need to create a column called "positivo": where the hpv_post_res of an observation is "POS", it should hold the hpv_post of that observation, and otherwise a 0. The problem is that when I do this:

base_contactos_3 %>%
  mutate(positivo= if_else(hpv_post_res=="POS", hpv_post, 0))

For some reason I get this error:

Error: Problem with `mutate()` column `positivo`.
i `positivo = if_else(hpv_post_res == "POS", hpv_post, 0)`.
x 'origin' must be supplied
Run `rlang::last_error()` to see where the error occurred.

I only put a 0 for the false case because I don't know what else to use; the value itself is not important. But if instead of a 0 I put a string, like:

base_contactos_3 %>%
  mutate(positivo= if_else(hpv_post_res=="POS", hpv_post, "negative"))

It gives me this error:

Error: Problem with `mutate()` column `positivo`.
i `positivo = if_else(hpv_post_res == "POS", hpv_post, "negative")`.
x `false` must be a `POSIXct/POSIXt` object, not a character vector.
Run `rlang::last_error()` to see where the error occurred.

And if I put NA instead of 0 (which is the option I prefer), it gives me this error:

Error: Problem with `mutate()` column `positivo`.
i `positivo = if_else(hpv_post_res == "POS", hpv_post, NA)`.
x `false` must be a `POSIXct/POSIXt` object, not a logical vector.
Run `rlang::last_error()` to see where the error occurred.

CodePudding user response:

The error is telling you the reason: the value you return when the condition is TRUE (hpv_post) is a date-time (POSIXct), but the value you return when it is FALSE (0, "negative" or NA) is of a different type. if_else needs both to be of the same type, so one way out is to convert both to character.

First idea that worked for me (if you don't find it useful, please let me know):

 base_contactos_3%>%
  mutate(positivo= if_else(hpv_post_res=="POS", as.character(hpv_post), as.character(0)))

This gives the following result. A column can hold only one type, so you cannot mix date and character as the TRUE/FALSE choices; here both are converted to character. (Every time is 00:00:00, so only the dates are kept.)

# A tibble: 6 x 4
  id_mujer hpv_post            hpv_post_res positivo  
     <dbl> <dttm>              <chr>        <chr>     
1     8528 2012-06-12 00:00:00 NEG          0         
2     8528 2016-03-17 00:00:00 NEG          0         
3    11711 2015-09-30 00:00:00 POS          2015-09-30
4    11711 2015-09-30 00:00:00 POS          2015-09-30
5    11818 2012-12-07 00:00:00 NEG          0         
6    11818 2018-05-04 00:00:00 NEG          0

CodePudding user response:

There are two causes of your problem:

  • Every entry in a column of a data frame needs to be of the same type.
  • if_else requires that the yes and no outputs have the same type (see here).

Each of your attempts fails for some combination of these two reasons:

  1. if_else(hpv_post_res == "POS", hpv_post, 0) identifies that the output is of a date-time type (POSIXct). As zero is not a date-time, it tries to convert 0 to one. Numeric values can only be converted to dates if an origin date is supplied. No origin is given here, hence the error.
  2. if_else(hpv_post_res == "POS", hpv_post, "negative") is trying to output both date and string types. if_else will not permit this.
  3. if_else(hpv_post_res == "POS", hpv_post, NA) would work in most situations, as columns of a data frame can hold both values and NA. But strictly speaking, a bare NA is of type logical, not POSIXct. Hence the error.
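
Following on from point 3, one way to keep the column as a date-time is to give if_else an NA that already has the POSIXct class. This is only a minimal sketch: the tibble is rebuilt by hand from the data in the question, and the names na_datetime and resultado are made up for illustration. I believe recent dplyr versions (1.1.0 and later) also accept a plain NA here, but the typed NA should work either way.

```r
library(dplyr)

# Same data as in the question, rebuilt by hand.
base_contactos_3 <- tibble(
  id_mujer = c(8528, 8528, 11711, 11711, 11818, 11818),
  hpv_post = as.POSIXct(
    c("2012-06-12", "2016-03-17", "2015-09-30",
      "2015-09-30", "2012-12-07", "2018-05-04"),
    tz = "UTC"
  ),
  hpv_post_res = c("NEG", "NEG", "POS", "POS", "NEG", "NEG")
)

# An NA that carries the POSIXct class, so both branches of if_else()
# have the same type (na_datetime is a hypothetical helper name).
na_datetime <- as.POSIXct(NA, tz = "UTC")

resultado <- base_contactos_3 %>%
  mutate(positivo = if_else(hpv_post_res == "POS", hpv_post, na_datetime))

resultado
```

The positivo column stays a date-time, with the hpv_post value in the "POS" rows and NA everywhere else, so you can still sort or do date arithmetic on it.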

Solution options:

  • As @anuanand suggests you can convert to character.
  • ifelse is similar to if_else but does not require data types to be identical. However, it converts dates to numeric. So this would be a solution if your input column was not of type date.
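  • Following this solution you can find out what the origin is. You can then use the ifelse approach. Because hpv_post is a date-time (seconds since the origin) rather than a Date, convert back with as.POSIXct, and note that lubridate::origin is an object, not a function: as.POSIXct(ifelse(hpv_post_res == "POS", hpv_post, NA), origin = lubridate::origin, tz = "UTC").
  • If you are willing to step outside of dplyr you can use logical indexing: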
base_contactos_3$positivo = base_contactos_3$hpv_post
base_contactos_3$positivo[base_contactos_3$hpv_post_res != "POS"] = NA
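
To see what the ifelse option does to a date-time, here is a small base R sketch. The vectors are cut-down stand-ins for the question's columns, and raw and restored are names I made up; the origin "1970-01-01" is the Unix epoch that POSIXct counts seconds from.

```r
# Two example values in the same format as the question's columns.
hpv_post     <- as.POSIXct(c("2012-06-12", "2015-09-30"), tz = "UTC")
hpv_post_res <- c("NEG", "POS")

# ifelse() drops the POSIXct class and returns the underlying
# number of seconds since 1970-01-01.
raw <- ifelse(hpv_post_res == "POS", hpv_post, NA)
class(raw)  # "numeric"

# Convert the seconds back to a date-time by supplying the origin.
restored <- as.POSIXct(raw, origin = "1970-01-01", tz = "UTC")
restored
```

Using as.Date on raw instead would read the seconds as days and give dates far in the future, which is why the conversion back goes through as.POSIXct.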