I have this data frame:
structure(list(id_mujer = c(8528, 8528, 11711, 11711, 11818,
11818), hpv_post = structure(c(1339459200, 1458172800, 1443571200,
1443571200, 1354838400, 1525392000), tzone = "UTC", class = c("POSIXct",
"POSIXt")), hpv_post_res = c("NEG", "NEG", "POS", "POS", "NEG",
"NEG")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
# A tibble: 6 x 3
  id_mujer hpv_post            hpv_post_res
     <dbl> <dttm>              <chr>
1     8528 2012-06-12 00:00:00 NEG
2     8528 2016-03-17 00:00:00 NEG
3    11711 2015-09-30 00:00:00 POS
4    11711 2015-09-30 00:00:00 POS
5    11818 2012-12-07 00:00:00 NEG
6    11818 2018-05-04 00:00:00 NEG
I need to create a column called "positivo" that contains the hpv_post of an observation when its hpv_post_res is "POS", and 0 otherwise. The problem is that when I do this:
base_contactos_3 %>%
mutate(positivo= if_else(hpv_post_res=="POS", hpv_post, 0))
For some reason I get this error:
Error: Problem with `mutate()` column `positivo`.
i `positivo = if_else(hpv_post_res == "POS", hpv_post, 0)`.
x 'origin' must be supplied
Run `rlang::last_error()` to see where the error occurred.
I only put a 0 for the false case because I don't know what else to use; the value itself is not important. But if I put a string instead of a 0, like:
base_contactos_3 %>%
mutate(positivo= if_else(hpv_post_res=="POS", hpv_post, "negative"))
It gives me this error:
Error: Problem with `mutate()` column `positivo`.
i `positivo = if_else(hpv_post_res == "POS", hpv_post, "negative")`.
x `false` must be a `POSIXct/POSIXt` object, not a character vector.
Run `rlang::last_error()` to see where the error occurred.
And if I put an NA instead of a 0 (which is the option I prefer), it gives me this error:
Error: Problem with `mutate()` column `positivo`.
i `positivo = if_else(hpv_post_res == "POS", hpv_post, NA)`.
x `false` must be a `POSIXct/POSIXt` object, not a logical vector.
Run `rlang::last_error()` to see where the error occurred.
CodePudding user response:
The error is telling you the reason: if_else needs the value for TRUE and the value for FALSE to be of the same type. Here the TRUE value (hpv_post) is a date-time, but the FALSE value you provide is a number, a string, or a logical NA. That is why it works once both values are character.
One idea that worked for me (if you don't find it useful, please let me know):
base_contactos_3%>%
mutate(positivo= if_else(hpv_post_res=="POS", as.character(hpv_post), as.character(0)))
This gives the following result. Since you are combining a date and a character as the choices for TRUE/FALSE, the resulting column can have only one type, so it is character. (Every time is 00:00:00, so only the dates are printed.)
# A tibble: 6 x 4
  id_mujer hpv_post            hpv_post_res positivo
     <dbl> <dttm>              <chr>        <chr>
1     8528 2012-06-12 00:00:00 NEG          0
2     8528 2016-03-17 00:00:00 NEG          0
3    11711 2015-09-30 00:00:00 POS          2015-09-30
4    11711 2015-09-30 00:00:00 POS          2015-09-30
5    11818 2012-12-07 00:00:00 NEG          0
6    11818 2018-05-04 00:00:00 NEG          0
CodePudding user response:
There are two causes of your problem:
- Every entry in a column of a data frame needs to be of the same type.
- if_else requires that the true and false outputs have the same type (see its documentation).
Each of your attempts fails for some combination of these two reasons:
- if_else(hpv_post_res == "POS", hpv_post, 0) identifies that the output is a date-time. As zero is not a date, it tries to convert 0 to a date. Numeric values can be converted to dates only if an origin date is given. No origin is supplied, hence the error.
- if_else(hpv_post_res == "POS", hpv_post, "negative") is trying to output both date and string types. if_else will not permit this.
- if_else(hpv_post_res == "POS", hpv_post, NA) would work in most instances, as columns of a data frame can hold both values and NA. But in the strictest sense, NA is of type logical. Hence the error.
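The type mismatch described above can be checked directly with class(). This is a minimal sketch in base R; the hpv_post vector here is a shortened stand-in for the column in the question, not the original data.

```r
# A shortened stand-in for the hpv_post column (assumption: UTC date-times,
# as in the dput() output of the question).
hpv_post <- as.POSIXct(c("2012-06-12", "2015-09-30"), tz = "UTC")

class(hpv_post)     # "POSIXct" "POSIXt"
class(0)            # "numeric"
class("negative")   # "character"
class(NA)           # "logical"

# A missing value that is itself a date-time has the same class as the column.
class(as.POSIXct(NA, tz = "UTC"))  # "POSIXct" "POSIXt"
```

None of the three false values tried in the question shares the class of hpv_post, which matches the three error messages.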
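Solution options:
- As @anuanand suggests, you can convert to character.
- ifelse is similar to if_else but does not require data types to be identical. However, it converts dates to numeric. So this would be a solution if your input column was not of type date.
- Following this solution you can find out what the origin is. You can then use the ifelse approach and convert the numeric result back. Because hpv_post is a date-time (POSIXct), the numbers are seconds since the origin, so convert with as.POSIXct rather than as.Date: as.POSIXct(ifelse(hpv_post_res == "POS", hpv_post, NA), origin = lubridate::origin, tz = "UTC"). Note that lubridate::origin is an object (1970-01-01 UTC), not a function.
- If you are willing to step outside of dplyr, you can use logical indexing: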
base_contactos_3$positivo = base_contactos_3$hpv_post
base_contactos_3$positivo[base_contactos_3$hpv_post_res != "POS"] = NA
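If you would rather stay within dplyr, one further possibility is to give if_else a missing value that already has the date-time type. The sketch below assumes the column is POSIXct in UTC, as in the question. The data is rebuilt from the printed tibble so the example is self-contained, and resultado is a hypothetical name used only for illustration.

```r
library(dplyr)

# Rebuild the example data from the question.
base_contactos_3 <- tibble(
  id_mujer = c(8528, 8528, 11711, 11711, 11818, 11818),
  hpv_post = as.POSIXct(
    c("2012-06-12", "2016-03-17", "2015-09-30",
      "2015-09-30", "2012-12-07", "2018-05-04"),
    tz = "UTC"
  ),
  hpv_post_res = c("NEG", "NEG", "POS", "POS", "NEG", "NEG")
)

# Use an NA of class POSIXct as the false value, so both branches of
# if_else() have the same type (assumption: UTC matches the column's tzone).
resultado <- base_contactos_3 %>%
  mutate(positivo = if_else(hpv_post_res == "POS",
                            hpv_post,
                            as.POSIXct(NA, tz = "UTC")))

resultado
```

The positivo column stays a date-time, holding the date for the two POS rows and NA for the rest, so it can still be used in date arithmetic later.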