I think I'm misusing mapply,
but I cannot see how. I'm trying to categorize multiple groups according to a specific cutoff for each one...
> dt <- source("https://pastebin.com/raw/pX0XVBSB")$value
> dt$aux <- mapply(x = unique(dt$group), y = c(rep(0.02, 2), rep(0.2, 4)),
function(x,y){
ifelse(dt$var[dt$group == x] < x, 0, 1)
}) %>% unlist
> head(dt[is.na(dt$var),])
# group var aux
# 52 g3 NA 0
# 66 g4 NA 0
# 287 g3 NA 0
# 336 g3 NA 0
# 337 g3 NA 0
# 363 g6 NA 0
... but something is going wrong with the NAs: I expected aux to be NA
wherever var is NA (the rest of the values are correctly categorized).
Any idea what I'm doing wrong, please?
EDIT
I'd expect var to be categorized as follows:
0 if below the group's specific cutoff and 1 if equal or higher.
# group var aux
# 1 g1 0.010 0 #below cutoff for g1, 0.02
# 2 g1 0.210 1 #above cutoff for g1, 0.02
# 3 g1 0.021 1
# 4 g1 0.021 1
# 5 g3 0.001 0 #below cutoff for g3, 0.2
# 6 g3 3.100 1 #above cutoff for g3, 0.2
CodePudding user response:
Is this what you are looking for? It would be easier to just set up two conditional statements:
library(tidyverse)
dt <- source("https://pastebin.com/raw/pX0XVBSB")$value |>
  as_tibble()
dt |>
  mutate(aux = case_when(
    # g1 and g2 use the 0.02 cutoff
    group %in% c("g1", "g2") ~ ifelse(var < 0.02, 0, 1),
    # every other group uses the 0.2 cutoff
    TRUE ~ ifelse(var < 0.2, 0, 1)
  ))
#> # A tibble: 512 x 3
#> group var aux
#> <chr> <dbl> <dbl>
#> 1 g1 0.01 0
#> 2 g1 0.01 0
#> 3 g1 0 0
#> 4 g1 0 0
#> 5 g1 0.021 1
#> 6 g1 0.021 1
#> 7 g1 0.0008 0
#> 8 g1 0.0008 0
#> 9 g1 0.0014 0
#> 10 g1 0.0014 0
#> # ... with 502 more rows
EDIT
Here is a base R way:
dt$aux <- ifelse(dt$group %in% c("g1", "g2"),
                 ifelse(dt$var < 0.02, 0, 1),
                 ifelse(dt$var < 0.2, 0, 1))
head(dt)
#> # A tibble: 6 x 3
#> group var aux
#> <chr> <dbl> <dbl>
#> 1 g1 0.01 0
#> 2 g1 0.01 0
#> 3 g1 0 0
#> 4 g1 0 0
#> 5 g1 0.021 1
#> 6 g1 0.021 1
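To check that this handles missing values the way the question expects, here is a small sketch on made-up data (the toy data frame is an assumption for illustration, not the pastebin dataset). ifelse() returns NA wherever its test is NA, so rows with a missing var should get a missing aux:

```r
# Toy data (hypothetical, not the pastebin dataset)
toy <- data.frame(group = c("g1", "g1", "g3", "g3", "g3"),
                  var   = c(0.010, 0.021, NA, 0.001, 3.100))

# Same nested ifelse() as above: 0.02 for g1/g2, 0.2 for every other group
toy$aux <- ifelse(toy$group %in% c("g1", "g2"),
                  ifelse(toy$var < 0.02, 0, 1),
                  ifelse(toy$var < 0.2, 0, 1))

# The row with var = NA keeps aux = NA; the others become 0 or 1
toy
```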
EDIT 2
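library(tidyverse)
# One vector of 0/1 values per group, concatenated group by group
vals <- map2(unique(dt$group),
             c(rep(0.02, 2), rep(0.2, 4)),
             \(x, y) ifelse(dt$var[dt$group == x] < y, 0, 1)) |>
  unlist()
# Sort by group so the rows line up with vals
dt |>
  arrange(group) |>
  mutate(aux = vals)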
#> # A tibble: 512 x 3
#> group var aux
#> <chr> <dbl> <dbl>
#> 1 g1 0.01 0
#> 2 g1 0.01 0
#> 3 g1 0 0
#> 4 g1 0 0
#> 5 g1 0.021 1
#> 6 g1 0.021 1
#> 7 g1 0.0008 0
#> 8 g1 0.0008 0
#> 9 g1 0.0014 0
#> 10 g1 0.0014 0
#> # ... with 502 more rows
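The problem with this method is that the values come back grouped (all of the first group, then all of the second, and so on), which is not the row order of your dataset, so you need to sort the data by group before adding the new variable. Assigning the unlisted result directly to dt$aux, as in the question, misaligns the values with the rows, which is likely why the NA rows ended up with 0.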
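One way to sidestep the reordering problem entirely is to look up each row's cutoff through a named vector, so nothing is ever split by group. This is only a sketch on made-up data: the toy data frame and the name cutoffs are assumptions, and the group names g1 to g6 are inferred from the question's output and the six cutoffs it passes to mapply.

```r
# Toy data in unsorted row order (hypothetical, not the pastebin dataset)
toy <- data.frame(group = c("g3", "g1", "g3", "g1", "g4"),
                  var   = c(3.100, 0.010, NA, 0.021, 0.001))

# One cutoff per group (values from the question; names g1..g6 are assumed)
cutoffs <- c(g1 = 0.02, g2 = 0.02, g3 = 0.2, g4 = 0.2, g5 = 0.2, g6 = 0.2)

# Pick each row's cutoff by group name, then compare row by row.
# Rows stay in their original order, and NA in var propagates to aux.
toy$aux <- as.integer(toy$var >= cutoffs[as.character(toy$group)])

toy
```

Because the comparison is vectorised over the rows as they are, no sorting step is needed, and adding or changing a cutoff only means editing the named vector.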