Home > Back-end >  Subsetting the closest number to zero for negative numbers only
Subsetting the closest number to zero for negative numbers only

Time:11-08

I am trying to find the closest number to zero amongst negative numbers only. When all columns accident1:accident3 are positive, return NA. When all number are negative, return max of the negative numbers in accident1:accident3. When both positive and negative numbers are present, only return the largest number below zero (including zero).

Data:

df <- data.frame(id=1:4, accident1=c(-1,-1,3, NA), accident2=c(-5,100, 2, NA), accident3=c(-4,-3,1,NA))

> df
  id accident1 accident2 accident3
1  1        -1        -5        -4
2  2        -1       100        -3
3  3         3         2         1
4  4        NA        NA        NA

Attempt:

df %>%
  rowwise() %>%
  mutate(magic=
           case_when(
             accident1 < 0 & accident2 < 0 & accident3 < 0 ~ as.numeric(pmax(accident1, accident2, accident3, na.rm=T)),
             accident1 > 0 & accident2 > 0 & accident3 > 0 ~ NA_real_,
             (accident1 >0 |accident2<0 |accident3<0) & (accident1 >0 |accident2>0 | accident3>0) ~ 
      # need max for cell <0
      as.numeric(pmax(accident1, accident2, accident3, na.rm=T)), TRUE~NA_real_))

Result:

     id accident1 accident2 accident3 magic
  <int>     <dbl>     <dbl>     <dbl> <dbl>
1     1        -1        -5        -4    -1
2     2        -1       100        -3   100
3     3         3         2         1    NA
4     4        NA        NA        NA    NA

Desired:

     id accident1 accident2 accident3 magic
  <int>     <dbl>     <dbl>     <dbl> <dbl>
1     1        -1        -5        -4    -1
2     2        -1       100        -3    -1
3     3         3         2         1    NA
4     4        NA        NA        NA    NA

CodePudding user response:

If you convert the data frame to long format you can do this a little bit more compactly. (It also generalizes to any number of accident types, missing accident types within id, etc. ...)

sfun <- function(x) { 
   x <- na.omit(x)
   ## if x has no non-NA values, all(x>0) will be TRUE
   if (all(x>0))  NA_real_ else max(x[x<=0]) 
}
(df 
   ## convert to long format
   %>% pivot_longer(-id) 
   %>% group_by(id) 
   ## apply summary function to values within id
   %>% summarise(magic=sfun(value))
   ## add original columns back in 
   %>% full_join(df, by = "id")
)

The only difference is that the magic column is before the rest of the data, not after it (you could add a call to relocate() if you like)

CodePudding user response:

Using sfun function from @Ben Bolker's answer you can also do this keeping the data in wide format. Using rowwise and c_across.

library(dplyr)

sfun <- function(x) { 
  x <- na.omit(x)
  if (all(x>0))  NA_real_ else max(x[x<=0]) 
}

df %>%
  rowwise() %>%
  mutate(magic = sfun(c_across(starts_with('accident')))) %>%
  ungroup

#     id accident1 accident2 accident3 magic
#  <int>     <dbl>     <dbl>     <dbl> <dbl>
#1     1        -1        -5        -4    -1
#2     2        -1       100        -3    -1
#3     3         3         2         1    NA
#4     4        NA        NA        NA    NA
  • Related