Home > Enterprise >  Selecting the group with the most observations in R
Selecting the group with the most observations in R

Time:07-29

I'm trying to create a new column in my data frame that is the result of which of the three different columns has the highest value.

The data looks like this:

head(primary_game_fb, n = 20)
    pitcher game_date  ff_num si_num fc_num
     <dbl> <date>      <dbl>  <dbl>  <dbl>
 1  405395 2022-05-15      0      0      0
 2  424144 2022-04-07      1      8      0
 3  424144 2022-04-08      1      2      0
 4  424144 2022-04-13      0      2      0
 5  424144 2022-04-17      0     10      0
 6  424144 2022-04-19      0      4      0
 7  424144 2022-04-22      0     11      0
 8  424144 2022-04-24      1      6      0
 9  425794 2022-04-07      6     28     15
10  425794 2022-04-14      8     22     20
11  425794 2022-04-19      3     25     27
12  425794 2022-04-24      7     28     20
13  425794 2022-04-29     10     26     21
14  425794 2022-05-04      8     28     24
15  425794 2022-05-15      6     31     17
16  425794 2022-05-20     10     25     19
17  425794 2022-05-26     14     25     27
18  425794 2022-05-31     25     30     21
19  425794 2022-06-05     11     30     25
20  425794 2022-06-11      6     37     34

My attempts so far have been:

primary_game_fb <- primary_game_fb%>%
dplyr::mutate(primary_fb = case_when(ff_num > si_num & fc_num ~ "FF",
                                   si_num > ff_num & fc_num ~ "SI",
                                   fc_num > ff_num & si_num ~ "FC",
                                   TRUE ~ NA_real_))

which returns the error:

    Error in `dplyr::mutate()`:
! Problem while computing `primary_fb = case_when(...)`.
Caused by error in `` names(message) <- `*vtmp*` ``:
! 'names' attribute [1] must be the same length as the vector [0]

and

primary_game_fb <- primary_game_fb%>%
dplyr::mutate(primary_fb = ifelse(ff_num > si_num & fc_num, "FF",
                                ifelse(si_num > ff_num & fc_num, "SI",
                                       ifelse(fc_num > ff_num & si_num, "FC", NA))))

which doesn't return an error, but doesn't do what I want it to:

    head(primary_game_fb, n = 20)
# A tibble: 20 x 6
   pitcher game_date  ff_num si_num fc_num primary_fb
     <dbl> <date>      <dbl>  <dbl>  <dbl> <chr>     
 1  405395 2022-05-15      0      0      0 NA        
 2  424144 2022-04-07      1      8      0 NA        
 3  424144 2022-04-08      1      2      0 NA        
 4  424144 2022-04-13      0      2      0 NA        
 5  424144 2022-04-17      0     10      0 NA        
 6  424144 2022-04-19      0      4      0 NA        
 7  424144 2022-04-22      0     11      0 NA        
 8  424144 2022-04-24      1      6      0 NA        
 9  425794 2022-04-07      6     28     15 SI        
10  425794 2022-04-14      8     22     20 SI        
11  425794 2022-04-19      3     25     27 SI        
12  425794 2022-04-24      7     28     20 SI        
13  425794 2022-04-29     10     26     21 SI        
14  425794 2022-05-04      8     28     24 SI        
15  425794 2022-05-15      6     31     17 SI        
16  425794 2022-05-20     10     25     19 SI        
17  425794 2022-05-26     14     25     27 SI        
18  425794 2022-05-31     25     30     21 SI        
19  425794 2022-06-05     11     30     25 SI        
20  425794 2022-06-11      6     37     34 SI  

The second solution works on some rows, but not all, such as the 5th row, where primary_fb should be equal to "SI" because it has more observations that num_ff or num_fc. What am I doing wrong?

CodePudding user response:

We may need to change the logical expressions as

library(dplyr)
primary_game_fb%>%
  dplyr::mutate(primary_fb = ifelse(ff_num > si_num & ff_num > fc_num, 
  "FF",                                
 ifelse(si_num > ff_num & si_num > fc_num, "SI",   
                  ifelse(fc_num > ff_num & fc_num > si_num, "FC", NA))))
  • Related