How to filter df based on column in R

Time:09-08

I am trying to filter a df based on a specific column, but the criterion is not always the same.

I am looking at a large NBA dataset in which some players have been on multiple teams, so they have one observation for each team they played for as well as a total (TOT) row. For players who have a TOT row, I would like to keep only that row. If a player only played for one team, they have no TOT in the Team column, so I would like to keep their single observation.

The example data might make this easier to understand. I know how to filter based on a column, but I am not sure how to handle the players who do not have a TOT in that column.

library ("dplyr")
  
# declaring a dataframe
data_frame = data.frame(Player = c("Luka","Steph","Anderson","Anderson","Anderson") , 
                        Games= c(60, 59, 42, 30, 12), 
                        Team= c('Dallas', 'Warriors', 'TOT', 'CLE', 'IND'))
  
print ("Original dataframe")
print (data_frame)
  
# keep only the rows whose Team is 'TOT'
data_frame_mod <- filter(data_frame, Team == 'TOT')
  
print ("Modified dataframe")
print (data_frame_mod)

This code produces:

Filtered for TOT

However, I would also like to include the Luka and Steph rows because they only played for one team. Below is the expected output:

Expected df

CodePudding user response:

You could use slice_max, since the TOT row is the sum of all the games a player played and is therefore the row with the most games for that player:

library(dplyr)
data_frame %>% 
   group_by(Player) %>% 
   slice_max(Games)

output

  Player   Games Team    
  <chr>    <dbl> <chr>   
1 Anderson    42 TOT     
2 Luka        60 Dallas  
3 Steph       59 Warriors
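
One caveat is worth checking before relying on the maximum. slice_max() keeps ties by default (with_ties = TRUE), so a traded player whose games were all played for one team would come back with two rows. The sketch below uses made-up data (`traded` is hypothetical, not from the question) and compares the result with a filter on the Team label, which does not depend on the Games values.

```r
library(dplyr)

# Hypothetical data: Anderson's TOT equals his CLE count
# because he played 0 games for IND
traded <- data.frame(Player = c("Luka", "Anderson", "Anderson", "Anderson"),
                     Games  = c(60, 42, 42, 0),
                     Team   = c("Dallas", "TOT", "CLE", "IND"))

# Default with_ties = TRUE keeps both 42-game rows for Anderson
by_max <- traded %>%
  group_by(Player) %>%
  slice_max(Games) %>%
  ungroup()

# Filtering on the Team label is not affected by ties in Games
by_label <- traded %>%
  group_by(Player) %>%
  filter(n() == 1 | Team == "TOT") %>%
  ungroup()

nrow(by_max)    # Anderson contributes two rows here
nrow(by_label)  # one row per player
```

If the real dataset never contains such ties, slice_max(Games) alone is enough.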

CodePudding user response:

We could group by 'Player' and keep a row if its group has only one row (n() == 1) or if its Team is 'TOT':

library(dplyr)
data_frame %>% 
 group_by(Player) %>%
 filter(n() ==1| Team == 'TOT') %>%
 ungroup

output

# A tibble: 3 × 3
  Player   Games Team    
  <chr>    <dbl> <chr>   
1 Luka        60 Dallas  
2 Steph       59 Warriors
3 Anderson    42 TOT    
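
For comparison, the same condition can be sketched in base R without dplyr. This is an illustrative equivalent rather than part of the answer above; it rebuilds the `data_frame` from the question and uses ave() to count the rows each player has (`rows_per_player` and `result` are names chosen for this example).

```r
data_frame <- data.frame(Player = c("Luka", "Steph", "Anderson", "Anderson", "Anderson"),
                         Games  = c(60, 59, 42, 30, 12),
                         Team   = c("Dallas", "Warriors", "TOT", "CLE", "IND"))

# Number of rows each player has, repeated for every row of that player
rows_per_player <- ave(seq_along(data_frame$Player), data_frame$Player, FUN = length)

# Keep single-team players, plus the TOT row of multi-team players
result <- data_frame[rows_per_player == 1 | data_frame$Team == "TOT", ]
print(result)
```

Unlike the grouped dplyr version, this returns a plain data.frame and keeps the original row order.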

CodePudding user response:

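A slight variation using group_by, arrange, and slice. Sorting each group by descending Games puts the TOT row first, so slice(1) does not depend on the original row order:

library(dplyr)

data_frame %>% 
  group_by(Player) %>% 
  arrange(desc(Games), .by_group = TRUE) %>% 
  slice(1)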

  Player   Games Team    
  <chr>    <dbl> <chr>   
1 Anderson    42 TOT     
2 Luka        60 Dallas  
3 Steph       59 Warriors