I have the following data frame:
df <- data.frame(value = c(1,2,3,4,5,6,7,8,9,10), win=c(1,1,1,2,2,3,4,4,5,5))
> df
   value win
1      1   1
2      2   1
3      3   1
4      4   2
5      5   2
6      6   3
7      7   4
8      8   4
9      9   5
10    10   5
I want to keep only the rows whose win value appears in at least 3 rows. Looking at the counts:
> table(df$win)
1 2 3 4 5
3 2 1 2 2
I know that I only want to keep the rows where win == 1. But how do I do that for a big data frame?
I was thinking of having a vector which would give me the unique values of df$win
xx <- unique(df$win)
> xx
[1] 1 2 3 4 5
I could then loop over xx, count how many rows match each value of df$win, and extract only the rows whose value meets the threshold, but I wasn't able to get that working. Any help would be much appreciated!
Edit
Expected output [for this example only; subset(df, win == "1") is not useful in general, as I don't know in advance which win values will appear in enough rows]:
> new_df
  value win
1     1   1
2     2   1
3     3   1
CodePudding user response:
If you have a big dataset, use data.table and keep each win group only if it has at least 3 rows:
library(data.table)
setDT(df)[, if (.N >= 3) .SD, by = win]
Output:
   win value
1:   1     1
2:   1     2
3:   1     3
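If you would rather stay in base R, the same filter can probably be written without a loop. This is only a sketch, assuming the threshold is "at least 3 rows" as in the example; the names group_size, counts, keep and new_df2 are made up for illustration:

```r
# Rebuild the example data from the question
df <- data.frame(value = c(1,2,3,4,5,6,7,8,9,10), win = c(1,1,1,2,2,3,4,4,5,5))

# ave() returns, for every row, the number of rows sharing that row's win value
group_size <- ave(df$win, df$win, FUN = length)

# Keep only the rows whose win group has at least 3 rows
new_df <- df[group_size >= 3, ]

# Alternative via table(): find the qualifying win values first, then filter
counts <- table(df$win)
keep <- names(counts)[counts >= 3]
new_df2 <- df[as.character(df$win) %in% keep, ]
```

Both new_df and new_df2 should contain just the win == 1 rows here. The ave() version avoids converting win to character, which may matter if win is not a whole number.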