Home > Software engineering >  keep only ids that have all three values of the column Mode
keep only ids that have all three values of the column Mode

Time:12-22

I have a pandas dataframe with multiple columns, which looks like the following:

Index ID Year Code Type Mode
0 100 2018 ABC 1 1
1 100 2019 DEF 2 2
2 100 2019 GHI 3 3
3 102 2018 JKL 4 1
4 103 2019 MNO 5 1
5 103 2018 PQR 6 2
6 102 2019 PQR 3 2

I only want to keep ids that have rows against all the values for the column Mode. An example would look like this:

Index ID Year Code Type Mode
0 100 2018 ABC 1 1
1 100 2019 DEF 2 2
2 100 2019 GHI 3 3

I have already tried doing so by using the following code:

df = data.groupby('ID').filter(lambda x: {1, 2, 3}.issubset(x['Mode']))

but this returns an empty result. Can someone help me here? TIA

CodePudding user response:

You can try

out = df.groupby('ID').filter(lambda x : pd.Series([1,2,3]).isin(x['Mode']).all())
Out[9]: 
   Index   ID  Year Code  Type  Mode
0      0  100  2018  ABC     1     1
1      1  100  2019  DEF     2     2
2      2  100  2019  GHI     3     3

CodePudding user response:

Your code works just fine (on pandas 1.3, python 3.9):

out = df.groupby('ID').filter(lambda x: {1,2,3}.issubset(x['Mode']))

Output:

   Index   ID  Year Code  Type  Mode
0      0  100  2018  ABC     1     1
1      1  100  2019  DEF     2     2
2      2  100  2019  GHI     3     3
  • Related