I have a pandas dataframe with multiple columns, which looks like the following:
Index | ID | Year | Code | Type | Mode |
---|---|---|---|---|---|
0 | 100 | 2018 | ABC | 1 | 1 |
1 | 100 | 2019 | DEF | 2 | 2 |
2 | 100 | 2019 | GHI | 3 | 3 |
3 | 102 | 2018 | JKL | 4 | 1 |
4 | 103 | 2019 | MNO | 5 | 1 |
5 | 103 | 2018 | PQR | 6 | 2 |
6 | 102 | 2019 | PQR | 3 | 2 |
I only want to keep ids that have rows against all the values for the column Mode
. An example would look like this:
Index | ID | Year | Code | Type | Mode |
---|---|---|---|---|---|
0 | 100 | 2018 | ABC | 1 | 1 |
1 | 100 | 2019 | DEF | 2 | 2 |
2 | 100 | 2019 | GHI | 3 | 3 |
I have already tried doing so by using the following code:
df = data.groupby('ID').filter(lambda x: {1, 2, 3}.issubset(x['Mode']))
but this returns an empty result. Can someone help me here? TIA
CodePudding user response:
You can try
out = df.groupby('ID').filter(lambda x : pd.Series([1,2,3]).isin(x['Mode']).all())
Out[9]:
Index ID Year Code Type Mode
0 0 100 2018 ABC 1 1
1 1 100 2019 DEF 2 2
2 2 100 2019 GHI 3 3
CodePudding user response:
Your code works just fine (on pandas 1.3, python 3.9):
out = df.groupby('ID').filter(lambda x: {1,2,3}.issubset(x['Mode']))
Output:
Index ID Year Code Type Mode
0 0 100 2018 ABC 1 1
1 1 100 2019 DEF 2 2
2 2 100 2019 GHI 3 3