PANDAS groupby 2 columns with condition


I have a data frame and I need to group by 2 columns and create a new column based on a condition. My data looks like this:

ID  week  day_num
1   1     2
1   1     3
1   2     4
1   2     1
2   1     1
2   2     2
3   1     4

I need to group by the columns ID & week so there's one row per ID per week. The new column is based on a condition: if, for a given week, an ID has the value 1 in the day_num column, the grouped value should be 1, otherwise 0. For example, ID 1 in week 1 has day_num values 2 and 3 in both its rows, so it gets 0; in week 2, ID 1 has a row with day_num 1, so it gets 1.

The output I need looks like this:

ID  week  day1
1   1     0
1   2     1
2   1     1
2   2     0
3   1     0

I searched and found this code, but it uses count, where I just need the value 1 or 0.

df1 = df1.groupby(['ID', 'week'])['day_num'].apply(lambda x: (x == 1).count())

Is there a way to do this?

Thanks!

CodePudding user response:

You can approach it from the other way around: check equality against 1 in "day_num" and group that by ID & week. Then aggregate with any to see whether there was any 1 in each group. Lastly, convert True/False to 1/0 and move the groupers back to columns.

df["day_num"].eq(1).groupby([df["ID"], df["week"]]).any().astype(int).reset_index()

   ID  week  day_num
0   1     1        0
1   1     2        1
2   2     1        1
3   2     2        0
4   3     1        0
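If the result column should be named day1 as in the desired output, an equivalent sketch (assuming the sample frame is built as `df`) flags the matching rows first and takes the group-wise max, since the max of a 1/0 flag is 1 exactly when some row in the group had day_num == 1:

import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 1, 2, 2, 3],
                   "week": [1, 1, 2, 2, 1, 2, 1],
                   "day_num": [2, 3, 4, 1, 1, 2, 4]})

# Flag rows where day_num equals 1, then take the max per (ID, week) group.
out = (df.assign(day1=df["day_num"].eq(1).astype(int))
         .groupby(["ID", "week"], as_index=False)["day1"]
         .max())
print(out)

With as_index=False the groupers stay as regular columns, so no reset_index is needed.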

CodePudding user response:

import pandas as pd

src = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 3],
                    'week': [1, 1, 2, 2, 1, 2, 1],
                    'day_num': [2, 3, 4, 1, 1, 2, 4],
                    })

# Turn day_num into a 1/0 flag: (day_num - 1) is zero only when day_num == 1,
# so negating the boolean cast marks exactly the rows equal to 1.
src['day_num'] = (~(src['day_num'] - 1).astype(bool)).astype(int)

# Sort so flagged rows come last, keep one row per (ID, week) pair,
# then restore the original row order.
r = (src.sort_values(by=['day_num'])
        .drop_duplicates(['ID', 'week'], keep='last')
        .sort_index()
        .reset_index(drop=True))
print(r)

Result

   ID  week  day_num
0   1     1        0
1   1     2        1
2   2     1        1
3   2     2        0
4   3     1        0
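The arithmetic-negation trick above is equivalent to a plain comparison against 1; a minimal sketch checking the two on the sample day_num values:

import pandas as pd

s = pd.Series([2, 3, 4, 1, 1, 2, 4])
# (~(s - 1).astype(bool)) flags exactly the entries equal to 1,
# which is the same as comparing directly with .eq(1).
assert ((~(s - 1).astype(bool)).astype(int) == s.eq(1).astype(int)).all()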