I have a data frame and I need to group by 2 columns and create a new column based on a condition. My data looks like this:
ID | week | day_num |
---|---|---|
1 | 1 | 2 |
1 | 1 | 3 |
1 | 2 | 4 |
1 | 2 | 1 |
2 | 1 | 1 |
2 | 2 | 2 |
3 | 1 | 4 |
I need to group by the columns ID & week so there's one row for each ID for each week. The grouping is based on a condition: if, for a certain week, an ID has the value 1 in column day_num, the grouped value is 1, otherwise 0. For example, ID 1 in week 1 has day_num values 2 & 3 in its rows, so it gets 0; in week 2, ID 1 has a row with value 1, so it gets 1.
The output I need looks like this:
ID | week | day1 |
---|---|---|
1 | 1 | 0 |
1 | 2 | 1 |
2 | 1 | 1 |
2 | 2 | 0 |
3 | 1 | 0 |
I searched and found this code, but it uses count, whereas I just need to write the value 1 or 0:
df1=df1.groupby('ID','week')['day_num'].apply(lambda x: (x=='1').count())
Is there a way to do this?
Thanks!
CodePudding user response:
You can approach it from the other way: check equality against 1 in "day_num" and group that by ID & week. Then aggregate with any
to see whether there was any 1 in each group. Lastly, convert True/False to 1/0 and move the groupers back to columns.
df["day_num"].eq(1).groupby([df["ID"], df["week"]]).any().astype(int).reset_index()
ID week day_num
0 1 1 0
1 1 2 1
2 2 1 1
3 2 2 0
4 3 1 0
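To also get the day1 column name from the desired output, the same chain can end with a rename of the aggregated series before resetting the index (a small variation on the code above, using the question's sample data as df):

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 3],
                   'week': [1, 1, 2, 2, 1, 2, 1],
                   'day_num': [2, 3, 4, 1, 1, 2, 4]})

# True where day_num == 1, any() per (ID, week) group, then 1/0
out = (df["day_num"].eq(1)
         .groupby([df["ID"], df["week"]])
         .any()
         .astype(int)
         .rename("day1")
         .reset_index())
print(out)
#    ID  week  day1
# 0   1     1     0
# 1   1     2     1
# 2   2     1     1
# 3   2     2     0
# 4   3     1     0
```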
CodePudding user response:
import pandas as pd
src = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 3],
'week': [1, 1, 2, 2, 1, 2, 1],
'day_num': [2, 3, 4, 1, 1, 2, 4],
})
# 1 where day_num == 1, else 0 (equivalent to the original ~(day_num-1) trick)
src['day_num'] = src['day_num'].eq(1).astype(int)
# keep one row per (ID, week): sort so any 1-indicator comes last, keep that row
r = src.sort_values(by=['day_num']).drop_duplicates(['ID', 'week'], keep='last').sort_index().reset_index(drop=True)
print(r)
Result
ID week day_num
0 1 1 0
1 1 2 1
2 2 1 1
3 2 2 0
4 3 1 0
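The indicator/dedupe steps above can also be expressed as a plain groupby with max, since the per-group max of a 0/1 indicator is 1 exactly when the group contains a day_num of 1. A sketch assuming the same src frame, with the result column named day1 to match the desired output:

```python
import pandas as pd

src = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 3],
                    'week': [1, 1, 2, 2, 1, 2, 1],
                    'day_num': [2, 3, 4, 1, 1, 2, 4],
                    })

# add a 0/1 indicator column, then take its max per (ID, week) group
out = (src.assign(day1=src['day_num'].eq(1).astype(int))
          .groupby(['ID', 'week'], as_index=False)['day1']
          .max())
print(out)
#    ID  week  day1
# 0   1     1     0
# 1   1     2     1
# 2   2     1     1
# 3   2     2     0
# 4   3     1     0
```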