I already posted a question, but I presented it badly. Here is my problem:
I have a dataframe like this:
Col1 | Col2 | Col3 | Col4 | DESIRED COLUMN |
---|---|---|---|---|
SF | 123 | 01/02 | UP | UP, WA |
BF | 543 | 30/12 | DO | DO, AF |
QW | 241 | 30/12 | AF | DO, AF |
SF | 123 | 01/02 | WA | UP, WA |
QW | 789 | 20/11 | D | D |
SF | 678 | 31/12 | OT | OT |
I want to obtain the DESIRED COLUMN. The logic is the following:
- A condition distinguishes the rows where Col1 is equal to SF from the rows where it is not
- If Col1 is NOT equal to SF, group by Col3 and join the items of Col4
- If Col1 is equal to SF, group by Col2 and Col3 and join the items of Col4
Where is my problem? Maybe I am overthinking it, but using groupby in pandas reduces the number of rows of the dataframe. My objective is simply to add a column, as shown above.
Thank you in advance!!!
CodePudding user response:
Use GroupBy.transform with numpy.where:
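import numpy as np
m = df.Col1.eq('SF')  # True for the rows where Col1 is 'SF'
s1 = df.groupby(['Col2','Col3'])['Col4'].transform(', '.join)  # joined per (Col2, Col3), one value per row
s2 = df.groupby(['Col3'])['Col4'].transform(', '.join)  # joined per Col3, one value per row
df['DESIRED COLUMN'] = np.where(m, s1, s2)  # s1 for the SF rows, s2 for the others
print(df)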
Col1 Col2 Col3 Col4 DESIRED COLUMN
0 SF 123 01/02 UP UP, WA
1 BF 543 30/12 DO DO, AF
2 QW 241 30/12 AF DO, AF
3 SF 123 01/02 WA UP, WA
4 QW 789 20/11 D D
5 SF 678 31/12 OT OT
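For anyone who wants to reproduce this, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the table in the question, and the variable name collapsed is only there for illustration. It also shows why a plain aggregation shortens the result while transform does not: agg returns one row per group, whereas transform returns one value per original row.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Col1": ["SF", "BF", "QW", "SF", "QW", "SF"],
    "Col2": [123, 543, 241, 123, 789, 678],
    "Col3": ["01/02", "30/12", "30/12", "01/02", "20/11", "31/12"],
    "Col4": ["UP", "DO", "AF", "WA", "D", "OT"],
})

# agg gives one row per group, so the result is shorter than df
collapsed = df.groupby("Col3")["Col4"].agg(", ".join)

# transform gives one value per original row, aligned with df's index
s1 = df.groupby(["Col2", "Col3"])["Col4"].transform(", ".join)
s2 = df.groupby("Col3")["Col4"].transform(", ".join)

# Take s1 where Col1 is 'SF', otherwise s2
df["DESIRED COLUMN"] = np.where(df["Col1"].eq("SF"), s1, s2)

print(len(collapsed), len(s2))  # 4 groups versus 6 rows
print(df)
```

Because s1 and s2 both have the same length and index as df, np.where can pick between them row by row.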
Possible solution if there are NaNs in Col2, Col3:
m = df.Col1.eq('SF')
df1 = df.fillna({'Col2':'nan', 'Col3':'nan'})  # placeholder so rows with missing keys still form a group
s1 = df1.groupby(['Col2','Col3'])['Col4'].transform(', '.join)
s2 = df1.groupby(['Col3'])['Col4'].transform(', '.join)
df['DESIRED COLUMN'] = np.where(m, s1, s2)
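A small sketch of the NaN variant on made-up data, in case it helps. The four rows below are an assumption for illustration only and are not from the question. By default groupby leaves out rows whose key is NaN, which is why the keys are filled with a placeholder before grouping. Here only Col3 contains missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two rows have a missing Col3
df = pd.DataFrame({
    "Col1": ["SF", "QW", "BF", "SF"],
    "Col2": [123, 241, 543, 123],
    "Col3": ["01/02", np.nan, np.nan, "01/02"],
    "Col4": ["UP", "AF", "DO", "WA"],
})

# Replace missing keys with the string 'nan' so those rows land in a group
df1 = df.fillna({"Col2": "nan", "Col3": "nan"})

s1 = df1.groupby(["Col2", "Col3"])["Col4"].transform(", ".join)
s2 = df1.groupby("Col3")["Col4"].transform(", ".join)

# The result is written to the original df, which keeps its NaN values
df["DESIRED COLUMN"] = np.where(df["Col1"].eq("SF"), s1, s2)
print(df)
```

Note that all rows with a missing key end up in the same 'nan' group, so their Col4 values are joined together. If that is not what you want, those rows need separate handling.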