Data Frame is:
df=pd.DataFrame([[1,'a','c1',30,'s1','e1'],[1,'b','c1',60,'s1','e1'],[1,'b','c2',40,'s1','e1'],[2,'g','c1',40,'s2','e2'],[2,'g','c3',9,'s1','e1'],[3,'k','c2',20,'s1','e1'],[3,'k','c2',69,'s2','e1'],[3,'k','c1',29,'s1','e1'],[3,'f','c3',99,'s2','e1']], columns = ['Lot','Item','Code','Date','Shelf','Emp'])
Lot Item Code Date Shelf Emp
0 1 a c1 30 s1 e1
1 1 b c1 60 s1 e1
2 1 b c2 40 s1 e1
3 2 g c1 40 s2 e2
4 2 g c3 9 s1 e1
5 3 k c2 20 s1 e1
6 3 k c2 69 s2 e1
7 3 k c1 29 s1 e1
8 3 f c3 99 s2 e1
I need to
- Group By Lot and then Item
- Get the First Item in each Lot
- Finally Get the Max Date for each First Item in Each Lot
The OUTPUT should be:
Lot Item Code Date Shelf Emp
0 1 a c1 30 s1 e1
3 2 g c2 40 s2 e1
6 3 k c2 69 s2 e1
Thank you !!!
CodePudding user response:
I have done this using a few intermediary steps (there may be a simpler way, but this is how I would do it).
Firstly, group your df by Lot and return the first item for each lot using df.groupby
to create a temporary df:
temp_df1 = df.groupby('Lot', as_index=False).first()[['Lot', 'Item']]
Lot Item
0 1 a
1 2 g
2 3 k
I then used df.merge()
to merge the original df onto temp_df to get only the rows from df that contain the first item for each lot:
temp_df2 = df.merge(temp_df1, on=['Lot', 'Item'], how='inner')
Lot Item Code Date Shelf Emp
0 1 a c1 30 s1 e1
1 2 g c1 40 s2 e2
2 2 g c3 9 s1 e1
3 3 k c2 20 s1 e1
4 3 k c2 69 s2 e1
5 3 k c1 29 s1 e1
Then you can group by Lot and Item on this data frame to get your desired output
df_out = temp_df2.groupby(['Lot', 'Item'], as_index=False).max()
Lot Item Code Date Shelf Emp
0 1 a c1 30 s1 e1
1 2 g c3 40 s2 e2
2 3 k c2 69 s2 e1
Full code:
temp_df1 = df.groupby('Lot', as_index=False).first()[['Lot', 'Item']]
temp_df2 = df.merge(temp_df1, on=['Lot', 'Item'], how='inner')
df_out = temp_df2.groupby(['Lot', 'Item'], as_index=False).max()
Or the below without creating temporary dfs:
df_out = (df.merge(df.groupby('Lot', as_index=False).first()[['Lot', 'Item']],
on=['Lot', 'Item'], how='inner')
.groupby(['Lot', 'Item'], as_index=False).max())
CodePudding user response:
Another method:
df1 = df.groupby(['Lot']).first()['Item'].reset_index()
df1['Date'] = [df['Date'].loc[df['Item'] == x].max() for x in df1['Date']]
df1
Result:
index | Lot | Item | Date |
---|---|---|---|
0 | 1 | a | 30 |
1 | 2 | g | 40 |
2 | 3 | k | 69 |
Edit: even better using a "mask" and no temp df
m = df.groupby(['Lot']).first()['Item'].tolist()
df[df.Item.isin(m)].groupby(['Lot', 'Item'], as_index=False).max()
index | Lot | Item | Code | Date | Shelf | Emp |
---|---|---|---|---|---|---|
0 | 1 | a | c1 | 30 | s1 | e1 |
1 | 2 | g | c3 | 40 | s2 | e2 |
2 | 3 | k | c2 | 69 | s2 | e1 |