I have a DataFrame with items that belong to a document issue. For example, doc1 issue A has items 1 and 2; doc1 issue B has items 5 and 8.
df_source=pd.DataFrame([('doc1','A',1,4),('doc1','A',2,0),('doc1','B',5,6),('doc1','B',8,6), ('doc1','C',8,4),('doc1','C',4,4), ('doc2','B',0,5),('doc2','B',1,5), ('doc3','B',5,6),('doc3','K',4,4),('doc3','K',10,4)], columns=['Doc_name','Doc_Issue','item','prop2'])
Doc_name Doc_Issue item prop2
0 doc1 A 1 4
1 doc1 A 2 0
2 doc1 B 5 6
3 doc1 B 8 6
4 doc1 C 8 4
5 doc1 C 4 4
6 doc2 B 0 5
7 doc2 B 1 5
8 doc3 B 5 6
9 doc3 K 4 4
10 doc3 K 10 4
I would like to filter the DataFrame so that I get only the items from the latest issue of each document:
df_result=pd.DataFrame([('doc1','C',8,4),('doc1','C',4,4), ('doc2','B',0,5),('doc2','B',1,5), ('doc3','K',4,4),('doc3','K',10,4)], columns=['Doc_name','Doc_Issue','item','prop2'])
Doc_name Doc_Issue item prop2
0 doc1 C 8 4
1 doc1 C 4 4
2 doc2 B 0 5
3 doc2 B 1 5
4 doc3 K 4 4
5 doc3 K 10 4
CodePudding user response:
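In your case, use groupby with transform('last'). This relies on the rows being ordered so that the latest issue comes last within each Doc_name: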
out = df_source[df_source.Doc_Issue == df_source.groupby('Doc_name')['Doc_Issue'].transform('last')]
Out[60]:
Doc_name Doc_Issue item prop2
4 doc1 C 8 4
5 doc1 C 4 4
6 doc2 B 0 5
7 doc2 B 1 5
9 doc3 K 4 4
10 doc3 K 10 4
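If the rows are not guaranteed to be sorted by issue, a variant is to compare each row against the greatest issue label in its document. This is a minimal sketch, assuming the issue labels sort alphabetically in release order (A before B before K), which the question does not state:

```python
import pandas as pd

df_source = pd.DataFrame(
    [('doc1', 'A', 1, 4), ('doc1', 'A', 2, 0), ('doc1', 'B', 5, 6), ('doc1', 'B', 8, 6),
     ('doc1', 'C', 8, 4), ('doc1', 'C', 4, 4), ('doc2', 'B', 0, 5), ('doc2', 'B', 1, 5),
     ('doc3', 'B', 5, 6), ('doc3', 'K', 4, 4), ('doc3', 'K', 10, 4)],
    columns=['Doc_name', 'Doc_Issue', 'item', 'prop2'],
)

# Assumption: the alphabetically greatest label is the latest issue.
# transform('max') broadcasts that label back onto every row of the document.
latest_issue = df_source.groupby('Doc_name')['Doc_Issue'].transform('max')

# Keep only the rows whose issue equals the latest issue of their document.
out = df_source[df_source['Doc_Issue'] == latest_issue]
print(out)
```

Unlike 'last', this does not depend on row order, but it does depend on the labels being comparable in a meaningful way.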
CodePudding user response:
Take the last two rows of each group and select them with the loc accessor by passing their index values. Note that tail(2) only works here because every latest issue has exactly two items. Code below:
df_source.loc[df_source.groupby('Doc_name')['Doc_Issue'].tail(2).index, :]
Doc_name Doc_Issue item prop2
4 doc1 C 8 4
5 doc1 C 4 4
6 doc2 B 0 5
7 doc2 B 1 5
9 doc3 K 4 4
10 doc3 K 10 4
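If the latest issue can have any number of items, one option is to look up the last issue of each document and merge on it. This is only a sketch, assuming the rows are ordered so that the latest issue appears last within each Doc_name; the names latest and result are illustrative:

```python
import pandas as pd

df_source = pd.DataFrame(
    [('doc1', 'A', 1, 4), ('doc1', 'A', 2, 0), ('doc1', 'B', 5, 6), ('doc1', 'B', 8, 6),
     ('doc1', 'C', 8, 4), ('doc1', 'C', 4, 4), ('doc2', 'B', 0, 5), ('doc2', 'B', 1, 5),
     ('doc3', 'B', 5, 6), ('doc3', 'K', 4, 4), ('doc3', 'K', 10, 4)],
    columns=['Doc_name', 'Doc_Issue', 'item', 'prop2'],
)

# One row per document: the (Doc_name, Doc_Issue) pair of its last row.
latest = df_source.drop_duplicates('Doc_name', keep='last')[['Doc_name', 'Doc_Issue']]

# The inner merge keeps every row of df_source that matches one of those pairs,
# however many items the latest issue contains.
result = df_source.merge(latest, on=['Doc_name', 'Doc_Issue'])
print(result)
```

The merge builds a fresh 0-based index, so the original row labels (4, 5, 6, ...) are not preserved.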