New to Python.
I have 2 dataframes
The first is called df_my and it looks like this
ICD Distance ID Grp
-------------------------
AB 0 1 1
JG 0 4 1
NH 1 5 2
UY 1 7 1
SS 2 9 1
ED 2 14 2
KK 2 15 3
QR 3 18 3
XZ 3 19 1
FD 4 22 1
LN 5 24 4
TT 5 31 2
the other dataframe called df_Map and looks like this
0 1
--------
JG 838
LN 228
SS 699
FD 100
ED 483
QR 594
AB 349
NH 777
KK 505
TT 866
What I am trying to do is this
First, to get the first record of each group from df_my ordered by id or records with distance bigger than 3
so it should become like this
ICD Distance ID Grp
-------------------------
AB 0 1 1 <== First of its group
NH 1 5 2 <== First of its group
KK 2 15 3 <== First of its group
FD 4 22 1 <== Distance > 3
LN 5 24 4 <== Distance > 3
TT 5 31 2 <== Distance > 3
after that I want to match it with df_Map
so it will be
ICD Distance ID Grp Map
--------------------------------
AB 0 1 1 349
NH 1 5 2 777
KK 2 15 3 505
FD 4 22 1 100
LN 5 24 4 228
TT 5 31 2 866
I wonder how to do that
I started with this
df_Final = df_my(Distance>3)
but I did not know how to start with the first record in the group?
CodePudding user response:
Assuming pandas, you can use masks for boolean indexing, then merge
:
m1 = df_my.groupby('Grp').cumcount().eq(0)
m2 = df_my['Distance'].gt(3)
out = df_my[m1|m2].merge(df_Map.set_axis(['ICD', 'Map'], axis=1),
on='ICD')
Output:
ICD Distance ID Grp Map
0 AB 0 1 1 349
1 NH 1 5 2 777
2 KK 2 15 3 505
3 FD 4 22 1 100
4 LN 5 24 4 228
5 TT 5 31 2 866