I have 2 dataframes
df_My which looks like this
Seq Grp
-----------
1 1
2 1
3 2
4 1
5 2
6 2
7 3
8 3
9 1
.
.
135 3
136 1
137 1
138 2
139 3
The other dataframe is df_Filer
df_Filer
Seq Grp Alg
-------------------
1 1 J
3 2 F
7 3 Z
136 1 M
137 1 R
138 2 T
139 3 Y
I want to merge both dataframes into 1 where
1- if df_My Seq number has is already in df_Filer then it should merge with that number is df_My based on Seq in both df_My and df_Filter
2- if df_My Seq number is not in df_Filer then it should join with the smallest Seq in the same Grp
so the end result should be as follow
Seq Grp Alg
------------------
1 1 J
2 1 J
3 2 F
4 1 J
5 2 F
6 2 F
7 3 Z
8 3 Z
9 1 J
.
.
135 3 Z
136 1 M
137 1 R
138 2 T
139 3 Y
I tried this but it does not give me the expected result
df_Out = df_My.merge(df_Filter, axis=1), on='Grp')
Any idea how to achieve this?
CodePudding user response:
As @sammywemmy suggested, you could use merge_asof
:
out = pd.merge_asof(df1, df2, on='Seq', by='Grp')
Another options is, you can left-merge
, then use groupby
idxmin
to create a mapping from Grp to Alg for the missing values, then fill:
df_merged = df1.merge(df2, on='Seq', how='left', suffixes=('','_')).drop(columns=['Grp_'])
no_nan = df_merged.dropna()
mapping = df_merged['Grp'].map(no_nan.loc[no_nan.groupby('Grp')['Seq'].idxmin(), ['Grp','Alg']].set_index('Grp')['Alg'])
df_merged['Alg'] = df_merged['Alg'].fillna(mapping)
Output:
Seq Grp Alg
0 1 1 J
1 2 1 J
2 3 2 F
3 4 1 J
4 5 2 F
5 6 2 F
6 7 3 Z
7 8 3 Z
8 9 1 J
9 135 3 Z
10 136 1 M
11 137 1 R
12 138 2 T
13 139 3 Y