I have two dataframes, the first with id
and cnpj
field and the second with just cnpj
. This second dataframe can have new values in the cnpj
column so I need to create a new dataframe with the cpfs that it doesn't have in the first dataframe and add 1 to the id after the maximum value.
How can I do this?
Dataframe 1:
id | cnpj |
---|---|
11 | 552030405560 |
12 | 709050358572 |
13 | 999680558533 |
14 | 585680558542 |
Dataframe 2:
cnpj |
---|
552030405560 |
709050358572 |
999680558533 |
585680558542 |
658540805552 |
498546543258 |
Expected output:
id | cnpj |
---|---|
11 | 552030405560 |
12 | 709050358572 |
13 | 999680558533 |
14 | 585680558542 |
15 | 658540805552 |
16 | 498546543258 |
CodePudding user response:
IIUC, you can use merge
and fill with a custom function:
def fillmax(s):
MAX = s.max()
mask = s.isna()
return s.fillna(MAX mask.cumsum().where(mask, 0)).convert_dtypes()
out = (df1
.merge(df2, on='cnpj', how='right')
.assign(id=lambda d: fillmax(d['id']))
)
output:
01
id cnpj
0 11 552030405560
1 12 709050358572
2 13 999680558533
3 14 585680558542
4 15 658540805552
5 16 498546543258
CodePudding user response:
you can do a merge and add it to the index
df_final = pd.merge(df2, df1, how = 'left').reset_index()
df_final['index'] = df_final['index'].add(df_final.iloc[0]['id']).astype(int)
df_final
CodePudding user response:
You could right merge on cnpj
and then fill the null values ias needed:
merged = df1.merge(df2, how="right")
merged["id"] = merged["id"].where(merged["id"].notnull(), merged["id"].isnull().cumsum().add(merged["id"].max()))
>>> merged
id cnpj
0 11.0 552030405560
1 12.0 709050358572
2 13.0 999680558533
3 14.0 585680558542
4 15.0 658540805552
5 16.0 498546543258