I have two pandas
dataframe
, one has columns a
, b
, c
& other has column a
, b
, d
.
df1
id month c
1 1 TE
2 1 TE
1 1 NTE
2 1 NTE
df2
id month price
1 1 4
2 1 6
I want to merge these dataframes on id
& month
columns combinedly. so i did following -
df1.merge(df2, how='left', left_on=['id', 'month'], right_on=['id', 'month'])
results of above code is as expected. now, what i want is, after merge, only one pair of id
& month
should have price
, other pair have price as 0
so the result should be like
id month c price
1 1 TE 4
2 1 TE 6
1 1 NTE 0
2 1 NTE 0
Above can be done, while calling a check one each row, which i think is very expensive, n*n
complexity.
Any leads with less expense is most welcome.
CodePudding user response:
df.price *= ~df.groupby(["id", "month"]).cumcount().astype(bool)
i use .cumcount()
as a "is this first in the group" mask:
In [89]: df
Out[89]:
id month c price
0 1 1 TE 4
1 2 1 TE 6
2 1 1 NTE 4
3 2 1 NTE 6
In [90]: df.groupby(["id", "month"]).cumcount()
Out[90]:
0 0
1 0
2 1
3 1
dtype: int64
In [91]: ~_.astype(bool)
Out[91]:
0 True
1 True
2 False
3 False
dtype: bool
In [92]: df.price *= _
In [93]: df
Out[93]:
id month c price
0 1 1 TE 4
1 2 1 TE 6
2 1 1 NTE 0
3 2 1 NTE 0
CodePudding user response:
merged = df1.merge(df2, how='left', on=['id', 'month'])
def f(x):
x.iloc[1:] = 0
return x
merged['price'] = merged.groupby(['id', 'month'])['price'].transform(f)
Note: if they are the same, you don't have to specify
left_on
and right_on
separately.
CodePudding user response:
You can use a cumcount
as grouper and fill the rest:
cols = ['id', 'month']
(df1.assign(rank=df1.groupby(cols).cumcount())
.merge(df2.assign(rank=0), how='left',
on=cols ['rank'])
.fillna({'price': 0}, downcast='infer')
.drop(columns='rank')
)
output:
id month c price
0 1 1 TE 4
1 2 1 TE 6
2 1 1 NTE 0
3 2 1 NTE 0
different column names
cols_left = ['id', 'month']
cols_right = ['id', 'month']
(df1.assign(rank=df1.groupby(cols_left).cumcount())
.merge(df2.assign(rank=0), how='left',
left_on=cols_left ['rank'],
right_on=cols_right ['rank'])
.fillna({'price': 0}, downcast='infer')
.drop(columns='rank')
)