For context, I am trying to measure using regression whether the presence of a competitor ad in a given week affects the metrics of an ad. I am not sure how to consolidate the weeks , or assign Boolean values (1 or 0) based on the presence of a brand during a week, but on a different row.
import pandas as pd
df = pd.DataFrame({'week': ['2019-11-11', '2019-11-11', '2019-11-18', '2019-11-25', '2019-11-11', '2019-11-18', '2019-11-11'],
'brand':['X', 'X-2', 'X', 'X', 'Y', 'Y', 'Z'],
'score': [.34, .25, .54, .23, .22, .34, .44]})
Desired result:
df = pd.DataFrame({'week': ['2019-11-11', '2019-11-11', '2019-11-18', '2019-11-25', '2019-11-11', '2019-11-18', '2019-11-11'],
'brand':['X', 'X-2', 'X', 'X', 'Y', 'Y', 'Z'],
'score': [.34, .25, .54, .23, .22, .34, .44],
'presence_dummy_Y': [1, 1, 1, 0, 1, 1, 1],
'presence_dummy_Z': [1, 1, 0, 0, 1, 0, 1]})
CodePudding user response:
You can get_dummies
, filter the competitors with loc
/filter
, get 1 if there is at least a 1 in the week with GroupBy.max
:
df.join(pd
.get_dummies(df['brand']) # transform to dummies
.filter(regex='^(?!X)') # keep only brands not starting with X
.groupby(pd.to_datetime(df['week']).dt.to_period('W')) # groupby week
.transform('max') # 1 if at least a 1
.add_prefix('dummy_') # rename columns
)
output:
week brand score dummy_Y dummy_Z
0 2019-11-11 X 0.34 1 1
1 2019-11-11 X-2 0.25 1 1
2 2019-11-18 X 0.54 1 0
3 2019-11-25 X 0.23 0 0
4 2019-11-11 Y 0.22 1 1
5 2019-11-18 Y 0.34 1 0
6 2019-11-11 Z 0.44 1 1
CodePudding user response:
Let's try
out = df.join(df['brand'].str.get_dummies()
.groupby(df['week']).transform('any').astype(int)
.pipe(lambda df: df.filter(regex='Y|Z'))
.add_prefix('presence_dummy_'))
print(out)
week brand score presence_dummy_Y presence_dummy_Z
0 2019-11-11 X 0.34 1 1
1 2019-11-11 X-2 0.25 1 1
2 2019-11-18 X 0.54 1 0
3 2019-11-25 X 0.23 0 0
4 2019-11-11 Y 0.22 1 1
5 2019-11-18 Y 0.34 1 0
6 2019-11-11 Z 0.44 1 1