I have a dataframe below:
df = pd.DataFrame({'channels' : ['EMAIL','FAX','MAIL','PHONE','Marketing','SMS','VISIT','Profiling','Approved_Email','EMAIL','FAX','MAIL','PHONE','Marketing','SMS','VISIT','Profiling','Approved_Email_vod'],
'ID' : [1001, 1002, 1003, 1004, 1005, 1006, 1001, 1002, 1003, 1004, 1005, 1006, 1001, 1002, 1003, 1004, 1005, 1006],
'INTR_COUNT' : [1,1,1,1,1,1,1,2,3,4,5,6,1,2,3,4,5,6],
'PERSONA' : ['A','B','C','A','B','C','A','B','C','A','B','C','A','B','C','A','B','C']})
What i want to do is define a function that will take a df such as the above and create new columns based on the unique categories in 'channels'.
ID | FAX | PHONE | SMS | VISIT | Marketing | Approved_email | Persona | ||
---|---|---|---|---|---|---|---|---|---|
1001 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | A |
And so on for the rest of the IDs
I've written this function so far:
def channel_pivot(df: pd.DataFrame):
#where df is the df stated above
x = df
#subsetting to pivot only on focus cols
y = df[['channels', 'INTR', 'HCP']]
#pivot operation
y = set_index('ID')
y1 = y.pivot(columns='channels', values = sum('INTR')).apply(lambda x: pd.Series(x.dropna().values))
df1 = y1.merge(x, left_index=True, right_on='ID')
return df1
I can't get the sum function to work inside the pivot function though, so how can I sum up the count of interactions for a given ID for each channel?
CodePudding user response:
You can use .groupby()
with .agg()
:
x = (
df.groupby(["ID", "channels"])
.agg({"INTR_COUNT": "sum", "PERSONA": "first"})
.set_index("PERSONA", append=True)
.unstack(level=1)
.droplevel(0, axis=1)
.fillna(0)
.astype(int)
.reset_index()
)
x.columns.name = None
print(x)
Prints:
ID | PERSONA | Approved_Email | Approved_Email_vod | FAX | Marketing | PHONE | Profiling | SMS | VISIT | |||
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1001 | A | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
1 | 1002 | B | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 2 | 0 | 0 |
2 | 1003 | C | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 0 |
3 | 1004 | A | 0 | 0 | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 4 |
4 | 1005 | B | 0 | 0 | 0 | 5 | 0 | 1 | 0 | 5 | 0 | 0 |
5 | 1006 | C | 0 | 6 | 0 | 0 | 6 | 0 | 0 | 0 | 1 | 0 |