I'm currently working with large .csv files with dates assigned to column headers, repeat metrics types, and client ID's.
Here is a small sample of data similar to what I am working with...
data = {'Client Id': {0: 123987, 1: 123987, 2: 123987, 3: 9871234, 4: 9871234, 5: 9871234},
'Metrics': {0: 'Spend', 1: 'Views', 2: 'Clicks', 3: 'Spend', 4: 'Views', 5: 'Clicks'},
'3/1/22': {0: '$0.00', 1: '6', 2: '1', 3: '$20.00', 4: '3', 5: '1'},
'3/2/22': {0: '$0.00', 1: '3', 2: '2', 3: '$10.00', 4: '2', 5: '2'},
'3/3/22': {0: '$0.00', 1: '3', 2: '3', 3: '$3.00', 4: '5', 5: '4'}}
df = pd.DataFrame(data)
Currently, my dataframe looks like this...
Client Id | Metrics | 3/1/22 | 3/2/22 | 3/3/22 |
---|---|---|---|---|
123987 | Spend | $0.00 | $0.00 | $0.00 |
123987 | Views | 6 | 3 | 3 |
123987 | Clicks | 1 | 2 | 3 |
9871234 | Spend | $20.00 | $10.00 | $3.00 |
9871234 | Views | 3 | 2 | 5 |
9871234 | Clicks | 1 | 2 | 4 |
I need to pivot the date columns and set them as rows in the first col. I also need to break out the unique 'Metrics' types and display them as col headers (with each row displaying their corresponding values).
My goal is to have my final DF look something like this...
Date | Client Id | Spend | Views | Clicks |
---|---|---|---|---|
3/1/2022 | 123987 | $0.00 | 6 | 1 |
3/2/2022 | 1234567 | $0.00 | 3 | 2 |
3/3/2022 | 1234567 | $0.00 | 3 | 3 |
3/1/2022 | 9871234 | $20.00 | 3 | 1 |
3/2/2022 | 9871234 | $10.00 | 2 | 2 |
3/3/2022 | 9871234 | $3.00 | 5 | 4 |
I tried using df.melt() to no avail, but possibly I am missing something?
Any help is appreciated!
CodePudding user response:
First melt the date columns, then pivot:
new_df = (
df
.melt(id_vars=['Client Id', 'Metrics'], var_name='Date')
.pivot(columns='Metrics', index=['Client Id', 'Date'], values='value')
.reset_index()
.rename_axis(None, axis=1)
)
Output:
>>> new_df
Client Id Date Clicks Spend Views
0 123987 3/1/22 1 $0.00 6
1 123987 3/2/22 2 $0.00 3
2 123987 3/3/22 3 $0.00 3
3 9871234 3/1/22 1 $20.00 3
4 9871234 3/2/22 2 $10.00 2
5 9871234 3/3/22 4 $3.00 5
CodePudding user response:
You can use:
(df.pivot(index='Client Id', columns='Metrics')
.stack(0)
.reset_index()
.rename({'level_1': 'Date'})
.rename_axis(None, axis=1)
)
output:
Client Id level_1 Clicks Spend Views
0 123987 3/1/22 1 $0.00 6
1 123987 3/2/22 2 $0.00 3
2 123987 3/3/22 3 $0.00 3
3 9871234 3/1/22 1 $20.00 3
4 9871234 3/2/22 2 $10.00 2
5 9871234 3/3/22 4 $3.00 5