Home > OS >  How to make a pivot table from a dataframe with multiple columns?
How to make a pivot table from a dataframe with multiple columns?

Time:02-14

Please help me.

My dataframe looks like this:

date account action
2021-01-11 504 login
2021-01-11 504 edit_profile
2021-01-11 504 logout
2021-01-12 11 login
2021-01-12 11 login
2021-01-14 303 edit_profile
2021-01-14 303 logout

What I want to achieve is this:

date account login edit_profile logout
2021-01-11 504 1 1 1
2021-01-12 11 2 0 0
2021-01-14 303 0 1 1

The first thing I thought about was pivot table, but the problem is that I don't have total count for certain 'account's daily actions. How to count the number of occurrences within a time range for each value? Please help.

CodePudding user response:

For performance, try to avoid loops with pandas. There are plenty of vectorized functions:

import pandas as pd

#recreating your sample data
from io import StringIO
data1 = """
date         account   action         
2021-01-11     504     login         
2021-01-11     504     edit_profile         
2021-01-11     504     logout         
2021-01-12     11      login         
2021-01-12     11      login         
2021-01-14     303     edit_profile         
2021-01-14     303     logout
"""
df = pd.read_csv(StringIO(data1), sep = "\s{2,}", engine="python")
df["date"] = pd.to_datetime(df["date"])


df1 = df.groupby(by=["date", "account", "action"]).size().unstack(fill_value=0)

print(df1) 

Sample output:

action              edit_profile  login  logout
date       account                             
2021-01-11 504                 1      1       1
2021-01-12 11                  0      2       0
2021-01-14 303                 1      0       1

Please note that you now have a MultiIndex dataframe. You can remove the index levels by adding .reset_index()

CodePudding user response:

Could this work?

Get the piece of dataframe during a certain period of time:

new_df = df[start_date < df['date'] < end_date]

new_df now has all the rows during a certain period of time. Get all the unique account values:

accounts = new_df['account'].unique()

Then create a for loop to go through all of the accounts action:

# Dataframe that keeps track of the actions
actions_df 

for  account in accounts:
    # Get all the rows with certain user
    user_df = new_df[new_df['account'] == account]
    # We now have all the rows that has certain account in user_df
    for action in user_df['action']:
        if action == "login":
            actions_df[action_df['account'] == account]['login']  = 1

Now we have all the actions in action_df dataframe

Hope this helps in any way!

  • Related