Please help me.
My dataframe looks like this:
date | account | action | ||
---|---|---|---|---|
2021-01-11 | 504 | login | ||
2021-01-11 | 504 | edit_profile | ||
2021-01-11 | 504 | logout | ||
2021-01-12 | 11 | login | ||
2021-01-12 | 11 | login | ||
2021-01-14 | 303 | edit_profile | ||
2021-01-14 | 303 | logout |
What I want to achieve is this:
date | account | login | edit_profile | logout |
---|---|---|---|---|
2021-01-11 | 504 | 1 | 1 | 1 |
2021-01-12 | 11 | 2 | 0 | 0 |
2021-01-14 | 303 | 0 | 1 | 1 |
The first thing I thought about was pivot table, but the problem is that I don't have total count for certain 'account's daily actions. How to count the number of occurrences within a time range for each value? Please help.
CodePudding user response:
For performance, try to avoid loops with pandas. There are plenty of vectorized functions:
import pandas as pd
#recreating your sample data
from io import StringIO
data1 = """
date account action
2021-01-11 504 login
2021-01-11 504 edit_profile
2021-01-11 504 logout
2021-01-12 11 login
2021-01-12 11 login
2021-01-14 303 edit_profile
2021-01-14 303 logout
"""
df = pd.read_csv(StringIO(data1), sep = "\s{2,}", engine="python")
df["date"] = pd.to_datetime(df["date"])
df1 = df.groupby(by=["date", "account", "action"]).size().unstack(fill_value=0)
print(df1)
Sample output:
action edit_profile login logout
date account
2021-01-11 504 1 1 1
2021-01-12 11 0 2 0
2021-01-14 303 1 0 1
Please note that you now have a MultiIndex dataframe. You can remove the index levels by adding .reset_index()
CodePudding user response:
Could this work?
Get the piece of dataframe during a certain period of time:
new_df = df[start_date < df['date'] < end_date]
new_df now has all the rows during a certain period of time. Get all the unique account values:
accounts = new_df['account'].unique()
Then create a for loop to go through all of the accounts action:
# Dataframe that keeps track of the actions
actions_df
for account in accounts:
# Get all the rows with certain user
user_df = new_df[new_df['account'] == account]
# We now have all the rows that has certain account in user_df
for action in user_df['action']:
if action == "login":
actions_df[action_df['account'] == account]['login'] = 1
Now we have all the actions in action_df dataframe
Hope this helps in any way!