I have a .csv output dictating the confirmed times every ticket in a day was created. I am trying to get total tickets confirmed for each person for each day. Here is sample data where the actual # of rows is in the 1000s:
Date_Created,Confirmed_By
4/6/2022 10:35,Bob
4/6/2022 11:39,Bob
4/6/2022 12:19,Tim
4/7/2022 7:08,Bob
4/7/2022 7:30,Sally
4/7/2022 7:35,Bob
4/8/2022 8:09,Sally
4/8/2022 8:28,Jack
4/8/2022 8:40,Jack
4/8/2022 10:00,Sally
4/8/2022 10:23,Jack
I would like to see a table with the first column as a list of dates and the subsequent columns being names with the total confirmations for each date associated:
Dates | Bob | Tim | Jack | Sally |
---|---|---|---|---|
2022-04-06 | 2 | 1 | 0 | 0 |
2022-04-07 | 2 | 0 | 0 | 1 |
2022-04-08 | 0 | 0 | 3 | 2 |
I tried pulling the csv into a dataframe and using value_counts() to create series for individual people, but I was never going to be 100% on the names as we would have random fillins that wouldn't be accounted for.
Bob = df[df["Confirmed By"].str.contains("Bob")]
Bob_Days = Bob["Date_Confirmed"].value_counts().rename("Bob")
conf_per_day = pd.concat(
[Bob_Days, Tim_Days, Jack_Days, Sally_Days], axis=1
)
Then I tried a df.value_count().to_frame('counts').reset_index()
which gave dates, names, and totals each on individual lines, but could not figure out how to spin those names into columns while not having repeating dates. Any ideas for totaling confirmations per person per day?
CodePudding user response:
Use crosstab
>>> pd.crosstab(df['Date_Created'].dt.date, df['Confirmed_By'])
Confirmed_By Bob Jack Sally Tim
Date_Created
2022-04-06 2 0 0 1
2022-04-07 2 0 1 0
2022-04-08 0 3 2 0
CodePudding user response:
I would do something like that:
import pandas as pd
df = pd.read_csv('your_test_data.csv')
# create date column
df['Date'] = pd.to_datetime(df['Date_Created']).dt.date
# pivot table with counting
result = df.pivot_table(index='Date', columns='Confirmed_By', aggfunc='count').fillna(0)
CodePudding user response:
One possibile way is to use resample()
and value_counts()
:
df['Date_Created'] = pd.to_datetime(df['Date_Created'])
df.resample('D',on = 'Date_Created')['Confirmed_By'].value_counts().unstack().fillna(0)
Output:
Confirmed_By Bob Jack Sally Tim
Date_Created
2022-04-06 2.0 0.0 0.0 1.0
2022-04-07 2.0 0.0 1.0 0.0
2022-04-08 0.0 3.0 2.0 0.0