I have a big date file that I'm trying to extract data from. I have two columns Start Time
& Date
What I would like to do is display each Date followed by each Start Time followed by a count of each of those start times. So the output would look like this:
Date Start Time
30/12/2021 15:00 2
30/12/2021 16:00 6
30/12/2021 17:00 3
This is what I've tried:
df = pd.read_excel(xls)
counter = df['Start Time'].value_counts()
date_counter = df['Date'].value_counts()
total = (df['Start Time']).groupby(df['Date']).sum()
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(total)
input()
But this outputs like this:
Date Start Time
30/12/2021 15:0016:0016:0017:0018:0018:00
Any suggestions are much appreciated!
CodePudding user response:
You're only grouping by 1 column. You need to group-by both columns and get the count using size()
df.groupby(['Date', 'Start Time']).size()
CodePudding user response:
You can value count with the 2 keys
counts = df[['Date','Start Time']].value_counts()
for this input
Date Start Time
0 30/12/21 15:00
1 30/12/21 16:00
2 31/12/21 15:00
3 30/12/21 15:00
4 31/12/21 16:00
5 30/12/21 18:00
6 30/12/21 13:00
7 31/12/21 15:00
throws
Date Start Time
31/12/21 15:00 2
30/12/21 15:00 2
31/12/21 16:00 1
30/12/21 18:00 1
16:00 1
13:00 1
CodePudding user response:
Some other ways that you can get the frequency of a combination of columns:
pd.value_counts()
df.value_counts(["Date", "Start Time"])
pd.crosstab()
pd.crosstab(data['Date'],data['Start Time'])
pd.pivot_table()
df.pivot_table(index=['Date','Start Time'], aggfunc='size')