Home > Blockchain >  Counting values by two columns
Counting values by two columns

Time:10-23

I have a big date file that I'm trying to extract data from. I have two columns Start Time & Date What I would like to do is display each Date followed by each Start Time followed by a count of each of those start times. So the output would look like this:

Date         Start Time     
30/12/2021   15:00          2
30/12/2021   16:00          6
30/12/2021   17:00          3

This is what I've tried:

df = pd.read_excel(xls)

counter = df['Start Time'].value_counts()
date_counter = df['Date'].value_counts()
total = (df['Start Time']).groupby(df['Date']).sum()
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(total)

input()

But this outputs like this:

Date         Start Time     
30/12/2021   15:0016:0016:0017:0018:0018:00        

Any suggestions are much appreciated!

CodePudding user response:

You're only grouping by 1 column. You need to group-by both columns and get the count using size()

df.groupby(['Date', 'Start Time']).size()

CodePudding user response:

You can value count with the 2 keys

counts = df[['Date','Start Time']].value_counts()

for this input

     Date Start Time
0  30/12/21      15:00
1  30/12/21      16:00
2  31/12/21      15:00
3  30/12/21      15:00
4  31/12/21      16:00
5  30/12/21      18:00
6  30/12/21      13:00
7  31/12/21      15:00

throws

Date      Start Time
31/12/21  15:00         2
30/12/21  15:00         2
31/12/21  16:00         1
30/12/21  18:00         1
          16:00         1
          13:00         1

CodePudding user response:

Some other ways that you can get the frequency of a combination of columns:

  • pd.value_counts()
df.value_counts(["Date", "Start Time"])
  • pd.crosstab()
pd.crosstab(data['Date'],data['Start Time']) 
  • pd.pivot_table()
df.pivot_table(index=['Date','Start Time'], aggfunc='size')
  • Related