Home > Blockchain >  Calculate hourly turnout with pandas and Matplotlib
Calculate hourly turnout with pandas and Matplotlib

Time:02-18

I am trying to find the number of times a certain value "PAY" appears in a hourly range column.

I have made the dataframe with pandas:

df = pd.read_json('test.json')
print(df.head(3))
print(df.dtypes)
  TransactionCode                           Date  
1             PAY  2021-12-09T10:23:29.242 01:00  
2             PAY  2021-12-09T10:23:02.978 01:00  
3             PAY  2021-12-09T10:22:48.659 01:00 

TransactionCode       object
Date                  object 

After I split the column Date in two column, "Date" and "Time":

df['Time'] = pd.to_datetime(df['Date']).dt.time
df['Date'] = pd.to_datetime(df['Date']).dt.date
print(df.head())
print(df.dtypes)
  TransactionCode        Date             Time
1             PAY  2021-12-09  10:23:29.242000
2             PAY  2021-12-09  10:23:02.978000
3             PAY  2021-12-09  10:22:48.659000
4             PAY  2021-12-09  11:32:48.659000
5             PAY  2021-12-09  11:45:12.659000


TransactionCode       object
Date                  object
Time                  object

I wish to iterate over column "Time" to know for each day how many times appear the "PAY" value. I need this to build a hourly turnout plot with Matplotlib.

CodePudding user response:

You could convert to datetime, set_index on Date and resample to hours, finally plot:

df2 = (
 df.assign(Date=pd.to_datetime(df['Date']))
   .set_index('Date')
   .loc[lambda d: d['TransactionCode'].eq('PAY')]
   .resample('1H').count()
)

output:

                           TransactionCode
Date                                      
2021-12-09 10:00:00 01:00                3
2021-12-09 11:00:00 01:00                2

Plot:

df2.plot.bar()

enter image description here

CodePudding user response:

Considering the following dataframe:

    TransactionCode     Date   Time
1   PAY 2021-12-09  10:23:29.242000
2   PAY 2021-12-09  10:23:02.978000
3   PAY 2021-12-09  10:22:48.659000
4   ERR 2021-12-09  11:32:48.659000
5   PAY 2021-12-09  11:45:12.659000

You could do this:

df['Hour'] = [time.hour for time in df['Time']]
df[df['TransactionCode'] == 'PAY'][['TransactionCode','Hour']].groupby('Hour').count().plot(kind='bar');

enter image description here

  • Related