I am trying to find the number of times a certain value "PAY" appears in a hourly range column.
I have made the dataframe with pandas:
df = pd.read_json('test.json')
print(df.head(3))
print(df.dtypes)
TransactionCode Date
1 PAY 2021-12-09T10:23:29.242 01:00
2 PAY 2021-12-09T10:23:02.978 01:00
3 PAY 2021-12-09T10:22:48.659 01:00
TransactionCode object
Date object
After I split the column Date in two column, "Date" and "Time":
df['Time'] = pd.to_datetime(df['Date']).dt.time
df['Date'] = pd.to_datetime(df['Date']).dt.date
print(df.head())
print(df.dtypes)
TransactionCode Date Time
1 PAY 2021-12-09 10:23:29.242000
2 PAY 2021-12-09 10:23:02.978000
3 PAY 2021-12-09 10:22:48.659000
4 PAY 2021-12-09 11:32:48.659000
5 PAY 2021-12-09 11:45:12.659000
TransactionCode object
Date object
Time object
I wish to iterate over column "Time" to know for each day how many times appear the "PAY" value. I need this to build a hourly turnout plot with Matplotlib.
CodePudding user response:
You could convert to datetime, set_index
on Date and resample
to hours, finally plot:
df2 = (
df.assign(Date=pd.to_datetime(df['Date']))
.set_index('Date')
.loc[lambda d: d['TransactionCode'].eq('PAY')]
.resample('1H').count()
)
output:
TransactionCode
Date
2021-12-09 10:00:00 01:00 3
2021-12-09 11:00:00 01:00 2
Plot:
df2.plot.bar()
CodePudding user response:
Considering the following dataframe:
TransactionCode Date Time
1 PAY 2021-12-09 10:23:29.242000
2 PAY 2021-12-09 10:23:02.978000
3 PAY 2021-12-09 10:22:48.659000
4 ERR 2021-12-09 11:32:48.659000
5 PAY 2021-12-09 11:45:12.659000
You could do this:
df['Hour'] = [time.hour for time in df['Time']]
df[df['TransactionCode'] == 'PAY'][['TransactionCode','Hour']].groupby('Hour').count().plot(kind='bar');