I have a requirement where I need to group report execution counts. The input data looks like below.
REPORT,TIME_SEC
Report1,1
Report1,5
Report3,4
Report2,158
Report2,20
Report3,131
I need to group the reports and show the count of execution for each of the following time ranges '0-10sec
','10-30sec
','30-60sec
','1-2min','>2min'
.
My code is like below.
import pandas as pd
infile = "/Users/user1/data1.csv"
time_column = "TIME_SEC"
range_column = "TIME_RANGE"
groupby_column = "REPORT"
df = pd.read_csv(infile)
bin1 = [0,10,30,60,120,7200]
label1 = ['0-10sec','10-30sec','30-60sec','1-2min','>2min']
# print(f"df=\n{df}")
df[range_column] = pd.cut(df[time_column], bins=bin1, labels=label1, include_lowest=True)
print(f"df=\n{df}")
df_final = pd.crosstab(df[groupby_column], df[range_column])
df_final.columns = df_final.columns.astype(str)
df_final.reset_index(inplace=True)
print(f"df_final=\n{df_final}")
The output is like below.
df=
REPORT TIME_SEC TIME_RANGE
0 Report1 1 0-10sec
1 Report1 5 0-10sec
2 Report3 4 0-10sec
3 Report2 158 >2min
4 Report2 20 10-30sec
5 Report3 131 >2min
df_final=
TIME_RANGE REPORT 0-10sec 10-30sec >2min
0 Report1 2 0 0
1 Report2 0 1 1
2 Report3 1 0 1
bins = [ 0 10 30 60 120 7200]
Now while I do pd.cut
it doesn't list the ranges for which there's no data. Hence for the above example it doesn't list the ranges '30-60sec'
,'1-2min'
, as there's no value for these ranges.
My requirement is to print the ranges '30-60sec'
,'1-2min'
(for which no values are present) also in the final output. How can I get the '30-60sec','1-2min'
as part of pd.cut
OR any another way so that when I do pd.crosstab
then value '0'
is printed for the ranges '30-60sec'
,'1-2min'
.
I'm expecting the df_final to be like below.
TIME_RANGE REPORT 0-10sec 10-30sec 30-60sec 1-2min >2min
0 Report1 2 0 0 0 0
1 Report2 0 1 0 0 1
2 Report3 1 0 0 0 1
CodePudding user response:
You can reindex the dataframe as your last step (with your list label1
):
df = df.reindex(label1, fill_value=0, axis=1)
print(df)
Prints:
TIME_RANGE 0-10sec 10-30sec 30-60sec 1-2min >2min
0 2 0 0 0 0
1 0 1 0 0 1
2 1 0 0 0 1