I have a pandas dataframe that contains four date columns, a starttime and an end time and date column that defines a range. I'd like to be able to collectively create a queue count for all time and date across all rows in the data frame, as defined by these columns.
date start
1. date starttime endtime storeid
2. 2/3/20 6:20 pm 7:20 pm 12231
3. 2/3/20 6:25pm 7:25 pm 12231
4. 2/3/20 6:29pm 7:40 pm 12231
5. 2/3/20 7:21pm 7:59pm 12231
6. 2/3/20 6:21pm 7:21 pm 12232
I wants to generate queue based column that will give me how many queue are there before the current one in below way:
date start
1. date starttime endtime storeid queue
2. 2/3/20 6:20 pm 7:20 pm 12231 1
3. 2/3/20 6:25pm 7:25 pm 12231 2
4. 2/3/20 6:29pm 7:40 pm 12231 3
5. 2/3/20 7:21pm 7:59 pm 12231 2
6. 2/3/20 6:21pm 7:21 pm 12232 1
I am new to this and any help would be really appreciated.
CodePudding user response:
The size of a queue at any point is a step function. There is a package called
You can also add them together to create the combined queue
stepfunctions[12231] stepfunctions[12232]
# or
stepfunctions.sum()
You can sample the queue size at any point(s) you like. It seems like you want the queue size sampled at each start time. For store 12231 you can do this like so
stepfunctions[12231](data["starts"])
which gives you
array([1, 2, 3, 3], dtype=int64)
In order to get add this column to your dataframe it will be easier to do this in the same step as creating the step function in the groupby-apply.
So the entire solution (after imports) is
def queue_count_sample(df):
sf = sc.Stairs(df, "start", "end")
df["queue"] = sf(df["start"])
return df
data.groupby("storeid").apply(queue_count_sample)