I am working with some vehicular detection data and am looking at extracting the number of vehicles that are detected during a green light and the number detected during a red light. What is the most efficient way of extracting % of vehicles detected between a Green and a Red light vs all vehicles detected?
- Green light start is Event Code = 1,
- Red light start is Event Code = 10,
- Vehicle detected is Event Code = 82
CSV example:
Signal Id,Timestamp,Event Code,Event Parameter
14,2022-08-01 13:10:49.600,1,8
14,2022-08-01 13:10:52.500,82,32
14,2022-08-01 13:10:58.000,82,32
14,2022-08-01 13:11:01.200,82,32
14,2022-08-01 13:11:03.700,82,32
14,2022-08-01 13:11:04.200,82,32
14,2022-08-01 13:11:10.100,82,32
14,2022-08-01 13:11:16.000,82,32
14,2022-08-01 13:11:45.500,10,8
14,2022-08-01 13:12:10.200,82,32
14,2022-08-01 13:12:19.300,82,32
14,2022-08-01 13:12:30.300,82,32
14,2022-08-01 13:12:46.600,1,8
14,2022-08-01 13:12:51.400,82,32
14,2022-08-01 13:13:35.600,82,32
14,2022-08-01 13:13:42.800,10,8
14,2022-08-01 13:13:52.000,82,32
14,2022-08-01 13:13:57.000,82,32
14,2022-08-01 13:14:03.300,82,32
14,2022-08-01 13:14:04.500,82,32
14,2022-08-01 13:14:09.300,1,8
14,2022-08-01 13:14:29.800,82,32
14,2022-08-01 13:14:42.200,82,32
14,2022-08-01 13:14:46.000,82,32
14,2022-08-01 13:14:47.400,82,32
14,2022-08-01 13:15:36.800,10,8
For this snippet it would be 13 Green and 7 Red for a 65% of vehicles arriving on a Green light.
I broke out the file into a single direction as my first processes involved adding a column, parsing the csv row by row, flipping a boolean value back and forth every time it passed a Code 1 or 10 and denoting a value in the new column next to each detection. This seemed very rudimentary and thought that pandas might have a better way to perform a calculation. I explored the groupby() method, but figured I would need to modify one of the detector numbers which would involve parsing row by row again modifying numbers. Is there a better, more efficient way of pulling this data out?
CodePudding user response:
You can use boolean masking and value_counts
:
m = df['Event Code'].ne(82) # or .isin([1, 10])
out = (df['Event Code'].where(m).ffill()[~m]
.map({1: 'Green', 10: 'Red'})
.value_counts()
)
output:
Green 13
Red 7
Name: Event Code, dtype: int64
With .value_counts(normalize=True)
:
Green 0.65
Red 0.35
Name: Event Code, dtype: float64
CodePudding user response:
You can try this.
df = pd.DataFrame(data, columns=columns)
df['Event Type'] = np.NAN
df.loc[df['Event Code'] == 1, 'Event Type'] = 'green light start'
df.loc[df['Event Code'] == 10, 'Event Type'] = 'red light start'
df = df.fillna(method='ffill')
cars_on_green_light = df[(df['Event Type'] == 'green light start') & (df['Event Code'] != 1)].shape[0]
cars_on_red_light = df[(df['Event Type'] == 'red light start') & (df['Event Code'] != 10)].shape[0]
total_cars_arriving = df[df['Event Code'] == 82].shape[0]
percent_green_cars = cars_on_green_light / total_cars_arriving * 100
percent_red_cars = cars_on_red_light / total_cars_arriving * 100
print(f"""
cars_on_green_light : {cars_on_green_light}
cars_on_red_light : {cars_on_red_light}
total_cars_arriving : {total_cars_arriving}
percent_green_cars : {percent_green_cars}
percent_red_cars : {percent_red_cars}
""")
OUT:
cars_on_green_light : 13
cars_on_red_light : 7
total_cars_arriving : 20
percent_green_cars : 65.0
percent_red_cars : 35.0