I have some data of an owl being present in the nest box. In a previous question you helped me visualize when the owl is in the box:
In addition I created a plot of the hours per day spent in the box with the code below (probably this can be done more efficiently):
import pandas as pd
import matplotlib.pyplot as plt
# raw data indicating time spent in box (each row represents start and end time)
time = pd.DatetimeIndex(["2021-12-01 18:08","2021-12-01 18:11",
"2021-12-02 05:27","2021-12-02 05:29",
"2021-12-02 22:40","2021-12-02 22:43",
"2021-12-03 19:24","2021-12-03 19:27",
"2021-12-06 18:04","2021-12-06 18:06",
"2021-12-07 05:28","2021-12-07 05:30",
"2021-12-10 03:05","2021-12-10 03:10",
"2021-12-10 07:11","2021-12-10 07:13",
"2021-12-10 20:40","2021-12-10 20:41",
"2021-12-12 19:42","2021-12-12 19:45",
"2021-12-13 04:13","2021-12-13 04:17",
"2021-12-15 04:28","2021-12-15 04:30",
"2021-12-15 05:21","2021-12-15 05:25",
"2021-12-15 17:40","2021-12-15 17:44",
"2021-12-15 22:31","2021-12-15 22:37",
"2021-12-16 04:24","2021-12-16 04:28",
"2021-12-16 19:58","2021-12-16 20:09",
"2021-12-17 17:42","2021-12-17 18:04",
"2021-12-17 22:19","2021-12-17 22:26",
"2021-12-18 05:41","2021-12-18 05:44",
"2021-12-19 07:40","2021-12-19 16:55",
"2021-12-19 20:39","2021-12-19 20:52",
"2021-12-19 21:56","2021-12-19 23:17",
"2021-12-21 04:53","2021-12-21 04:59",
"2021-12-21 05:37","2021-12-21 05:39",
"2021-12-22 08:06","2021-12-22 17:22",
"2021-12-22 20:04","2021-12-22 21:24",
"2021-12-22 21:44","2021-12-22 22:47",
"2021-12-23 02:20","2021-12-23 06:17",
"2021-12-23 08:07","2021-12-23 16:54",
"2021-12-23 19:36","2021-12-23 23:59:59",
"2021-12-24 00:00","2021-12-24 00:28",
"2021-12-24 07:53","2021-12-24 17:00",
])
# create dataframe with column indicating presence (1) or absence (0)
time_df = pd.DataFrame(data={'present':[1,0]*int(len(time)/2)}, index=time)
# calculate interval length and add to time_df
time_df['interval'] = time_df.index.to_series().diff().astype('timedelta64[m]')
# add column with day to time_df
time_df['day'] = time.day
#select only intervals where owl is present
timeinbox = time_df.iloc[1::2, :]
interval = timeinbox.interval
day = timeinbox.day
# sum multiple intervals per day
interval_tot = [interval[0]]
day_tot = [day[0]]
for i in range(1, len(day)):
if day[i] == day[i-1]:
interval_tot[-1] =interval[i]
else:
day_tot.append(day[i])
interval_tot.append(interval[i])
# recalculate to hours
for i in range(len(interval_tot)):
interval_tot[i] = interval_tot[i]/(60)
plt.figure(figsize=(15, 5))
plt.grid(zorder=0)
plt.bar(day_tot, interval_tot, color='g', zorder=3)
plt.xlim([1,31])
plt.xlabel('day in December')
plt.ylabel('hours per day in nest box')
plt.xticks(np.arange(1,31,1))
plt.ylim([0, 24])
Now I would like to combine all data in one plot by making a stacked bar chart, where each day is represented by a bar and each bar indicating for each of the 24*60 minutes whether the owl is present or not. Is this possible from the current data structure?
CodePudding user response:
The data seems to have been created manually, so I have changed the format of the data presented. The approach I took was to create the time spent and the time not spent, with a continuous index of 1 minute intervals with the start and end time as the difference time and a flag of 1. Now to create non-stay time, I will create a time series index of start and end date 1 at 1 minute intervals. Update the original data frame with the newly created index. This is the data for the graph. In the graph, based on the data frame extracted in days, create a color list with red for stay and green for non-stay. Then, in a bar graph, stack the height one. It may be necessary to consider grouping the data into hourly units.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import timedelta
import io
data = '''
start_time,end_time
"2021-12-01 18:08","2021-12-01 18:11"
"2021-12-02 05:27","2021-12-02 05:29"
"2021-12-02 22:40","2021-12-02 22:43"
"2021-12-03 19:24","2021-12-03 19:27"
"2021-12-06 18:04","2021-12-06 18:06"
"2021-12-07 05:28","2021-12-07 05:30"
"2021-12-10 03:05","2021-12-10 03:10"
"2021-12-10 07:11","2021-12-10 07:13"
"2021-12-10 20:40","2021-12-10 20:41"
"2021-12-12 19:42","2021-12-12 19:45"
"2021-12-13 04:13","2021-12-13 04:17"
"2021-12-15 04:28","2021-12-15 04:30"
"2021-12-15 05:21","2021-12-15 05:25"
"2021-12-15 17:40","2021-12-15 17:44"
"2021-12-15 22:31","2021-12-15 22:37"
"2021-12-16 04:24","2021-12-16 04:28"
"2021-12-16 19:58","2021-12-16 20:09"
"2021-12-17 17:42","2021-12-17 18:04"
"2021-12-17 22:19","2021-12-17 22:26"
"2021-12-18 05:41","2021-12-18 05:44"
"2021-12-19 07:40","2021-12-19 16:55"
"2021-12-19 20:39","2021-12-19 20:52"
"2021-12-19 21:56","2021-12-19 23:17"
"2021-12-21 04:53","2021-12-21 04:59"
"2021-12-21 05:37","2021-12-21 05:39"
"2021-12-22 08:06","2021-12-22 17:22"
"2021-12-22 20:04","2021-12-22 21:24"
"2021-12-22 21:44","2021-12-22 22:47"
"2021-12-23 02:20","2021-12-23 06:17"
"2021-12-23 08:07","2021-12-23 16:54"
"2021-12-23 19:36","2021-12-24 00:00"
"2021-12-24 00:00","2021-12-24 00:28"
"2021-12-24 07:53","2021-12-24 17:00"
'''
df = pd.read_csv(io.StringIO(data), sep=',')
df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])
time_df = pd.DataFrame()
for idx, row in df.iterrows():
rng = pd.date_range(row['start_time'], row['end_time']-timedelta(minutes=1), freq='1min')
tmp = pd.DataFrame({'present':[1]*len(rng)}, index=rng)
time_df = time_df.append(tmp)
date_add = pd.date_range(time_df.index[0].date(), time_df.index[-1].date() timedelta(days=1), freq='1min')
time_df = time_df.reindex(date_add, fill_value=0)
time_df['day'] = time_df.index.day
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8,15))
ax.set_yticks(np.arange(0,1500,60))
ax.set_ylim(0,1440)
ax.set_xticks(np.arange(1,25,1))
days = time_df['day'].unique()
for d in days:
#if d == 1:
day_df = time_df.query('day == @d')
colors = [ 'r' if p == 1 else 'g' for p in day_df['present']]
for i in range(len(day_df)):
ax.bar(d, height=1, width=0.5, bottom=i 1, color=colors[i])
plt.show()