Good day! I have a dataframe:
df1 = pd.DataFrame([['2011-01-01','2011-01-03','A'], ['2011-04-01','2011-04-01','A'], ['2012-08-28','2012-08-30','B'], ['2015-04-03','2015-04-05','A'], ['2015-08-21','2015-08-21','B']], columns = ['d0', 'd1', 'event'])
d0 d1 event
0 2011-01-01 2011-01-03 A
1 2011-04-01 2011-04-01 A
2 2012-08-28 2012-08-30 B
3 2015-04-03 2015-04-05 A
4 2015-08-21 2015-08-21 B
It contains some events A and B (there are actually more events, they are mixed, but they have no intersection by dates) that occurred in the specified interval from d0 to d1. Moreover, this interval can be 1 day (d0 = d1). I need to go from df1 to df2 in which these time intervals are "unrolled" for each event, i.e .:
df2 = pd.DataFrame([['2011-01-01','A'], ['2011-01-02','A'], ['2011-01-03','A'], ['2011-04-01','A'], ['2012-08-28','B'], ['2012-08-29','B'], ['2012-08-30','B'], ['2015-04-03','A'], ['2015-04-04','A'], ['2015-04-05','A'], ['2015-08-21','В']], columns = ['Date', 'event'])
Date event
0 2011-01-01 A
1 2011-01-02 A
2 2011-01-03 A
3 2011-04-01 A
4 2012-08-28 B
5 2012-08-29 B
6 2012-08-30 B
7 2015-04-03 A
8 2015-04-04 A
9 2015-04-05 A
10 2015-08-21 В
I tried doing this based on resample and comparing areas where ffill = bfill but couldn't come up with anything. How can this be done in the most simple way?
CodePudding user response:
We can set_index
to event
then create date_range
per row, then explode
to unwind the ranges and reset_index
to create the DataFrame:
df2 = (
df1.set_index('event')
.apply(lambda r: pd.date_range(r['d0'], r['d1']), axis=1)
.explode()
.reset_index(name='Date')[['Date', 'event']]
)
df2
:
Date event
0 2011-01-01 A
1 2011-01-02 A
2 2011-01-03 A
3 2011-04-01 A
4 2012-08-28 B
5 2012-08-29 B
6 2012-08-30 B
7 2015-04-03 A
8 2015-04-04 A
9 2015-04-05 A
10 2015-08-21 B
CodePudding user response:
Let us try comprehension to create the pairs of date and event
pd.DataFrame(((d, c) for (*v, c) in df1.to_numpy()
for d in pd.date_range(*v)), columns=['Date', 'Event'])
Date Event
0 2011-01-01 A
1 2011-01-02 A
2 2011-01-03 A
3 2011-04-01 A
4 2012-08-28 B
5 2012-08-29 B
6 2012-08-30 B
7 2015-04-03 A
8 2015-04-04 A
9 2015-04-05 A
10 2015-08-21 B
CodePudding user response:
I don't know if this is the "most simple," but it's the most intuitive way I can think to do it. I iterate over the rows and unroll it manually into a new dataframe. This means that I look at each row, iterate over the dates between d0
and d1
, and construct a row for each of them and compile them into a dataframe:
from datetime import timedelta
def unroll_events(df):
rows = []
for _, row in df.iterrows():
event = row['event']
start = row['d0']
end = row['d1']
current = start
while current != end:
rows.append(dict(Date=current, event=event))
current = timedelta(days=1)
rows.append(dict(Date=current, event=event)) # make sure last one is included
return pd.DataFrame(rows)