Home > OS >  How to convert dataframe?
How to convert dataframe?

Time:11-01

Good day! I have a dataframe:

df1 = pd.DataFrame([['2011-01-01','2011-01-03','A'], ['2011-04-01','2011-04-01','A'], ['2012-08-28','2012-08-30','B'], ['2015-04-03','2015-04-05','A'], ['2015-08-21','2015-08-21','B']], columns = ['d0', 'd1', 'event'])
d0  d1  event
0   2011-01-01  2011-01-03  A
1   2011-04-01  2011-04-01  A
2   2012-08-28  2012-08-30  B
3   2015-04-03  2015-04-05  A
4   2015-08-21  2015-08-21  B

It contains some events A and B (there are actually more events, they are mixed, but they have no intersection by dates) that occurred in the specified interval from d0 to d1. Moreover, this interval can be 1 day (d0 = d1). I need to go from df1 to df2 in which these time intervals are "unrolled" for each event, i.e .:

df2 = pd.DataFrame([['2011-01-01','A'], ['2011-01-02','A'], ['2011-01-03','A'], ['2011-04-01','A'], ['2012-08-28','B'], ['2012-08-29','B'], ['2012-08-30','B'], ['2015-04-03','A'], ['2015-04-04','A'], ['2015-04-05','A'], ['2015-08-21','В']], columns = ['Date', 'event'])
    Date    event
0   2011-01-01  A
1   2011-01-02  A
2   2011-01-03  A
3   2011-04-01  A
4   2012-08-28  B
5   2012-08-29  B
6   2012-08-30  B
7   2015-04-03  A
8   2015-04-04  A
9   2015-04-05  A
10  2015-08-21  В

I tried doing this based on resample and comparing areas where ffill = bfill but couldn't come up with anything. How can this be done in the most simple way?

CodePudding user response:

We can set_index to event then create date_range per row, then explode to unwind the ranges and reset_index to create the DataFrame:

df2 = (
    df1.set_index('event')
        .apply(lambda r: pd.date_range(r['d0'], r['d1']), axis=1)
        .explode()
        .reset_index(name='Date')[['Date', 'event']]
)

df2:

         Date event
0  2011-01-01     A
1  2011-01-02     A
2  2011-01-03     A
3  2011-04-01     A
4  2012-08-28     B
5  2012-08-29     B
6  2012-08-30     B
7  2015-04-03     A
8  2015-04-04     A
9  2015-04-05     A
10 2015-08-21     B

CodePudding user response:

Let us try comprehension to create the pairs of date and event

pd.DataFrame(((d, c) for (*v, c) in df1.to_numpy()
              for d in pd.date_range(*v)), columns=['Date', 'Event'])

         Date Event
0  2011-01-01     A
1  2011-01-02     A
2  2011-01-03     A
3  2011-04-01     A
4  2012-08-28     B
5  2012-08-29     B
6  2012-08-30     B
7  2015-04-03     A
8  2015-04-04     A
9  2015-04-05     A
10 2015-08-21     B

CodePudding user response:

I don't know if this is the "most simple," but it's the most intuitive way I can think to do it. I iterate over the rows and unroll it manually into a new dataframe. This means that I look at each row, iterate over the dates between d0 and d1, and construct a row for each of them and compile them into a dataframe:

from datetime import timedelta

def unroll_events(df):
    rows = []
    for _, row in df.iterrows():
        event = row['event']
        start = row['d0']
        end = row['d1']
        current = start
        while current != end:
            rows.append(dict(Date=current, event=event))
            current  = timedelta(days=1)
        rows.append(dict(Date=current, event=event)) # make sure last one is included
    return pd.DataFrame(rows)
  • Related