Fill columns of one data frame with columns of other dataframe on group-CodePudding

I have one data frame with multiple columns as mentioned below.

df1 
a  b  c  d  e  f   dr1    a1     de1    dr2   a2     de2     dr3   a3     de3   dr4    a4     de4
x  y  z  1  2  3   abc  08:00   09:00   abc  07:00  08:00   abc  08:00   09:00  abc  08:00   09:00
x1 y1 z1 7  9  2   abc  14:00   15:00   abc  14:00  15:00   abc  13:00   14:00  abc  13:00   14:00
x2 y2 z2 2  4  5   abc  21:00   22:00   abc  20:00  21:00   abc  20:00   21:00  abc  20:00   21:00

I have another dataframe as mentioned below.

df2
a   b   c   d   e   f      date       dr   a   de
x   y   z   1   2   3   2022-01-01
x   y   z   1   2   3   2022-01-02
x   y   z   1   2   3   2022-01-03
x   y   z   1   2   3   2022-01-04
x1 y1  z1   7   9   2   2022-01-01
x1 y1  z1   7   9   2   2022-01-02
x1 y1  z1   7   9   2   2022-01-03
x1 y1  z1   7   9   2   2022-01-04
x2 y2  z2   2   4   5   2022-01-01
x2 y2  z2   2   4   5   2022-01-02
x2 y2  z2   2   4   5   2022-01-03
x2 y2  z2   2   4   5   2022-01-04

Now i want to fill the "dr", "a", de" columns of df2 with values from df1 such that for 2022-01-01 the values are populated from "dr1","a1","de1" column of df1. For 2022-01-02 the values are populated from "dr2","a2","de2" columns. For 2022-01-03 the values are populated from "dr3","a3","de3" and so on. The dates are in ascending order for each group and there are multiple groups.

I have tried using Rank method to rank df2 on date and fill the columns, however it doesnt work.

CodePudding user response：

Try:

df2['dr'] = df1[[x for x in df1.columns if 'dr' in x]].T

Repeat this for the other two columns with a little bit of change.

CodePudding user response：

It is unclear how you exactly generate the dates (I assumed 2021-12-31 the number of days defined by the number on the column name).

For the general reshaping you need pandas.wide_to_long:

df2 = (pd
       .wide_to_long(df.rename(columns={'a':'A'}),
                     stubnames=['dr', 'a', 'de'],
                     i=['A', 'b', 'c', 'd', 'E', 'f'],
                     j='date')
       .reset_index()
       .assign(date=lambda d: pd.Timestamp('2021-12-31') pd.to_timedelta(d['date'], unit='d'))
       )

Note that you can't have duplicated column names, so you first need to rename 'a' (here as 'A').

Output:

     A   b   c  d  E  f       date   dr      a     de
0    x   y   z  1  2  3 2022-01-01  abc  08:00  09:00
1    x   y   z  1  2  3 2022-01-02  abc  07:00  08:00
2    x   y   z  1  2  3 2022-01-03  abc  08:00  09:00
3    x   y   z  1  2  3 2022-01-04  abc  08:00  09:00
4   x1  y1  z1  7  9  2 2022-01-01  abc  14:00  15:00
5   x1  y1  z1  7  9  2 2022-01-02  abc  14:00  15:00
6   x1  y1  z1  7  9  2 2022-01-03  abc  13:00  14:00
7   x1  y1  z1  7  9  2 2022-01-04  abc  13:00  14:00
8   x2  y2  z2  2  4  5 2022-01-01  abc  21:00  22:00
9   x2  y2  z2  2  4  5 2022-01-02  abc  20:00  21:00
10  x2  y2  z2  2  4  5 2022-01-03  abc  20:00  21:00
11  x2  y2  z2  2  4  5 2022-01-04  abc  20:00  21:00