How to transfer multiple rows into columns in panda data frame?-CodePudding

I have a data frame as follows:

  mydata_frame:
                 IDs   days                f1      f2      f3  ...  fn
                 Id1   day1                5       6       21  ...  5
                 Id1   delta_day0&day1     0       0       0   ...  0                     
                 Id1   day2                21      1       4   ...  2
                 Id1   delta_day1&day2     16      5       17  ...  3
                 Id1   day3                30      2       5   ...  2
                 Id1   delta_day2&day3     9       1       1   ...   0
                 ...     ...        ..      ..      ..  ... ...
                 Id2   day1                8       9       10  ...  2
                 Id2   delta_day0&day1     0       0       0   ...  0                     
                 Id2   day2                12      10      7   ...  6
                 Id2   delta_day1&day2     4       1       3  ...   4
                 Id2   day3                22      25      3   ...  1
                 Id2   delta_day2&day3     10       15     4   ...  5

I would like to transfer the value of all the rows which includes delta into columns. So my desired output should be like this:

    output:
                 IDs   days         f1    del_f1   f2  del_f2     f3  del_f3  ...  fn  del_fn
                 Id1   day1         5       0       6    0        21   0      ...  5      0
                 Id1   day2         21      16      1    5        4    17     ...  2      3
                 Id1   day3         30      9       2    1        5    1      ...  2      0
                 ...     ...        ..      ..      ..  ... ...
                 Id2   day1         8       0       9    0        10   0      ...  2      0                    
                 Id2   day2         12      4       10   1        7    3      ...  6      4
                 Id2   day3         22      10      25   15       3    4      ...  1      5

Any idea to do this conversion in an efficient way?

CodePudding user response：

You can start by building a Dataframe without delay (only days) i.e. keeping only every two rows starting from index 0.
Then insert new columns (from the 4th column i.e. index 3) with delay values of the preceding column (every two rows starting from 1):

df2 = df.iloc[::2]
new_index = df2.index

i = 3
for col in df.columns[2:]:
    new_col = "del_"   col
    series = pd.Series(df[col].iloc[1::2])
    series.index = new_index
    df2.insert(i, new_col, series)
    i =2

print(df2)

CodePudding user response：

You could do:

df1 = df[df.days.str.startswith("day")].reset_index(drop=True)
df2 = (
    df.loc[df.days.str.startswith("delta"), [f"f{i}" for i in range(1, n 1)]]
    .rename(columns={f"f{i}": f"del_f{i}" for i in range(1, n 1)})
    .reset_index(drop=True)
)
df = pd.concat([df1, df2], axis="columns")

Result for n == 3 and the following sample dataframe df

    IDs             days  f1  f2  f3
0   Id1             day1   5   6  21
1   Id1  delta_day0&day1   0   0   0
2   Id1             day2  21   1   4
3   Id1  delta_day1&day2  16   5  17
4   Id1             day3  30   2   5
5   Id1  delta_day2&day3   9   1   1
6   Id2             day1   8   9  10
7   Id2  delta_day0&day1   0   0   0
8   Id2             day2  12  10   7
9   Id2  delta_day1&day2   4   1   3
10  Id2             day3  22  25   3
11  Id2  delta_day2&day3  10  15   4

   IDs  days  f1  f2  f3  del_f1  del_f2  del_f3
0  Id1  day1   5   6  21       0       0       0
1  Id1  day2  21   1   4      16       5      17
2  Id1  day3  30   2   5       9       1       1
3  Id2  day1   8   9  10       0       0       0
4  Id2  day2  12  10   7       4       1       3
5  Id2  day3  22  25   3      10      15       4