Home > other >  Add information from adjacent rows pandas
Add information from adjacent rows pandas

Time:05-16

I have a table

Input table

    id  day info
0   1   1   i1
1   1   1   i2
2   1   1   i3
3   1   1   i4
4   1   1   i5
5   1   1   i6
6   1   2   j1
7   1   2   j2

I would like to attach the information from adjacent rows for each row in a dataframe with a stride=2. After that I need to store this information in a dataframe. If there is no adjacent information I need to place None.

Desired output table

    id  day info
0   1   1   [None,None,i1,i3,i5]
1   1   1   [None,None,i2,i4,i6]
2   1   1   [None,i1,i3,i5,None]
3   1   1   [None,i2,i4,i6,None]
4   1   1   [i1,i3,i5,None,None]
5   1   1   [i2,i4,i6,None,None]
6   1   2   [None,None,j1,None,None]
7   1   2   [None,None,j2,None,None]

I am sure I need to uses df.groupby(['id','day'])['info'] but I don't know how to correctly use shift in this situation

CodePudding user response:

As you want only 5 elements, if should be efficient enough to use shift with the wanted strides:

g = df.groupby(['id', 'day'])['info']

df['info'] = (pd
        .concat([g.shift(i)
                 for i in [4,2,0,-2,-4]],
                  axis=1)
        .agg(list, axis=1)
        )

Or using numpy:

g = df.groupby(['id', 'day'])['info']

df['info'] = np.array([g.shift(i)
                       for i in [4,2,0,-2,-4]]
                      ).T.tolist()

Output:

   id  day                      info
0   1    1    [nan, nan, i1, i3, i5]
1   1    1    [nan, nan, i2, i4, i6]
2   1    1    [nan, i1, i3, i5, nan]
3   1    1    [nan, i2, i4, i6, nan]
4   1    1    [i1, i3, i5, nan, nan]
5   1    1    [i2, i4, i6, nan, nan]
6   1    2  [nan, nan, j1, nan, nan]
7   1    2  [nan, nan, j2, nan, nan]
  • Related