I have a table
Input table
id day info
0 1 1 i1
1 1 1 i2
2 1 1 i3
3 1 1 i4
4 1 1 i5
5 1 1 i6
6 1 2 j1
7 1 2 j2
I would like to attach the information from adjacent rows for each row in a dataframe with a stride=2. After that I need to store this information in a dataframe. If there is no adjacent information I need to place None.
Desired output table
id day info
0 1 1 [None,None,i1,i3,i5]
1 1 1 [None,None,i2,i4,i6]
2 1 1 [None,i1,i3,i5,None]
3 1 1 [None,i2,i4,i6,None]
4 1 1 [i1,i3,i5,None,None]
5 1 1 [i2,i4,i6,None,None]
6 1 2 [None,None,j1,None,None]
7 1 2 [None,None,j2,None,None]
I am sure I need to uses df.groupby(['id','day'])['info']
but I don't know how to correctly use shift in this situation
CodePudding user response:
As you want only 5 elements, if should be efficient enough to use shift
with the wanted strides:
g = df.groupby(['id', 'day'])['info']
df['info'] = (pd
.concat([g.shift(i)
for i in [4,2,0,-2,-4]],
axis=1)
.agg(list, axis=1)
)
Or using numpy:
g = df.groupby(['id', 'day'])['info']
df['info'] = np.array([g.shift(i)
for i in [4,2,0,-2,-4]]
).T.tolist()
Output:
id day info
0 1 1 [nan, nan, i1, i3, i5]
1 1 1 [nan, nan, i2, i4, i6]
2 1 1 [nan, i1, i3, i5, nan]
3 1 1 [nan, i2, i4, i6, nan]
4 1 1 [i1, i3, i5, nan, nan]
5 1 1 [i2, i4, i6, nan, nan]
6 1 2 [nan, nan, j1, nan, nan]
7 1 2 [nan, nan, j2, nan, nan]