I have a dataframe like this:
id cola colb
0 0 0 [1]
1 1 1 [2,3,4]
2 1 2 [2,3,4]
3 2 3 [3]
4 2 4 [3]
5 2 5 [4]
6 3 6 [5]
...
and want it to convert to:
id cola colb colc
0 0 0 [1]
1 1 1 [2,3,4] 0
2 1 2 [2,3,4] 0
3 2 3 [3] 1,2
4 2 4 [3] 1,2
5 2 5 [4] 1,2
6 3 6 [5] 1,2,3,4
...
The rule is:
Take first row for example, colb=[1], corresponding cola=0, then cola=0 is appended to cell list in colc for id=[1].
Take second row for example, colb=[2,3,4], corresponding cola=1, then cola=1 is appended to cell list in colc for id=[2,3,4].
Take third row for example, colb=[2,3,4], corresponding cola=2, then cola=2 is appended to cell list in colc for id=[2,3,4]. ...
Here is what I have tried so far:
df['colc'] = [[] for _ in range(len(df))]
for i in range(len(df['colb'])):
lista = df.colb[i]
for j in lista:
df.colc.loc[df.id==j].append(pd.Series(df.cola[i]))
But it seems that something is wrong at df.colc.loc[df.id==j]
, no value has appended to the list. Is there anyway to do this? Could someone be so kind to help me? Thanks!
CodePudding user response:
df = pd.DataFrame({
'id': [0,1,1,2,2,2,3],
'cola': [0,1,2,3,4,5,6],
'colb': [[1], [2,3,4], [2,3,4], [3], [3], [4], [5]],
})
colb
is a column of list of id
, cola
is what we use to form colc
, so we can pick out those columns, rename them, and explode the renamed column of id
such that all lists are exploded into rows. Exploded rows will share the same colc
value before.
step1 = df[['colb', 'cola']].rename(columns={'colb': 'id', 'cola': 'colc'}).explode('id')
Then we can groupby
to form lists for different id
step2 = step1.groupby('id')['colc'].apply(list)
And finally merge those lists back
df.merge(step2, on='id', how='left')