Home > Mobile >  Pandas dataframe assign value to lists based on other columns
Pandas dataframe assign value to lists based on other columns

Time:02-23

I have a dataframe like this:

   id  cola colb
0   0   0    [1]
1   1   1    [2,3,4]
2   1   2    [2,3,4]
3   2   3    [3]
4   2   4    [3]
5   2   5    [4]
6   3   6    [5]
...

and want it to convert to:

   id  cola colb      colc
0   0   0    [1]       
1   1   1    [2,3,4]   0
2   1   2    [2,3,4]   0
3   2   3    [3]       1,2
4   2   4    [3]       1,2
5   2   5    [4]       1,2
6   3   6    [5]       1,2,3,4
...

The rule is:

Take first row for example, colb=[1], corresponding cola=0, then cola=0 is appended to cell list in colc for id=[1].

Take second row for example, colb=[2,3,4], corresponding cola=1, then cola=1 is appended to cell list in colc for id=[2,3,4].

Take third row for example, colb=[2,3,4], corresponding cola=2, then cola=2 is appended to cell list in colc for id=[2,3,4]. ...

Here is what I have tried so far:

df['colc'] = [[] for _ in range(len(df))]
for i in range(len(df['colb'])):
    lista = df.colb[i]
    for j in lista:
        df.colc.loc[df.id==j].append(pd.Series(df.cola[i])) 

But it seems that something is wrong at df.colc.loc[df.id==j], no value has appended to the list. Is there anyway to do this? Could someone be so kind to help me? Thanks!

CodePudding user response:

df = pd.DataFrame({
    'id': [0,1,1,2,2,2,3],
    'cola': [0,1,2,3,4,5,6],
    'colb': [[1], [2,3,4], [2,3,4], [3], [3], [4], [5]],
})

colb is a column of list of id, cola is what we use to form colc, so we can pick out those columns, rename them, and explode the renamed column of id such that all lists are exploded into rows. Exploded rows will share the same colc value before.

step1 = df[['colb', 'cola']].rename(columns={'colb': 'id', 'cola': 'colc'}).explode('id')

Then we can groupby to form lists for different id

step2 = step1.groupby('id')['colc'].apply(list)

And finally merge those lists back

df.merge(step2, on='id', how='left')
  • Related