Pandas: Comma Separated Excel Cells not Converting to List-CodePudding

I've joined 3 Excel tab data sets to give me my base dataframe, and then for each line I want to count the int values in the comma separated values in DUAlloc and divide Amount by the DUAlloc Count then loop through the DuAlloc list and assign individual lines e.g.

Base Data:

Description	DuAlloc	Amount
Blah	1,2,3,4,5	1000
Yada	30,15,3,4,5	200

Processed Data:

Description	DuAlloc	Amount
Blah	1	200
Blah	2	200
Blah	3	200
Yada	3	40
Blah	4	200
Yada	4	40
Blah	5	200
Yada	5	40
Yada	15	40
Yada	30	40

I've tried numerous ways to convert to a list: list(), tolist(), but either get the same number for all the counts, or the nearest I've come is [len(str(c)) for c in df3['DUAlloc']] which counts all the characters which I don't want.

How would I go about achieving this, and is Pandas the best route to take?

CodePudding user response：

Use Series.str.split, df.explode, Groupby.transform and df.div:

In [501]: out = df.assign(DuAlloc=df['DuAlloc'].str.split(',')).explode('DuAlloc')

In [506]: out['Amount'] = out['Amount'].div(out.groupby('Description')['Amount'].transform('size'))

In [507]: out
Out[507]: 
  Description DuAlloc  Amount
0        Blah       1   200.0
0        Blah       2   200.0
0        Blah       3   200.0
0        Blah       4   200.0
0        Blah       5   200.0
1        Yada      30    40.0
1        Yada      15    40.0
1        Yada       3    40.0
1        Yada       4    40.0
1        Yada       5    40.0

CodePudding user response：

You can use .str.count to count the number of , in columns.

out = (df.assign(Amount=df['Amount'].div(df['DuAlloc'].str.count(',').add(1)),
                 DuAlloc=df['DuAlloc'].str.split(','))
       .explode('DuAlloc'))

print(out)

  Description DuAlloc  Amount
0        Blah       1   200.0
0        Blah       2   200.0
0        Blah       3   200.0
0        Blah       4   200.0
0        Blah       5   200.0
1        Yada      30    40.0
1        Yada      15    40.0
1        Yada       3    40.0
1        Yada       4    40.0
1        Yada       5    40.0