Home > Net >  Pandas: Comma Separated Excel Cells not Converting to List
Pandas: Comma Separated Excel Cells not Converting to List

Time:05-16

I've joined 3 Excel tab data sets to give me my base dataframe, and then for each line I want to count the int values in the comma separated values in DUAlloc and divide Amount by the DUAlloc Count then loop through the DuAlloc list and assign individual lines e.g.

Base Data:

Description DuAlloc Amount
Blah 1,2,3,4,5 1000
Yada 30,15,3,4,5 200

Processed Data:

Description DuAlloc Amount
Blah 1 200
Blah 2 200
Blah 3 200
Yada 3 40
Blah 4 200
Yada 4 40
Blah 5 200
Yada 5 40
Yada 15 40
Yada 30 40

I've tried numerous ways to convert to a list: list(), tolist(), but either get the same number for all the counts, or the nearest I've come is [len(str(c)) for c in df3['DUAlloc']] which counts all the characters which I don't want.

How would I go about achieving this, and is Pandas the best route to take?

CodePudding user response:

Use Series.str.split, df.explode, Groupby.transform and df.div:

In [501]: out = df.assign(DuAlloc=df['DuAlloc'].str.split(',')).explode('DuAlloc')

In [506]: out['Amount'] = out['Amount'].div(out.groupby('Description')['Amount'].transform('size'))

In [507]: out
Out[507]: 
  Description DuAlloc  Amount
0        Blah       1   200.0
0        Blah       2   200.0
0        Blah       3   200.0
0        Blah       4   200.0
0        Blah       5   200.0
1        Yada      30    40.0
1        Yada      15    40.0
1        Yada       3    40.0
1        Yada       4    40.0
1        Yada       5    40.0

CodePudding user response:

You can use .str.count to count the number of , in columns.

out = (df.assign(Amount=df['Amount'].div(df['DuAlloc'].str.count(',').add(1)),
                 DuAlloc=df['DuAlloc'].str.split(','))
       .explode('DuAlloc'))
print(out)

  Description DuAlloc  Amount
0        Blah       1   200.0
0        Blah       2   200.0
0        Blah       3   200.0
0        Blah       4   200.0
0        Blah       5   200.0
1        Yada      30    40.0
1        Yada      15    40.0
1        Yada       3    40.0
1        Yada       4    40.0
1        Yada       5    40.0
  • Related