I got df like this:
cols=['a', 'b']
df = pd.DataFrame([[[3,1,5,7], [42,31]], [[],[44]], [[44,3,5,5,5,10],[]], [[], [44324,3]]],
columns=cols)
As you see theres list in every cell. I want to to followings things on each of element:
- Calculate mean of list and append 5
- If result <= 0, add 1 in place of list
- If list is empty, add 0 in place of list
My working solution:
df
def convert_list(x):
if len(x) != 0:
res = (sum(x)/len(x)) 5
if res <= 0:
res = 1
return res
return 0
for col in cols:
df[col] = df[col].apply(lambda x: convert_list(x))
Desired output:
df
It's working but its very slow solution (in original df I got about 50k columns and 100k rows, and list might contains many elements). Is there any efficient solution for this? I also tried convert it to numpy array and do some vectecorized operations, but the problem is every list might have different length, so I cant convert it (unless I add many elements to other lists...)
CodePudding user response:
You could use applymap
and np.mean
to average every cell and add 5. Then any value below 5 would have been a negative mean, and nans can be filled with zero.
import pandas as pd
import numpy as np
cols=['a', 'b']
df = pd.DataFrame([[[3,1,5,7], [42,31]], [[],[44]], [[44,3,5,5,5,10],[]], [[], [44324,3]]],
columns=cols)
df = (df.applymap(np.mean) 5)
df[df<5]=1
df = df.fillna(0)
Output
a b
0 9.0 41.5
1 0.0 49.0
2 17.0 0.0
3 0.0 22168.5
CodePudding user response:
You could try using explode
then an aggregation, with the idea of avoiding apply
.
Something like that, but it is ugly.
def correct_column(col):
cole = col.explode()
m_empty = cole.isna().groupby(cole.index).first()
groups = cole.groupby(cole.index)
col_sum = groups.sum()
col_count = groups.count()
col = groups.sum() / groups.count() 5
m_negative = col <= 0
col[m_empty] = 0
col[m_negative] = 1
return col
for col in df.columns:
df[col] = correct_column(df[col])
Note: groups.sum() / groups.count()
may be strange way to get the mean but for empty lists (NaN
after explode
) mean
raises an error when sum
returns 0
.