Take average of range entities and replace it in pandas column

Time:11-29

I have a dataframe where one column looks like this:

Average Weight (Kg) 
0.647             
0.88
0              
0.73              
1.7 - 2.1         
1.2 - 1.5         
2.5 
NaN         
1.5 - 1.9         
1.3 - 1.5         
0.4               
1.7 - 2.9 

Reproducible data

import numpy as np
import pandas as pd

df = pd.DataFrame([0.647, 0.88, 0, 0.73, '1.7 - 2.1', '1.2 - 1.5', 2.5, np.nan, '1.5 - 1.9', '1.3 - 1.5', 0.4, '1.7 - 2.9'], columns=['Average Weight (Kg)'])

I would like to take the average of each range entry and replace the entry with it in the dataframe, e.g. 1.7 - 2.1 would be replaced by 1.9. The following code doesn't work; it raises TypeError: 'float' object is not iterable:

np.where(df['Average Weight (Kg)'].str.contains('-'), df['Average Weight (Kg)'].str.split('-')
.apply(lambda x: statistics.mean((list(map(float, x)) ))), df['Average Weight (Kg)'])

CodePudding user response:

Another possible solution, which is based on the following ideas:

  1. Convert column to string.

  2. Split each cell by \s-\s.

  3. Explode column.

  4. Convert back to float.

  5. Group by and mean.

df['Average Weight (Kg)'] = (
    df['Average Weight (Kg)'].astype(str).str.split(r'\s-\s')
    .explode().astype(float).groupby(level=0).mean()
)

Output:

    Average Weight (Kg)
0                 0.647
1                 0.880
2                 0.000
3                 0.730
4                 1.900
5                 1.350
6                 2.500
7                   NaN
8                 1.700
9                 1.400
10                0.400
11                2.300
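
If the one-liner is hard to follow, here is a rough sketch that unrolls the five steps into separate variables. The intermediate names (as_text, parts, long_form, numbers, result) are only illustrative and are not part of the answer above; the data is the reproducible frame from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [0.647, 0.88, 0, 0.73, '1.7 - 2.1', '1.2 - 1.5', 2.5, np.nan,
     '1.5 - 1.9', '1.3 - 1.5', 0.4, '1.7 - 2.9'],
    columns=['Average Weight (Kg)'])

col = df['Average Weight (Kg)']

# 1. Convert to string: every cell becomes text, and NaN becomes 'nan'.
as_text = col.astype(str)

# 2. Split on " - ": a range gives two pieces, a plain number gives one.
parts = as_text.str.split(r'\s-\s')

# 3. Explode: one row per piece, with the original index label repeated.
long_form = parts.explode()

# 4. Convert back to float: the text 'nan' parses back to NaN.
numbers = long_form.astype(float)

# 5. Group by the original index label and average the pieces.
result = numbers.groupby(level=0).mean()

df['Average Weight (Kg)'] = result
```

Because explode keeps the original index, grouping on level=0 puts the two halves of each range back together, and single values are simply averaged with themselves.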

CodePudding user response:

edit: slight change to avoid creating a new column

You could go for something like this (renamed your column name to avg, cause it was long to type :-) ):

new_average = (df.avg.str.split('-').str[1].astype(float) + df.avg.str.split('-').str[0].astype(float)) / 2
df["avg"] = new_average.fillna(df.avg)

yields for avg:

0     0.647
1     0.880
2     0.000
3     0.730
4     1.900
5     1.350
6     2.500
7       NaN
8     1.700
9     1.400
10    0.400
11    2.300
Name: avg, dtype: float64
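
For reference, a self-contained sketch of the same idea, splitting only once. It assumes the column has been renamed to avg as above and that every range contains exactly two numbers; the names parts, low, high and range_mean are just for illustration. The approach relies on the .str accessor returning NaN for cells that are not strings, so only the range rows get a computed mean and the rest fall back to their original value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'avg': [0.647, 0.88, 0, 0.73, '1.7 - 2.1', '1.2 - 1.5',
                           2.5, np.nan, '1.5 - 1.9', '1.3 - 1.5', 0.4,
                           '1.7 - 2.9']})

# Non-string cells come back as NaN from .str methods,
# so only the range rows produce a list here.
parts = df['avg'].str.split('-')
low = parts.str[0].astype(float)
high = parts.str[1].astype(float)

# NaN wherever the cell was not a range.
range_mean = (low + high) / 2

# Fall back to the original numeric value where no range was found.
# to_numeric with errors='coerce' turns the range strings into NaN,
# which is harmless because those rows are already filled.
df['avg'] = range_mean.fillna(pd.to_numeric(df['avg'], errors='coerce'))
```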