How can I create a mean-value column from a column that holds multiple values?

Time:12-24

I created a new column with the str.findall() method because the original value is a sentence such as "Preis ab 23,48 EUR pro Person/Tag", from which I extracted the number as ['23,48']. Some sentences contain two prices, such as "from €83 to €120", which gives ['83', '120']. I now want to replace each row that holds two values with their average, but I can't find a solution.

The extract method is as follows:

df['pricerange_All'] = df.pricerange.str.findall(r'([0-9,.]+(?:\.[0-9,.]+)?)')

The resulting dataframe looks like this:

       pricerange_All 
0         ['16.51']
1         ['17,61']
2         ['16,96']
3         ['13,70']
4         ['17,85']
.
100       ['690.94', '690.76']

As you can see, some rows contain two prices.

I tried this solution:

df2= pd.DataFrame([pd.Series(x) for x in df.pricerange_All])
df2.columns = ['price_{}'.format(x + 1) for x in df2.columns]

but it didn't work

How can I separate the column, or create a new column that stores the mean value of each row? For example:

         mean_price    
0         16.51 
1         17,61 
2         16,96   
3         13,70   
4         17,85  
.           .
100       690.85


I've racked my brains and really need to solve this as soon as possible. Thanks.

I'd really appreciate a reply.

CodePudding user response:

If you do in fact want every comma and decimal point to stay as it is, then the final column must be of a string type, because a single column can't sensibly mix floats and strings.

You could use Series.apply: for any list with more than one element, convert the elements to floats, take the mean, and convert the result back to a string. For example:

import numpy as np

df['pricerange_All_new'] = df['pricerange_All'].apply(lambda row: row[0] if len(row) == 1 else str(np.mean([float(p.replace(',','.')) for p in row])))

Result:

df
     pricerange_All pricerange_All_new
0           [16.51]              16.51
1           [17,61]              17,61
2           [16,96]              16,96
3           [13,70]              13,70
4           [17,85]              17,85
5  [690.94, 690.76]             690.85
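
The same idea can be written as a named function, which is easier to read and test than a long lambda. This is only a minimal sketch: the helper name `mean_price_as_text` is hypothetical, the sample data mirrors the question, formatting to two decimals is an assumption, and every row is assumed to hold a list (an empty list is mapped to None).

```python
import pandas as pd

# Sample data mirroring the question; the real column comes from str.findall().
df = pd.DataFrame({
    "pricerange_All": [["16.51"], ["17,61"], ["16,96"], ["13,70"], ["17,85"],
                       ["690.94", "690.76"]]
})

def mean_price_as_text(prices):
    """Return a single price unchanged, or the mean of several as a string."""
    if not prices:
        return None  # assumption: rows without any price become missing
    if len(prices) == 1:
        return prices[0]  # keep the original text, decimal comma included
    values = [float(p.replace(",", ".")) for p in prices]
    # Two decimals is an assumption; it also hides float rounding noise.
    return f"{sum(values) / len(values):.2f}"

df["pricerange_All_new"] = df["pricerange_All"].apply(mean_price_as_text)
print(df)
```

Because single prices are passed through untouched, the new column stays a string column, as described above.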

CodePudding user response:

Assuming you have a dataframe (df) that looks like this:

import pandas as pd

df = pd.DataFrame({ 'pricerange_All': [['16.51'], ['17,61'], ['16,96'], ['13,70'], ['17,85'], ['690.94', '690.76']]})

print(df)

----------------------------

     pricerange_All
0           [16.51]
1           [17,61]
2           [16,96]
3           [13,70]
4           [17,85]
5  [690.94, 690.76]


You could use a lambda function to apply the mean function from the statistics module to each row, after replacing decimal commas with decimal points and converting the strings to floats:

Code:

from statistics import mean 

df.pricerange_All = df.pricerange_All.apply(lambda x: mean(list(map(float, [item.replace(",", ".") for item in x]))))
print(df)

Output:

   pricerange_All
0           16.51
1           17.61
2           16.96
3           13.70
4           17.85
5          690.85
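
If a numeric result is what you are after, a different technique avoids the per-row lambda: Series.explode (available since pandas 0.25) puts each price on its own row, and grouping by the original index brings them back together. This is a sketch under the assumption that every row holds a list of numeric strings; the column name `mean_price` is taken from the question.

```python
import pandas as pd

# Sample data mirroring the question.
df = pd.DataFrame({
    "pricerange_All": [["16.51"], ["17,61"], ["16,96"], ["13,70"], ["17,85"],
                       ["690.94", "690.76"]]
})

mean_price = (
    df["pricerange_All"]
    .explode()                            # one row per price, original index repeated
    .str.replace(",", ".", regex=False)   # decimal comma -> decimal point
    .astype(float)
    .groupby(level=0)                     # regroup by the original row index
    .mean()
)

df["mean_price"] = mean_price
print(df)
```

This keeps the original list column and adds a float column next to it, so later arithmetic on the prices works without further conversion.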


Alternative (keep commas):

If you want to keep the commas, you could do something like this instead. Note that the column then mixes strings (single prices) and floats (computed means):

df.pricerange_All = df.pricerange_All.apply(lambda x: mean(list(map(float, [item.replace(",", ".") for item in x]))) if len(x) > 1 else x[0])
print(df)

Output:

  pricerange_All
0          16.51
1          17,61
2          16,96
3          13,70
4          17,85
5         690.85
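
The question also asks how to separate the list column into one column per price. A possible sketch follows, assuming every row holds a list of numeric strings; the `price_1`, `price_2` names follow the asker's own attempt, and `prices` is a hypothetical helper dataframe.

```python
import pandas as pd

# Sample data mirroring the question.
df = pd.DataFrame({
    "pricerange_All": [["16.51"], ["17,61"], ["16,96"], ["13,70"], ["17,85"],
                       ["690.94", "690.76"]]
})

# One column per list position; shorter lists are padded with missing values.
prices = pd.DataFrame(df["pricerange_All"].tolist(), index=df.index)
prices.columns = [f"price_{i + 1}" for i in prices.columns]

# Normalise decimal commas, then convert each column to floats.
prices = prices.apply(
    lambda col: pd.to_numeric(col.str.replace(",", ".", regex=False))
)

# Row-wise mean; missing values are skipped by default.
df["mean_price"] = prices.mean(axis=1)
print(prices.join(df["mean_price"]))
```

Rows with a single price simply have a missing `price_2`, so their mean equals `price_1`.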