I've created a new column with the str.findall() method because the original value is a sentence, like "Preis ab 23,48 EUR pro Person/Tag", so I extracted the number ['23,48'] as the new value. But some sentences contain two prices, like "from €83 to €120", so I get the new value ['83', '120']. Now I want to replace each row that has two values with their average, but I can't figure out how.
The extraction code is:
df['pricerange_All'] = df.pricerange.str.findall(r'([0-9,.]+(?:\.[0-9,.]+)?)')
and the resulting DataFrame looks like this:
pricerange_All
0 ['16.51']
1 ['17,61']
2 ['16,96']
3 ['13,70']
4 ['17,85']
.
100 ['690.94', '690.76']
As you can see, some of the rows have two prices.
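For reference, a minimal sketch of the extraction step on the two example sentences quoted above. It uses a tighter pattern of my own, `\d+(?:[.,]\d+)?` (an assumption, not the asker's regex), which matches digits optionally followed by a single decimal comma or period and more digits, so stray punctuation such as a sentence-final period is not captured on its own:

```python
import pandas as pd

sentences = pd.Series(["Preis ab 23,48 EUR pro Person/Tag", "from €83 to €120"])

# Assumed pattern: digits, optionally one decimal separator (comma or period) plus more digits.
# The group is non-capturing, so findall returns the whole match.
pattern = r"\d+(?:[.,]\d+)?"

extracted = sentences.str.findall(pattern)
print(extracted.tolist())
```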
I've tried this:
df2 = pd.DataFrame([pd.Series(x) for x in df.pricerange_All])
df2.columns = ['price_{}'.format(x + 1) for x in df2.columns]
but it didn't work.
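For comparison, a minimal sketch of how the column-splitting attempt can be carried through to a per-row mean. It assumes the lists hold price strings that use either a decimal comma or a decimal period; the sample data is a cut-down version of the question's, and `mean_price` is the column name the question asks for:

```python
import pandas as pd

df = pd.DataFrame({"pricerange_All": [["16.51"], ["17,61"], ["690.94", "690.76"]]})

# Spread each list over its own columns; shorter lists are padded with None
df2 = pd.DataFrame(df["pricerange_All"].tolist(), index=df.index)
df2.columns = ["price_{}".format(x + 1) for x in df2.columns]

# Turn decimal commas into periods and parse each column; missing cells become NaN
prices = df2.apply(lambda col: pd.to_numeric(col.str.replace(",", ".", regex=False)))

# mean(axis=1) skips NaN by default, so single-price rows keep their one value
df["mean_price"] = prices.mean(axis=1)
print(df)
```

The resulting `mean_price` column is numeric (float64), so "17,61" comes out as 17.61.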
How can I split the column, or create a new column that stores the mean value of each row? Like this:
mean_price
0 16.51
1 17,61
2 16,96
3 13,70
4 17,85
. .
100 690.85
I've racked my brains over this and really need to solve it as soon as possible. Thanks, I'd really appreciate a reply.
Answer:
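If you really do want every comma and period to stay exactly as written, the new column has to hold strings: a value like "17,61" cannot be stored as a float, and although pandas will accept floats and strings side by side in an object column, mixing them makes the column awkward to work with.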
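To illustrate why the commas have to be swapped before any arithmetic: Python's built-in `float()` only accepts a period as the decimal separator. A small sketch, where `parse_price` is a hypothetical helper name of my own:

```python
def parse_price(text):
    """Hypothetical helper: parse a price string that may use a decimal comma."""
    return float(text.replace(",", "."))

try:
    float("17,61")
except ValueError as exc:
    # float() raises ValueError for a comma-separated decimal
    print("float() rejects decimal commas:", exc)

print(parse_price("17,61"))
print(parse_price("16.51"))
```

This simple replacement assumes there are no thousands separators in the prices; a string like "1.234,56" would need different handling.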
You could use Series.apply to convert the elements of any list longer than one to floats, take the mean, and convert the result back to a string. For example:
import numpy as np
df['pricerange_All_new'] = df['pricerange_All'].apply(lambda row: row[0] if len(row) == 1 else str(np.mean([float(p.replace(',','.')) for p in row])))
Result:
df
pricerange_All pricerange_All_new
0 [16.51] 16.51
1 [17,61] 17,61
2 [16,96] 16,96
3 [13,70] 13,70
4 [17,85] 17,85
5 [690.94, 690.76] 690.85
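A hedged variation on the lambda above: move the logic into a small named function (`average_prices` is a hypothetical name) and format the mean with two decimals, since `str()` of a float can expose representation noise (for example, `str(0.1 + 0.2)` gives '0.30000000000000004'):

```python
import numpy as np
import pandas as pd

def average_prices(prices):
    """Hypothetical helper: keep a single price as-is, average several."""
    if len(prices) == 1:
        return prices[0]  # untouched string, so a decimal comma is preserved
    values = [float(p.replace(",", ".")) for p in prices]
    # Fixed two-decimal formatting hides float representation noise
    return f"{np.mean(values):.2f}"

df = pd.DataFrame({"pricerange_All": [["16.51"], ["17,61"], ["690.94", "690.76"]]})
df["pricerange_All_new"] = df["pricerange_All"].apply(average_prices)
print(df)
```

Every value in the new column is a string, which matches the point that keeping the commas means keeping text.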
Answer:
Assuming you have a dataframe (df) that looks like this:
import pandas as pd
df = pd.DataFrame({'pricerange_All': [['16.51'], ['17,61'], ['16,96'], ['13,70'], ['17,85'], ['690.94', '690.76']]})
print(df)
Output:
pricerange_All
0 [16.51]
1 [17,61]
2 [16,96]
3 [13,70]
4 [17,85]
5 [690.94, 690.76]
You could use a lambda function to apply the mean function from the statistics module to each row's list, after replacing commas with periods so that every string parses as a float:
Code:
from statistics import mean
# Swap decimal commas for periods, parse each price as a float, then average each row's list
df.pricerange_All = df.pricerange_All.apply(lambda x: mean(list(map(float, [item.replace(",", ".") for item in x]))))
print(df)
Output:
pricerange_All
0 16.51
1 17.61
2 16.96
3 13.70
4 17.85
5 690.85
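A vectorized alternative, which is my own suggestion and not part of the answer above: explode the lists so each price gets its own row (exploded rows keep their original index label), convert everything in one pass, and average back per original row with `groupby(level=0)`. The sample data is a subset of the one above, and `mean_price` is an assumed column name:

```python
import pandas as pd

df = pd.DataFrame({"pricerange_All": [["16.51"], ["17,61"], ["16,96"], ["690.94", "690.76"]]})

# One row per price; duplicated index labels mark which original row each came from
prices = df["pricerange_All"].explode()

# Normalise decimal commas, parse to float, then average per original index label
df["mean_price"] = (
    pd.to_numeric(prices.str.replace(",", ".", regex=False))
    .groupby(level=0)
    .mean()
)
print(df)
```

Keeping the original list column untouched and writing to a new column also avoids the overwrite problem when you later want to try another variant.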
Alternative (keep commas):
If you want to keep the commas, you could do something like this instead (run it on the original list column, not on the already-averaged one from the previous step):
df.pricerange_All = df.pricerange_All.apply(lambda x: mean(list(map(float, [item.replace(",", ".") for item in x]))) if len(x) > 1 else x[0])
print(df)
Output:
pricerange_All
0 16.51
1 17,61
2 16,96
3 13,70
4 17,85
5 690.85
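Note that this variant leaves a column of mixed types: untouched strings for single prices and floats for averaged ones. A quick check, assuming a shortened version of the same sample data:

```python
from statistics import mean

import pandas as pd

df = pd.DataFrame({"pricerange_All": [["16.51"], ["17,61"], ["690.94", "690.76"]]})
df.pricerange_All = df.pricerange_All.apply(
    lambda x: mean(list(map(float, [item.replace(",", ".") for item in x]))) if len(x) > 1 else x[0]
)

# Object dtype: each cell keeps its own Python type (str or float)
print(df.pricerange_All.dtype)
print(df.pricerange_All.map(type).tolist())
```

If you later need arithmetic on this column, converting all cells to one type first (either all strings or all floats) is the safer choice.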