I have the following example DataFrame:
col1 | col2 | ... total
------------------------
0,19 | 31 | .... |I need 200,02 euros
0,19 | 40 | .... |I need 10,02 euros
0 | 20 | .... |I need 150,02 euros
. | .. | .... |...
I want to replace the numeric value in the total column, but only in rows where col2 is 31.
The new value should be the sum of (total * col1) + total over all rows where col2 is not 31. So the new value in total should be ((10.02 * 0.19) + 10.02) + ((150.02 * 0) + 150.02) = 161.94. Therefore, the resulting DataFrame should be as follows:
col1 | col2 | ... total
------------------------
0,19 | 31 | .... |I need 161,94 euros
0,19 | 40 | .... |I need 10,02 euros
0 | 20 | .... |I need 150,02 euros
. | .. | .... |...
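The expected value can be checked with a few lines of plain Python, using only the two example rows where col2 is not 31 (decimal commas written as dots):

```python
# Rows where col2 != 31, taken from the example table (commas converted to dots)
other_rows = [(0.19, 10.02), (0.0, 150.02)]  # (col1, amount in total)

# Each row contributes (total * col1) + total
contributions = [(amount * col1) + amount for col1, amount in other_rows]
new_value = round(sum(contributions), 2)
print(new_value)  # 161.94
```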
I came across df.iterrows(), but the pandas documentation explicitly says you should never modify something you are iterating over.
So, how do I achieve this?
Answer:
You can use the apply function to apply a function to each row or column of a DataFrame. Here, it can create a new column that holds the new value of the total column for each row.
Here's an example of how you can do this:
def update_total(row):
    # assumes col1 and total are already numeric, not strings
    if row['col2'] == 31:
        return row['total']
    else:
        return (row['total'] * row['col1']) + row['total']

df['new_total'] = df.apply(update_total, axis=1)
This applies update_total to each row of the DataFrame; the axis=1 argument tells apply to pass rows, not columns, to the function. If the value in col2 equals 31, the function returns the original total; otherwise it returns (total * col1) + total for that row. Note that this yields one value per row rather than the single sum the question asks for.
Once apply has finished, the new_total column holds the updated values. To replace the original total column, drop it first and then rename new_total, because renaming alone would leave two columns named total:
df = df.drop(columns='total').rename(columns={'new_total': 'total'})
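The apply approach produces a value per row, while the question wants a single sum written into the row where col2 is 31. A minimal sketch that combines both, assuming col1 and total have already been converted to floats (in the question they are strings with decimal commas); row_contribution is a hypothetical helper name:

```python
import pandas as pd

# Example data, assuming col1 and total are already numeric
df = pd.DataFrame({
    "col1": [0.19, 0.19, 0.0],
    "col2": [31, 40, 20],
    "total": [200.02, 10.02, 150.02],
})

def row_contribution(row):
    # (total * col1) + total for a single row
    return (row["total"] * row["col1"]) + row["total"]

mask = df["col2"] == 31
# Per-row values via apply, summed over the rows where col2 != 31
new_value = df.loc[~mask].apply(row_contribution, axis=1).sum()
df.loc[mask, "total"] = round(new_value, 2)
print(df)
```

Only the rows selected by mask are overwritten, so the other totals stay as they were.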
Answer:
For the latter part of your question, all you need to do is create a copy of your DataFrame, iterate over one and make the modifications in the other:
df_copy = df.copy()
for index, row in df_copy.iterrows():
    # read from the copy, write to the original
    df.loc[index, 'total'] = row['total']  # replace row['total'] with the edited value
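A fuller sketch of the copy-and-iterate pattern applied to the question, again assuming col1 and total are already floats. Every row with col2 == 31 receives the same summed value:

```python
import pandas as pd

# Example data, assuming col1 and total are already floats
df = pd.DataFrame({
    "col1": [0.19, 0.19, 0.0],
    "col2": [31, 40, 20],
    "total": [200.02, 10.02, 150.02],
})

df_copy = df.copy()
new_value = 0.0
target_indices = []  # rows whose total gets replaced (col2 == 31)
for index, row in df_copy.iterrows():
    if row["col2"] == 31:
        target_indices.append(index)
    else:
        # (total * col1) + total, accumulated over the other rows
        new_value += (row["total"] * row["col1"]) + row["total"]

# Write into the original DataFrame, never into the one being iterated
df.loc[target_indices, "total"] = round(new_value, 2)
print(df)
```

iterrows is slow on large frames, so this is mainly useful when the per-row logic is hard to vectorize.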
Second, I feel like you are overcomplicating this problem. Why not have one column with the numeric total (just a float value, such as 200.0) and another string column holding the full sentence, I need 200.0 euros?
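A minimal sketch of that suggestion, where amount is a hypothetical numeric column name and the sentence column is rebuilt from it with a decimal comma, as in the question:

```python
import pandas as pd

# Hypothetical layout: a numeric "amount" column plus a derived sentence column
df = pd.DataFrame({
    "col1": [0.19, 0.19, 0.0],
    "col2": [31, 40, 20],
    "amount": [200.02, 10.02, 150.02],
})

mask = df["col2"] == 31
# All arithmetic happens on the float column
df.loc[mask, "amount"] = ((df["amount"] * df["col1"]) + df["amount"])[~mask].sum().round(2)
# The sentence is derived from the number, never parsed back out of it
df["total"] = df["amount"].map(lambda x: f"I need {x:.2f} euros".replace(".", ","))
print(df)
```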
In any case, you will need to parse your total column to extract the amount.
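# rows that contribute to the sum (col2 != 31); .copy() avoids SettingWithCopyWarning
aux = df.loc[df['col2'] != 31].copy()
# parse the total column and extract the amount, e.g. "I need 200,02 euros" -> 200.02
aux['aux_total'] = aux['total'].str.extract(r'(\d+,\d+)', expand=False).str.replace(',', '.').astype(float)
aux['aux_col1'] = aux['col1'].astype(str).str.replace(',', '.').astype(float)
new_value = ((aux['aux_total'] * aux['aux_col1']) + aux['aux_total']).sum()
df.loc[
    df['col2'] == 31,
    'total'
] = ('I need %.2f euros' % new_value).replace('.', ',')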
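The parsing step can also be written as a small standalone function. parse_amount is a hypothetical name, and the pattern assumes the amount is the first number in the sentence, written without thousands separators and with an optional decimal comma:

```python
import re
from typing import Optional

# Matches "200,02" or "150": digits, optionally followed by a decimal comma part
AMOUNT_RE = re.compile(r"(\d+(?:,\d+)?)")

def parse_amount(sentence: str) -> Optional[float]:
    # Return the first amount in the sentence as a float, or None if absent
    match = AMOUNT_RE.search(sentence)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))

print(parse_amount("I need 200,02 euros"))  # 200.02
print(parse_amount("I need 150 euros"))     # 150.0
```

With it, the extraction line becomes aux['aux_total'] = aux['total'].map(parse_amount), which also accepts whole-euro amounts that the (\d+,\d+) pattern would miss.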
P.S.: as far as I can see, all the other answers miss the fact that the total column is a string, which has to be parsed before its amount can be replaced. You can't operate on that column as if it were numeric.
Answer:
First multiply col1 by total and add total, then filter the rows with the mask, sum them, and assign the result to the rows excluded by the mask (where col2 is 31). But first, convert the values to numeric:
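import pandas as pd

df = pd.DataFrame({'col1': ['0,19', '0,19', '0'],
                   'col2': [31, 40, 20],
                   'total': ['I need 200,02 euros', 'I need 10,02 euros', 'I need 150,02 euros']})

m = df.col2.ne(31)
total = df['total'].str.extract(r'(\d+,\d+)', expand=False).str.replace(',', '.').astype(float)
s1 = df['col1'].str.replace(',', '.').astype(float)
s = s1.mul(total).add(total)
df.loc[~m, 'total'] = f"I need {str(s[m].sum().round(2)).replace('.', ',')} euros"
print(df)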
col1 col2 total
0 0,19 31 I need 161,94 euros
1 0,19 40 I need 10,02 euros
2 0 20 I need 150,02 euros
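One detail in the assignment line: str(...round(2)) drops trailing zeros, so a sum of 161.9 would be written as "161,9". A small sketch of a fixed two-decimal alternative, where format_euros is a hypothetical helper:

```python
def format_euros(value: float) -> str:
    # Fixed two decimals, then a decimal comma as in the question's data
    return f"I need {value:.2f} euros".replace(".", ",")

print(f"I need {str(round(161.9, 2)).replace('.', ',')} euros")  # I need 161,9 euros
print(format_euros(161.9))                                         # I need 161,90 euros
print(format_euros(161.9438))                                      # I need 161,94 euros
```

In the answer above, the last line could then read df.loc[~m, 'total'] = format_euros(s[m].sum()).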