Replace substring in a pandas column

Time:01-03

I have the following example DataFrame:

col1 | col2 | ...  total
------------------------
0,19 | 31  | .... |I need  200,02 euros
0,19 | 40  | .... |I need   10,02 euros
   0 | 20  | .... |I need  150,02 euros
   . | ..  | .... |...

I want to replace the numeric value in the total column, but only in rows where col2 is 31.

The new value should be the sum of (total * col1) + total over all rows where col2 is not 31. So the new value in total should be ((10.02 * 0.19) + 10.02) + ((150.02 * 0) + 150.02) = 161.94. The resulting DataFrame should therefore look like this:

col1 | col2 | ...  total
------------------------
0,19 | 31  | .... |I need  161,94 euros
0,19 | 40  | .... |I need   10,02 euros
   0 | 20  | .... |I need  150,02 euros
   . | ..  | .... |...
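
For anyone who wants to reproduce this, here is a minimal sketch of the example frame. It assumes col1 and total are stored as strings with decimal commas, as the table suggests, and it leaves out the elided columns. The last lines only check the arithmetic for the expected value.

```python
import pandas as pd

# Example data copied from the table above; the elided columns are omitted.
# Assumption: col1 and total are strings that use a decimal comma.
df = pd.DataFrame({
    "col1": ["0,19", "0,19", "0"],
    "col2": [31, 40, 20],
    "total": ["I need 200,02 euros", "I need 10,02 euros", "I need 150,02 euros"],
})

# Expected replacement: sum of (total * col1) + total over rows where col2 != 31.
expected = round((10.02 * 0.19 + 10.02) + (150.02 * 0 + 150.02), 2)
print(expected)  # 161.94
```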

I came across df.iterrows(), but the documentation explicitly tells us to

never modify something you are iterating over

So, how do I achieve this?

CodePudding user response:

You can use the apply function to apply a function to each row or column in a dataframe. You can use this function to create a new column that holds the new value for the total column for each row.

Here's an example of how you can do this:

def update_total(row):
    if row['col2'] == 31:
        return row['total']
    else:
        # Assumes col1 and total are already numeric, not strings
        return ((row['total'] * row['col1']) + row['total'])

df['new_total'] = df.apply(update_total, axis=1)

This will apply the update_total function to each row in the dataframe, and the axis=1 argument specifies that the function should be applied to each row. The function will check if the value in the col2 column is equal to 31, and if it is, it will return the original value in the total column. If the value in the col2 column is not equal to 31, it will return the new value for the total column.

Once the apply function has completed, you will have a new column called new_total in your dataframe that holds the updated values for the total column. If you want it to replace the original total column, drop the original first and then rename the new one; renaming alone would leave two columns named total.

df = df.drop(columns='total').rename(columns={'new_total': 'total'})
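
As a rough, self-contained illustration of the row-wise apply described above, the sketch below uses hypothetical numeric data, that is, it assumes col1 and total have already been converted to floats, which is not the case in the question as posted.

```python
import pandas as pd

# Hypothetical numeric data: assumes col1 and total were already converted to floats.
df = pd.DataFrame({
    "col1": [0.19, 0.19, 0.0],
    "col2": [31, 40, 20],
    "total": [200.02, 10.02, 150.02],
})

def update_total(row):
    # Keep the original total where col2 is 31, otherwise add total * col1.
    if row["col2"] == 31:
        return row["total"]
    return row["total"] * row["col1"] + row["total"]

# axis=1 passes each row to the function as a Series.
df["new_total"] = df.apply(update_total, axis=1)
print(df)
```

Note that this produces one value per row; on its own it does not give the cross-row sum (161,94) the question asks for.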

CodePudding user response:

For the latter part of your question, all you need to do is create a copy of your dataframe, then iterate over the copy and modify the original:

df_copy = df.copy()
for index, row in df_copy.iterrows():
    df.loc[index, 'cond'] = ...  # placeholder: edit the DF here

Second, I feel like you are overcomplicating this problem. Why not keep one column with the numeric total (just a float value, 200.0) and a separate string column that holds the full sentence "I need 200.0 euros"?
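
A minimal sketch of that layout, assuming the amount lives in a float column (the name amount is hypothetical) and the sentence is only generated for display:

```python
import pandas as pd

# Hypothetical layout: the amount is a float in its own column ("amount" is an assumed name).
df = pd.DataFrame({
    "col1": [0.19, 0.19, 0.0],
    "col2": [31, 40, 20],
    "amount": [200.02, 10.02, 150.02],
})

# Work on the numeric column: sum of amount * col1 + amount for rows where col2 != 31.
others = df["col2"] != 31
new_amount = (
    df.loc[others, "amount"] * df.loc[others, "col1"] + df.loc[others, "amount"]
).sum()
df.loc[~others, "amount"] = round(new_amount, 2)

# Build the sentence only for display, with a decimal comma as in the question.
df["total"] = df["amount"].map(lambda x: f"I need {x:.2f} euros".replace(".", ","))
print(df)
```

With the arithmetic done on floats, the string column becomes a pure formatting step and never has to be parsed.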

In any case, you will need to parse your total column in search of the amount.

aux = df.loc[df['col2'] == 31].copy()  # copy so the assignment below does not touch a view of df
aux['aux_total'] = ...  # placeholder: parse the total column and extract the amount
df.loc[
    df['col2'] == 31,
    'total'
] = 'I need %2.1f euros' % ((aux['aux_total']*aux['col1']) + aux['col2'])
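
The parsing step left open above could be sketched as follows. This assumes the amount always appears as digits, a decimal comma, and more digits, as in the sample rows; other formats would need a different pattern.

```python
import pandas as pd

# Sample strings in the format shown in the question.
total = pd.Series(["I need 200,02 euros", "I need 10,02 euros", "I need 150,02 euros"])

# Extract the first "digits,digits" group, then swap the decimal comma for a dot.
amount = (
    total.str.extract(r"(\d+,\d+)", expand=False)
         .str.replace(",", ".", regex=False)
         .astype(float)
)
print(amount.tolist())  # [200.02, 10.02, 150.02]
```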

P.S.: as far as I can see, the other answers miss the fact that the total column is a string, which has to be replaced. You can't operate on that column as if it were a number.

CodePudding user response:

First convert the values to numeric. Then multiply col1 by total, add total, sum the result over the rows selected by the mask (col2 not equal to 31), and assign the formatted sum to the remaining rows:

m = df.col2.ne(31)

total = df['total'].str.extract(r'(\d+,\d+)', expand=False).str.replace(',','.').astype(float)

s1 = df['col1'].str.replace(',','.').astype(float)

s = s1.mul(total).add(total)

df.loc[~m, 'total'] = f"I need {str(s[m].sum().round(2)).replace('.',',')} euros"

print (df)
   col1  col2                total
0  0,19    31  I need 161,94 euros
1  0,19    40   I need 10,02 euros
2     0    20  I need 150,02 euros