How do I update column based on value in pandas df?

Time:10-14

I have a column of values where some rows have a range of values. I want to keep the second value of the range.

[10%,15%,30%,16.1% - 22%,16.1% - 22%,16.189% - 20.67%,16.189% - 20.67%]

The code I use to loop through the column and act on the condition is:

for i in df.column:
    if ' - ' in str(i):
        split = i.replace(' ', '').split('-')[1]
    
        print(split)

When I print split, I get the value that I want. However, I can't figure out how to update the value in the column to keep the value I want.

When I do this:

for i in df.column:
    if ' - ' in str(i):
        split = i.replace(' ', '').split('-')[1]
        df['column'][i] =split

I get:

C:\Users\lukem\AppData\Local\Temp\ipykernel_15752\833050674.py:4: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['column'][i] =split

Any advice is appreciated.

CodePudding user response:

You can use apply with a lambda to transform the whole column at once:

import pandas as pd

df = pd.DataFrame(['10%', '15%', '30%', '16.1% - 22%', '16.1% - 22%', '16.189% - 20.67%', '16.189% - 20.67%'], columns=['column'])

# Split on ' - ' (including the spaces) so the kept value has no leading space
df['column'] = df['column'].apply(lambda x: x.split(' - ')[1] if ' - ' in x else x)
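If you would rather change only the rows that actually contain a range, a boolean mask with .loc is one option. This is a minimal sketch on a shortened copy of the sample data; `is_range` is just an illustrative name. Because the assignment is a single .loc call on the original frame, it avoids the chained indexing (df['column'][i] = ...) that typically triggers SettingWithCopyWarning.

```python
import pandas as pd

df = pd.DataFrame({'column': ['10%', '15%', '30%', '16.1% - 22%', '16.189% - 20.67%']})

# True for rows that hold a range such as '16.1% - 22%'
is_range = df['column'].str.contains(' - ', regex=False)

# One .loc assignment on the original frame: keep the part after ' - '
df.loc[is_range, 'column'] = df.loc[is_range, 'column'].str.split(' - ').str[-1]

print(df['column'].tolist())
```

Rows without a range are never touched, which can matter if the column also holds missing values.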

CodePudding user response:

For your first snippet, you can put the logic in a function and pass it to the column's apply method instead of looping:

df['column'] = df['column'].apply(function_name)
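As a rough sketch of what such a function could look like (the name `keep_upper_bound` is made up for illustration, and the sample data is shortened), assuming the values are strings like the ones in the question:

```python
import pandas as pd

def keep_upper_bound(value):
    """Return the part after ' - ' for a range, otherwise the value unchanged."""
    text = str(value)
    if ' - ' in text:
        return text.split(' - ')[1]
    return value

df = pd.DataFrame({'column': ['10%', '16.1% - 22%', '16.189% - 20.67%']})

# apply calls the function once per element and returns a new Series
df['column'] = df['column'].apply(keep_upper_bound)

print(df['column'].tolist())
```

The str() call mirrors the check in the question's loop, so non-string entries pass through unchanged instead of raising an error.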

CodePudding user response:

You can use str.extract for this:

import pandas as pd

data = {'column': ['10%','15%','30%','16.1% - 22%','16.1% - 22%',
                   '16.189% - 20.67%','16.189% - 20.67%']}
df = pd.DataFrame(data)

df['column'] = df.column.str.extract(r'(\d+\.?\d+%$)')

print(df)

   column
0     10%
1     15%
2     30%
3     22%
4     22%
5  20.67%
6  20.67%
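One caveat: a pattern written as \d+\.?\d+%$ needs at least two digits before the percent sign, so a value like '5%' would come back as NaN. If single-digit values can occur in your data (an assumption, the sample has none), a slightly more tolerant pattern might look like the sketch below. Passing expand=False makes str.extract return a Series rather than a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'column': ['5%', '10%', '16.1% - 22%', '16.189% - 20.67%']})

# One or more digits, an optional decimal part, then '%' at the end of the string
pattern = r'(\d+(?:\.\d+)?%)$'

df['column'] = df['column'].str.extract(pattern, expand=False)

print(df['column'].tolist())
```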

Or you could use str.split with expand=True. Apply df.ffill on axis=1 and select only the last column:

df['column'] = df.column.str.split(' - ', expand=True).ffill(axis=1).iloc[:,-1]
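To see why the forward fill is needed, it can help to look at the intermediate steps of the split approach separately. A small sketch on shortened sample data; `parts` and `filled` are just illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'column': ['10%', '16.1% - 22%', '16.189% - 20.67%']})

# expand=True gives one column per piece; rows without ' - ' have a
# missing value in the second column
parts = df['column'].str.split(' - ', expand=True)

# Forward-fill along each row so the last column always holds a value:
# the upper bound for ranges, the original value otherwise
filled = parts.ffill(axis=1)

df['column'] = filled.iloc[:, -1]

print(df['column'].tolist())
```

Without the ffill step, the rows that hold a single value would end up as missing after selecting the last column.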