How to correctly update column of filtered dataframe in pandas avoiding chained indexing

Time:06-12

I'm working on a pandas DataFrame. The first step is to filter the original DataFrame (df) into a new DataFrame (df1), keeping the rows where the num_posts column equals a number I specify and the user column is user1. The next step is to change num_posts in those rows to another number, and the last step is to update df from df1.

The original df is:

import pandas as pd

df = pd.DataFrame({'num_posts': [4, 4, 3, 4, 1, 14],
                   'date': ['2020-08-09', '2020-08-25',
                            '2020-09-05', '2020-09-12',
                            '2020-09-29', '2020-10-15'],
                   'user': ['user1', 'user1', 'user2', 'user3', 'user4', 'user4']})


# The new filtered df1
# filter posts that equal 4 and user is user1
df1 = df.loc[(df['num_posts'] == 4) & (df['user'] == 'user1')]
df1

# overwrite the num_posts column with 10
for i in df1.index:
    df1.loc[i, 'num_posts'] = 10

# Updating the original dataframe df with df1
df.update(df1)
df

When I run my code, I get the following warning:

C:\Program Files\Python38\lib\site-packages\pandas\core\indexing.py:1817: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)

The link in the warning message leads to the official pandas documentation, and the issue seems to be chained indexing. How can I get rid of the warning, and how can I avoid it when I filter the same original DataFrame df several times in a row?

CodePudding user response:

If it helps, try adding .copy() when you create df1, so that df1 is an independent DataFrame rather than a slice of df:

#df1 = df.loc[(df['num_posts'] == 4)].copy()
df1 = df.loc[(df['num_posts'] == 4) & (df['user'] == 'user1')].copy()
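Put together with the rest of the code from the question, this could look roughly like the sketch below. It assumes the same df as in the question. The mask variable and the single column assignment in place of the for loop are my own additions, not something the question requires.

```python
import pandas as pd

df = pd.DataFrame({'num_posts': [4, 4, 3, 4, 1, 14],
                   'date': ['2020-08-09', '2020-08-25',
                            '2020-09-05', '2020-09-12',
                            '2020-09-29', '2020-10-15'],
                   'user': ['user1', 'user1', 'user2', 'user3', 'user4', 'user4']})

# Boolean mask for the rows to change (the name "mask" is only illustrative)
mask = (df['num_posts'] == 4) & (df['user'] == 'user1')

# .copy() makes df1 an independent DataFrame, so writing to it is unambiguous
df1 = df.loc[mask].copy()

# One column assignment replaces the row-by-row loop
df1['num_posts'] = 10

# Write the changed rows back into df, aligned on the index
df.update(df1)
print(df)
```

Because df1 is an explicit copy, pandas no longer has to guess whether the assignment was meant to reach df, which is what the warning is about.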


Output

   num_posts        date   user
0       10.0  2020-08-09  user1
1       10.0  2020-08-25  user1
2        3.0  2020-09-05  user2
3        4.0  2020-09-12  user3
4        1.0  2020-09-29  user4
5       14.0  2020-10-15  user4
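If df1 is not needed for anything else, another option is to skip the intermediate DataFrame and assign through .loc on df itself, with the row mask and the column name in one call. This is only a sketch under that assumption; the mask variable is again just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'num_posts': [4, 4, 3, 4, 1, 14],
                   'date': ['2020-08-09', '2020-08-25',
                            '2020-09-05', '2020-09-12',
                            '2020-09-29', '2020-10-15'],
                   'user': ['user1', 'user1', 'user2', 'user3', 'user4', 'user4']})

# Rows where num_posts is 4 and the user is user1
mask = (df['num_posts'] == 4) & (df['user'] == 'user1')

# Rows and column are selected in a single .loc call on df,
# so there is no chained indexing and no df.update() step
df.loc[mask, 'num_posts'] = 10
print(df)
```

Since nothing goes through df.update() here, num_posts keeps its integer dtype. For successive filters on the same df, the same pattern can be repeated with a new mask each time.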