Dealing with "view vs copy" in pandas

Time:06-29

I'm trying to decode the bytes values in my dataframe with the following code:

 import pandas as pd

 df = pd.read_sql_table('mytable', con)

 for column in df.columns:
     for i in range(len(df[column])):
         if isinstance(df[column][i], (bytes, bytearray)):
             df[column][i] = str(df[column][i], 'utf-8')

but I keep getting a SettingWithCopyWarning no matter what I try.

Does anyone know how to deal with this warning?

UPDATE:

I ended up settling for this:

 for column in df.columns:
     if df[column].dtype == 'object':
         df[column] = df[column].apply(lambda x: x.decode('utf-8') if isinstance(x, (bytes, bytearray)) else x)

Thanks for the help!

Answer:

A few ways to improve this:

  1. It looks like you are converting the whole column to string, so you don't need to loop through each value of the column.
  2. You can use the built-in pd.Series.astype() method, which is more efficient than calling str() on each value because it operates on the whole Series at once.
  3. Use .loc to avoid the SettingWithCopyWarning.

So your code will look like:

 for column in df.columns:
     df.loc[:, column] = df[column].astype(str)

Note that in Python 3, str is a Unicode string type. However, if you are using Python 2.x, you can use df[column].astype('unicode') instead.
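
As a minimal sketch of points 1 and 3, the example below decodes only the bytes values in each object column and assigns the result back in a single indexing step. The sample DataFrame and the decode_value helper are hypothetical stand-ins (the original data comes from pd.read_sql_table), and it uses Series.map with an explicit decode rather than astype(str), so that non-bytes values are left untouched:

```python
import pandas as pd

# Hypothetical sample data standing in for the result of pd.read_sql_table.
df = pd.DataFrame({
    "name": [b"alice", b"caf\xc3\xa9", "carol"],
    "age": [30, 41, 27],
})


def decode_value(value):
    """Decode bytes/bytearray as UTF-8; return any other value unchanged."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode("utf-8")
    return value


for column in df.columns:
    # Only object-dtype columns can hold bytes values, so skip the rest.
    if df[column].dtype == object:
        # Whole-column assignment is one indexing step on df itself,
        # unlike the chained form df[column][i] = ...
        df[column] = df[column].map(decode_value)

# If a single cell does need to change, .at (or .loc) takes the row and
# column labels together, again in one indexing step.
df.at[2, "name"] = "Carol"
```

The chained form df[column][i] = ... first selects a column and then assigns into that intermediate result, which is what pandas warns about. Indexing df once, either for the whole column or with row and column labels together, avoids that.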
