I'm trying to decode the bytes values in my DataFrame with the following code:
df = pd.read_sql_table('mytable', con)
for column in df.columns:
    for i in range(len(df[column])):
        if type(df[column][i]) == bytearray or type(df[column][i]) == bytes:
            df[column][i] = str(df[column][i], 'utf-8')
But I keep getting a SettingWithCopyWarning no matter what I try.
Does anyone know how to deal with this warning?
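The warning comes from the chained assignment df[column][i] = ...: df[column] returns a Series that pandas cannot guarantee is a view of df, so the write may land in a temporary object (and under pandas' Copy-on-Write mode it does not update df at all). A minimal sketch of the same loop that writes through df.at instead, so every assignment targets df directly. The sample DataFrame is a hypothetical stand-in for the read_sql_table result:

```python
import pandas as pd

# Hypothetical sample data standing in for pd.read_sql_table('mytable', con).
df = pd.DataFrame({"name": [b"alice", bytearray(b"bob"), "carol"], "n": [1, 2, 3]})

for column in df.columns:
    for label in df.index:
        value = df.at[label, column]
        if isinstance(value, (bytes, bytearray)):
            # .at reads and writes a single cell of df itself, so no
            # intermediate Series (and no chained assignment) is involved.
            df.at[label, column] = str(value, "utf-8")

print(df)
```

This keeps the cell-by-cell structure of the original; it is still a Python-level loop, so it is slow on large tables.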
UPDATE:
I've ended up settling for this:
for column in df.columns:
    if df[column].dtype == 'object':
        df[column] = df[column].apply(lambda x: x.decode('utf-8') if isinstance(x, bytes) else x)
Thanks for the help!
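The version in the update only checks for bytes, while the original loop also handled bytearray. A sketch of the same per-column approach that decodes both; decode_bytes is a hypothetical helper name, and the sample data is assumed:

```python
import pandas as pd


def decode_bytes(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes/bytearray, return anything else unchanged."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return value


# Hypothetical sample data standing in for the read_sql_table result.
df = pd.DataFrame({"name": [b"alice", bytearray(b"bob"), "carol"], "n": [1, 2, 3]})

for column in df.columns:
    # Only object columns can hold bytes values.
    if df[column].dtype == object:
        # Assigning a whole column with df[column] = ... is not chained
        # indexing, so it does not trigger SettingWithCopyWarning.
        df[column] = df[column].map(decode_bytes)
```

Series.map and Series.apply behave the same here since the function is applied element-wise either way.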
Answer:
A few ways to improve this:
- If you want the whole column as strings, you don't need to loop through each value of the column.
- You can use the built-in pd.Series.astype() method, which is more efficient than calling str() on each value because it is vectorized (i.e. you can call it on the whole Series). Note that astype(str) converts every value in the column to a string, including numbers, not only the bytes values.
- Use .loc to avoid the SettingWithCopyWarning.
So your code will look like:
for column in df.columns:
    df.loc[:, column] = df[column].astype(str)
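If a column holds only bytes, Series.str.decode decodes it explicitly with a chosen encoding, and the result can be written back with .loc[:, column] in the same way. A minimal sketch, assuming hypothetical sample data; the all-bytes check is there because str.decode expects bytes and does not pass other values through unchanged:

```python
import pandas as pd

# Hypothetical sample data: "name" holds only bytes, "n" holds integers.
df = pd.DataFrame({"name": [b"alice", b"bob"], "n": [1, 2]})

for column in df.columns:
    col = df[column]
    # Decode only object columns in which every value is bytes.
    if col.dtype == object and col.map(lambda x: isinstance(x, bytes)).all():
        # .loc[:, column] assigns the whole column on df itself.
        df.loc[:, column] = col.str.decode("utf-8")
```

Columns with a mix of bytes and other values are better served by an element-wise function that leaves non-bytes values alone.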
Note that in Python 3, str holds Unicode text, and bytes.decode() uses UTF-8 by default, so you don't need a separate unicode type. However, if you are using Python 2.x, you can use df[column].astype('unicode') instead.
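A small illustration of the Python 3 behaviour described above, using plain Python only (the sample string is an assumption): str() decodes bytes only when you pass an encoding; without one it returns the bytes literal's repr.

```python
raw = "café".encode("utf-8")  # b'caf\xc3\xa9'

# str() with an explicit encoding decodes the bytes...
decoded = str(raw, "utf-8")
# ...and so does bytes.decode(), which defaults to UTF-8 in Python 3.
same = raw.decode()
# Without an encoding, str() returns the repr of the bytes object instead.
literal = str(raw)

print(decoded, same, literal)
```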