Does .loc in Python Pandas make inplace change on the original dataframe?

Time:11-05

I was working on a dataframe like the one below:

df:

Site   Visits   Temp   Type
KFC    511      74     Food
KFC    565      77     Food
KFC    498      72     Food
K&G    300      75     Gas
K&G    255      71     Gas

I wanted to convert the 'Type' column into a 0/1 variable so that I could use df.corr() to check the correlation.

I tried two approaches. The first was to build a dictionary and map it into a new column:

type_map = {'Food': 1, 'Gas': 0}  # named type_map so the built-in dict is not shadowed
df['BinaryType'] = df['Type'].map(type_map)

I was then able to use df.corr() to check correlation between 'Visits' and 'BinaryType'. Since 'Type' column contains strings, df.corr() would not show correlation between 'Visits' and 'Type'.

The second approach was to use .loc:

df.loc[df['Type']=='Food','Type'] = 1
df.loc[df['Type']!=1,'Type'] = 0

Then I checked df in the console. It looked like the output below, so it seemed an in-place change had been made. I also checked the first value using df['Type'][0], and it read 1 (an integer, I suppose):

Site   Visits   Temp   Type
KFC    511      74     1
KFC    565      77     1
KFC    498      72     1
K&G    300      75     0
K&G    255      71     0

Here however, df.corr() would not show correlation between 'Visits' and 'Type'! It was as if this column hadn't been changed.

You can use the code below to reproduce:

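import pandas as pd

df = pd.DataFrame({
    'Site': {0: 'KFC', 1: 'KFC', 2: 'KFC', 3: 'K&G', 4: 'K&G'},
    'Visits': {0: 511, 1: 565, 2: 498, 3: 300, 4: 255},
    'Temp': {0: 74, 1: 77, 2: 72, 3: 75, 4: 71},
    'Type': {0: 'Food', 1: 'Food', 2: 'Food', 3: 'Gas', 4: 'Gas'}})

# 1: map the strings to integers in a new column
type_map = {'Food': 1, 'Gas': 0}
df['BinaryType'] = df['Type'].map(type_map)
# numeric_only=True is needed on pandas 2.0+, where corr() no longer
# drops non-numeric columns by default
df.corr(numeric_only=True)
del df['BinaryType']

# 2: overwrite the values in the existing column with .loc
df.loc[df['Type'] == 'Food', 'Type'] = 1
df.loc[df['Type'] != 1, 'Type'] = 0
df.corr(numeric_only=True)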

Any idea how Pandas .loc works in the background?

CodePudding user response:

As your first method is working, you can just use:

type_map = {'Food': 1, 'Gas': 0}  # named type_map so the built-in dict is not shadowed
df['Type'] = df['Type'].map(type_map)
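
A small sketch of why the map approach gives a numeric column: map builds a new Series, so pandas infers the dtype from the mapped values. The names type_map, types and with_unknown are illustrative, and 'Retail' is a made-up category that is not in the question's data. It is there to show that a value missing from the dictionary becomes NaN, which turns the result into a float column.

```python
import pandas as pd

# Illustrative mapping name; avoids shadowing the built-in dict
type_map = {'Food': 1, 'Gas': 0}

# Same 'Type' values as in the question, stored as Python strings
types = pd.Series(['Food', 'Food', 'Food', 'Gas', 'Gas'], dtype=object)
mapped = types.map(type_map)
print(mapped.dtype)  # an integer dtype, so corr() can use the column

# 'Retail' is a hypothetical category with no entry in type_map
with_unknown = pd.Series(['Food', 'Gas', 'Retail'], dtype=object)
mapped_unknown = with_unknown.map(type_map)
print(mapped_unknown)  # the unmapped value shows up as NaN
```

If unmapped values are possible in real data, it may be worth checking mapped_unknown.isna() before computing correlations.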

CodePudding user response:

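Your second method does change the values in place, but it doesn't change the dtype of the series, even though the values are now all ints. You can see that by checking df.dtypes, which shows that the Type column is still of object dtype. df.corr() only works with numeric columns, so it leaves the column out.

You need to cast the column to int explicitly, using df['Type'] = df['Type'].astype(int)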
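
To make this concrete, here is a minimal sketch using a cut-down version of the question's dataframe. It assumes the Type column starts out with object dtype (set explicitly here, since that is what the answer describes) and uses select_dtypes('number') as a stand-in for "the columns corr() would consider numeric". The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Visits': [511, 565, 498, 300, 255],
    # object dtype set explicitly, matching the situation in the question
    'Type': pd.Series(['Food', 'Food', 'Food', 'Gas', 'Gas'], dtype=object),
})

# The .loc assignments do modify df itself
df.loc[df['Type'] == 'Food', 'Type'] = 1
df.loc[df['Type'] != 1, 'Type'] = 0
values_after_loc = df['Type'].tolist()
dtype_after_loc = df['Type'].dtype
print(values_after_loc, dtype_after_loc)  # values are ints, dtype is still object

# Columns pandas treats as numeric before the cast
numeric_before = df.select_dtypes('number').columns.tolist()

# Explicit cast to an integer dtype
df['Type'] = df['Type'].astype(int)
numeric_after = df.select_dtypes('number').columns.tolist()
print(numeric_before, numeric_after)
```

So .loc is writing into the original dataframe; the column is only missing from the correlation matrix because its dtype never changed.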

OR

use df['Type'] = np.where(df['Type'] == 'Food', 1, 0) (with numpy imported as np)

Running df.corr() after that gives:

In [22]: df.corr()
Out[22]:
          Visits      Temp      Type
Visits  1.000000  0.498462  0.976714
Temp    0.498462  1.000000  0.305888
Type    0.976714  0.305888  1.000000
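
For reference, a self-contained sketch of the np.where route on the question's data. The call to select_dtypes('number') is an addition that is not in the answer; it keeps the string column 'Site' out of the calculation regardless of the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Site': ['KFC', 'KFC', 'KFC', 'K&G', 'K&G'],
    'Visits': [511, 565, 498, 300, 255],
    'Temp': [74, 77, 72, 75, 71],
    'Type': ['Food', 'Food', 'Food', 'Gas', 'Gas'],
})

# np.where returns a fresh integer array, so assigning it replaces the
# whole column and the new column gets an integer dtype
df['Type'] = np.where(df['Type'] == 'Food', 1, 0)

# Restrict to numeric columns before computing correlations
corr = df.select_dtypes('number').corr()
print(corr)
```

Because the whole column is replaced rather than edited element by element, no separate astype call is needed here.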