I would like use a for loop over a list of pandas dataframes, to make similar changes to each one. But the changes don't seem to take effect globally.
For a simpler example that works as expected (but with python lists instead of dataframes): the following works as I would expect it to:
Example 1:
u=[68, 82, 75]
v=[92, 54, 71, 56]
for x in [u,v]:
x[0]=100
print(u)
print(v)
The 0th entry in both $u$ and $v$ have been updated to 100, as I expected.
But when I try to do something similar with pandas dataframes, the updates don't seem to stick:
Example 2:
import pandas as pd
data_current = [['tom', 72], ['nick', 77], ['julie', 68]]
data_desired = [['mary', 65], ['john', 73], ['Alex', 74]]
# Create the pandas DataFrames
df_current = pd.DataFrame(data_current, columns=['Name', 'Height'])
df_desired = pd.DataFrame(data_desired, columns=['Name', 'Height'])
#go through both dataframes and keep only those with height > 70
for df in [df_current, df_desired]:
df=df[df['Height']>70]
print("Current Roster:")
print(df_current)
print("Desired Roster:")
print(df_desired)
I would have expected the final two printouts to only include rows where the height was >70, but no rows have been excluded. I.e., the dataframe adjustments in the for loop haven't taken effect globally.
I think I can cobble together a way to do it based on other SO answers, but I would like to understand why Example 1 works as I expect, but Example 2 does not.
CodePudding user response:
The thing is that the list does not contain the original references to DataFrame objects but their copies.
This can be fixed through initially creating DataFrame objects straight as the list values.
import pandas as pd
data_current = [['tom', 72], ['nick', 77], ['julie', 68]]
data_desired = [['mary', 65], ['john', 73], ['Alex', 74]]
# Create the pandas DataFrames
l = [pd.DataFrame(data_current, columns=['Name', 'Height']), pd.DataFrame(data_desired, columns=['Name', 'Height'])]
#go through both dataframes and keep only those with height > 70
for i in range(len(l)):
l[i] = l[i][l[i]['Height']>70].copy()
print("Current Roster:")
print(l[0])
print("Desired Roster:")
print(l[1])
Current Roster:
Name Height
0 tom 72
1 nick 77
Desired Roster:
Name Height
1 john 73
2 Alex 74
CodePudding user response:
use local
func
import pandas as pd
data_current = [['tom', 72], ['nick', 77], ['julie', 68]]
data_desired = [['mary', 65], ['john', 73], ['Alex', 74]]
df_current = pd.DataFrame(data_current, columns=['Name', 'Height'])
df_desired = pd.DataFrame(data_desired, columns=['Name', 'Height'])
dfs = ['current', 'desired']
for i in dfs:
locals()['df_{}'.format(i)] = locals()['df_{}'.format(i)].query('Height > 70')
print("Current Roster:")
print(df_current)
print("Desired Roster:")
print(df_desired)
but i recommend following:
import pandas as pd
df_current = pd.DataFrame(data_current, columns=['Name', 'Height']).query('Height > 70')
df_desired = pd.DataFrame(data_desired, columns=['Name', 'Height']).query('Height > 70')
print("Current Roster:")
print(df_current)
print("Desired Roster:")
print(df_desired)