Best way to select column with OR condition

Time:11-18

I am comparing data where the same quantity appears under different names ('radius', 'r', 'Radius[m]', etc.) in the different DataFrames I am comparing. Now I need to loop over these DataFrames and select that quantity. What would be the most elegant/clean way to select a column based on an OR condition? Ideally I would do

df['r' or 'radius']

However, this obviously would not work. What is the closest thing to this that would work?

CodePudding user response:

DataFrame.rename

Assuming each DataFrame contains only one of those quantities, you can rename all the possible names to r, so that you can always access the column as df['r']:

df = df.rename(columns={'radius': 'r', 'Radius[m]': 'r'})

Example:

import pandas as pd

df = pd.DataFrame({'Radius[m]': [42,13,100], 'foo': [0,1,2]})
df = df.rename(columns={'radius': 'r', 'Radius[m]': 'r'})
df['r']

# 0    42
# 1    13
# 2   100
# Name: r, dtype: int64
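
Since the question is about looping over several DataFrames, the rename approach can be applied to each one in turn. The following is a minimal sketch under that assumption; the ALIASES dict and the frames list are hypothetical names made up for illustration, and it still assumes each DataFrame holds only one of the aliases (otherwise the rename produces duplicate 'r' columns).

```python
import pandas as pd

# Hypothetical alias table: every known spelling maps to the canonical name 'r'.
ALIASES = {'radius': 'r', 'Radius[m]': 'r'}

# Hypothetical input: three frames that each name the radius differently.
frames = [
    pd.DataFrame({'r': [1, 2, 3], 'foo': [20, 40, 60]}),
    pd.DataFrame({'radius': [100, 200, 300], 'foo': [0, 1, 2]}),
    pd.DataFrame({'Radius[m]': [42, 13, 100], 'foo': [0, 1, 2]}),
]

# rename skips mapping keys that are missing from a frame
# (errors='ignore' is the default), so one dict works for every frame.
normalized = [df.rename(columns=ALIASES) for df in frames]

# After normalizing, the same key works everywhere.
first_values = [int(df['r'].iloc[0]) for df in normalized]
```

The original frames are left untouched because rename returns a new DataFrame by default.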

Index.intersection

If there are potentially multiple quantities in a dataframe, use Index.intersection to find the overlapping names:

quantities = ['r', 'radius', 'Radius[m]']
columns = df.columns.intersection(pd.Index(quantities))

Examples:

df1 = pd.DataFrame({'r': [1,2,3], 'foo': [20,40,60]})
columns = df1.columns.intersection(pd.Index(quantities))
df1[columns]

#    r
# 0  1
# 1  2
# 2  3

df2 = pd.DataFrame({'radius': [100,200,300], 'foo': [0,1,2]})
columns = df2.columns.intersection(pd.Index(quantities))
df2[columns]

#    radius
# 0     100
# 1     200
# 2     300
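
If what you want is a single Series, as in the df['r' or 'radius'] idea from the question, one option is a small helper that tries the candidate names in priority order. This is only a sketch: pick_column is a hypothetical helper, not a pandas function, and it uses a plain loop instead of Index.intersection so that the priority is taken from the candidate list rather than from the column order.

```python
import pandas as pd

def pick_column(df, candidates):
    """Return the first column of df whose name appears in candidates.

    Hypothetical helper: candidates are tried in the order given.
    """
    for name in candidates:
        if name in df.columns:
            return df[name]
    raise KeyError(f"none of {list(candidates)} found in columns")

quantities = ['r', 'radius', 'Radius[m]']

df1 = pd.DataFrame({'r': [1, 2, 3], 'foo': [20, 40, 60]})
df2 = pd.DataFrame({'radius': [100, 200, 300], 'foo': [0, 1, 2]})

r1 = pick_column(df1, quantities)  # the 'r' column of df1
r2 = pick_column(df2, quantities)  # the 'radius' column of df2
```

For reference, df['r' or 'radius'] fails on df2 because Python evaluates 'r' or 'radius' to 'r' before pandas ever sees it, so the expression is just df['r'].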