I am comparing data where the same quantity goes by different names ('radius', 'r', 'Radius[m]', etc.) in the different DataFrames I am comparing. Now I need to loop over these DataFrames and select a quantity. What would be the most elegant/clean way to select a column based on an OR condition? Ideally I would do
df['r' or 'radius']
However, this obviously would not work, since 'r' or 'radius' simply evaluates to 'r'. What would be the closest thing to this that would work?
CodePudding user response:
DataFrame.rename
Assuming each dataframe will only contain one of those quantities, you can rename all the possible quantities to r, so that you can always access the column as df['r']:
df = df.rename(columns={'radius': 'r', 'Radius[m]': 'r'})
Example:
import pandas as pd
df = pd.DataFrame({'Radius[m]': [42,13,100], 'foo': [0,1,2]})
df = df.rename(columns={'radius': 'r', 'Radius[m]': 'r'})
df['r']
# 0 42
# 1 13
# 2 100
# Name: r, dtype: int64
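Since the question is about looping over several DataFrames, here is a minimal sketch of applying the same rename to each of them. The names ALIASES, frames, normalized and radii are made up for illustration, and the sketch keeps the assumption above that each frame contains only one of the aliases (with more than one, rename would produce duplicate 'r' columns).

```python
import pandas as pd

# Hypothetical alias table: every known spelling maps to one canonical name.
ALIASES = {"radius": "r", "Radius[m]": "r"}

# Illustrative frames, each using a different name for the same quantity.
frames = [
    pd.DataFrame({"r": [1, 2, 3], "foo": [20, 40, 60]}),
    pd.DataFrame({"radius": [100, 200, 300], "foo": [0, 1, 2]}),
    pd.DataFrame({"Radius[m]": [42, 13, 100], "foo": [0, 1, 2]}),
]

# rename skips mapping keys that are not present in a frame,
# so the same mapping can be applied to every frame unchanged.
normalized = [df.rename(columns=ALIASES) for df in frames]

# After renaming, the quantity is always reachable under the same name.
radii = [df["r"].tolist() for df in normalized]
```

rename returns a new DataFrame by default, so the original frames keep their original column names.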
Index.intersection
If a dataframe may contain several of these names, use Index.intersection to find the overlapping names:
quantities = ['r', 'radius', 'Radius[m]']
columns = df.columns.intersection(pd.Index(quantities))
Examples:
df1 = pd.DataFrame({'r': [1,2,3], 'foo': [20,40,60]})
columns = df1.columns.intersection(pd.Index(quantities))
df1[columns]
# r
# 0 1
# 1 2
# 2 3
df2 = pd.DataFrame({'radius': [100,200,300], 'foo': [0,1,2]})
columns = df2.columns.intersection(pd.Index(quantities))
df2[columns]
# radius
# 0 100
# 1 200
# 2 300
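Note that df[columns] returns a DataFrame, not a Series. If you want something closer to the df['r' or 'radius'] idea from the question, one option is a small helper that returns the first matching column. This is only a sketch: pick_column is a hypothetical helper, not a pandas function, and it assumes the order of the candidate list expresses your preference when several names are present.

```python
import pandas as pd

def pick_column(df, candidates):
    """Return the first column of df whose name appears in candidates."""
    for name in candidates:
        if name in df.columns:
            return df[name]
    # No candidate matched: fail loudly rather than return nothing.
    raise KeyError(f"none of {list(candidates)} found in {list(df.columns)}")

quantities = ["r", "radius", "Radius[m]"]
df2 = pd.DataFrame({"radius": [100, 200, 300], "foo": [0, 1, 2]})

# Selects the 'radius' column as a Series, because 'r' is absent from df2.
radius = pick_column(df2, quantities)
```

Raising KeyError when nothing matches mirrors what df['missing'] does, so a frame without any of the names is not silently skipped.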