Python: Find columns of second dataframe with matching index to first dataframe

I have two dataframes.

Input data

# The first df mainly consists of data provided by the user

fdf = pd.DataFrame(columns=['user_data'],data=[10,14,1],index=['alpha','beta','gamma'])

        user_data
alpha   10
beta    14
gamma   1

# The second df holds default data describing which analyses I can run, based on the data the user provided in the first dataframe
sdf = pd.DataFrame(columns=['AD_analysis','BGD_analysis','ABG_analysis'],
             data=[[1,0,1],[0,1,1],[0,1,1],[1,1,0]],index=['alpha','beta','gamma','delta'])
sdf = 
         AD_analysis    BGD_analysis    ABG_analysis
alpha          1           0                 1
beta           0           1                 1
gamma          0           1                 1
delta          1           1                 0
# The table above tells us that we can run AD_analysis if the alpha and delta values are given by the user in the first df
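
To make that encoding explicit, here is a minimal sketch that reads the table column by column and lists the parameters each analysis needs. The name `required` is not part of the original setup; it is introduced only for illustration.

```python
import pandas as pd

sdf = pd.DataFrame(
    columns=['AD_analysis', 'BGD_analysis', 'ABG_analysis'],
    data=[[1, 0, 1], [0, 1, 1], [0, 1, 1], [1, 1, 0]],
    index=['alpha', 'beta', 'gamma', 'delta'],
)

# For each analysis (column), collect the parameters (rows) flagged with 1.
# `required` is a hypothetical helper name used only in this sketch.
required = {
    col: sdf.index[sdf[col].eq(1).to_numpy()].tolist()
    for col in sdf.columns
}

for analysis, params in required.items():
    print(analysis, params)
```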

So, I want to know which kinds of analysis (sdf) I can run based on the data provided by the user (fdf).

Expected answer:

# Since delta is not given, I cannot run any analysis that depends on this parameter
# The only possible analysis with the given data is
['ABG_analysis']

My approach:

# find common index
com_idx = fdf.index.intersection(sdf.index)

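# use `and` plus a membership test; `&` binds tighter than `==`,
# and Index.isin() expects a list-like, not a plain string
if len(com_idx) == 3 and 'alpha' in com_idx:
    print('ABG_analysis')
if len(com_idx) == 3 and 'delta' in com_idx:
    print('BGD_analysis')
if len(com_idx) == 2:
    print('AD_analysis')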

Chaining this many if statements does not seem like the most Pythonic approach. Can you suggest a better one?

CodePudding user response：

Assuming you want to identify the analyses for which no required data is missing, you can use:

# get indices not provided by user
diff = sdf.index.difference(fdf.index)

# ensure they are not required for an analysis
sdf.columns[~sdf.reindex(diff).any()]

Output: Index(['ABG_analysis'], dtype='object')
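
For reuse, the same idea can be wrapped in a small function. This is only a sketch: the name `runnable_analyses` is hypothetical, and it assumes the flags in sdf are 0/1 (or otherwise truthy/falsy) values.

```python
import pandas as pd


def runnable_analyses(fdf, sdf):
    """Return the analyses in sdf that need none of the parameters missing from fdf."""
    # Parameters listed in sdf that the user did not provide.
    missing = sdf.index.difference(fdf.index)
    # An analysis is blocked if any missing parameter is flagged for it.
    blocked = sdf.reindex(missing).astype(bool).any()
    return sdf.columns[~blocked].tolist()


fdf = pd.DataFrame(columns=['user_data'], data=[10, 14, 1],
                   index=['alpha', 'beta', 'gamma'])
sdf = pd.DataFrame(
    columns=['AD_analysis', 'BGD_analysis', 'ABG_analysis'],
    data=[[1, 0, 1], [0, 1, 1], [0, 1, 1], [1, 1, 0]],
    index=['alpha', 'beta', 'gamma', 'delta'],
)

print(runnable_analyses(fdf, sdf))
```

If the user supplies every parameter, `missing` is empty, nothing is blocked, and all analyses are returned.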

If you want to ensure that all data is used (an analysis requiring only alpha and beta would be excluded):

sdf.columns[sdf.reindex(fdf.index).all()
            & ~sdf.loc[sdf.index.difference(fdf.index)].any()]
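
To see the difference between the two variants, here is a sketch with one extra column. `AB_analysis` is a hypothetical analysis that needs only alpha and beta; it is not part of the original data, and the names `loose` and `strict` are also just for illustration.

```python
import pandas as pd

fdf = pd.DataFrame(columns=['user_data'], data=[10, 14, 1],
                   index=['alpha', 'beta', 'gamma'])

# sdf extended with a hypothetical AB_analysis column (needs alpha and beta only).
sdf = pd.DataFrame(
    columns=['AD_analysis', 'BGD_analysis', 'ABG_analysis', 'AB_analysis'],
    data=[[1, 0, 1, 1], [0, 1, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]],
    index=['alpha', 'beta', 'gamma', 'delta'],
)

missing = sdf.index.difference(fdf.index)
flags = sdf.astype(bool)

# Loose check: no missing parameter is required by the analysis.
loose = sdf.columns[~flags.reindex(missing).any()].tolist()

# Strict check: in addition, every provided parameter is used by the analysis.
strict = sdf.columns[
    flags.reindex(fdf.index).all() & ~flags.reindex(missing).any()
].tolist()

print(loose)
print(strict)
```

The loose check keeps `AB_analysis` because nothing it needs is missing; the strict check drops it because gamma was provided but is not used.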

Used inputs:

fdf = pd.DataFrame(columns=['user_data'],data=[10,14,1],index=['alpha','beta','gamma'])

sdf = pd.DataFrame(columns=['AD_analysis','BGD_analysis','ABG_analysis'],
             data=[[1,0,1],[0,1,1],[0,1,1],[1,1,0]],index=['alpha','beta','gamma','delta'])

CodePudding user response：

Select the rows of your second table that match the indices provided by the user. Then keep only the columns where all of those rows are equal to 1.

sdf.loc[fdf.index].eq(1).all(0).loc[lambda x:x].index

Index(['ABG_analysis'], dtype='object')
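
One thing to keep in mind: this one-liner only looks at the rows the user provided, so an analysis that also needs a missing parameter would still be reported. The sketch below illustrates this with a hypothetical `ABGD_analysis` column (not in the original data) that requires all four parameters, and shows one possible way to filter it out.

```python
import pandas as pd

fdf = pd.DataFrame(columns=['user_data'], data=[10, 14, 1],
                   index=['alpha', 'beta', 'gamma'])

# sdf extended with a hypothetical ABGD_analysis column that needs every parameter.
sdf = pd.DataFrame(
    columns=['AD_analysis', 'BGD_analysis', 'ABG_analysis', 'ABGD_analysis'],
    data=[[1, 0, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1], [1, 1, 0, 1]],
    index=['alpha', 'beta', 'gamma', 'delta'],
)

# The one-liner from above: columns where every provided row equals 1.
uses_all_provided = sdf.loc[fdf.index].eq(1).all(0)
candidates = uses_all_provided.loc[lambda x: x].index.tolist()

# Extra filter: drop analyses that also need a parameter the user did not give.
missing = sdf.index.difference(fdf.index)
needs_missing = sdf.loc[missing].eq(1).any()
runnable = uses_all_provided[uses_all_provided & ~needs_missing].index.tolist()

print(candidates)
print(runnable)
```

With the original three-column sdf both lists are the same, so the extra filter only matters when some analysis needs more than the user supplied.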