I have two dataframes.
Input data
# The first df mainly consists of data provided by the user
import pandas as pd
fdf = pd.DataFrame(columns=['user_data'],data=[10,14,1],index=['alpha','beta','gamma'])
       user_data
alpha         10
beta          14
gamma          1
# The second df holds default data describing which analyses I can run, based on the data the user provided in the first dataframe
sdf = pd.DataFrame(columns=['AD_analysis','BGD_analysis','ABG_analysis'],
data=[[1,0,1],[0,1,1],[0,1,1],[1,1,0]],index=['alpha','beta','gamma','delta'])
sdf =
       AD_analysis  BGD_analysis  ABG_analysis
alpha            1             0             1
beta             0             1             1
gamma            0             1             1
delta            1             1             0
# The table above tells us that we can run AD_analysis if the alpha and delta values are given by the user in the first df
So, I want to know which analyses (sdf) I can run based on the data provided by the user (fdf).
Expected answer:
# Since delta is not given, I cannot run any analysis that requires this parameter
# The only possible analysis with the given data is
['ABG_analysis']
My approach:
# find common index
com_idx = fdf.index.intersection(sdf.index)
if len(com_idx) == 3 and 'alpha' in com_idx:
    print('ABG_analysis')
if len(com_idx) == 3 and 'delta' in com_idx:
    print('BGD_analysis')
if len(com_idx) == 2:
    print('AD_analysis')
Having so many if statements does not strike me as the most Pythonic approach. Can you suggest a better one?
CodePudding user response:
Assuming you want to identify the analyses for which no required data is missing, you can use:
# get indices not provided by user
diff = sdf.index.difference(fdf.index)
# ensure they are not required for an analysis
sdf.columns[~sdf.reindex(diff).any()]
Output: Index(['ABG_analysis'], dtype='object')
If you want to ensure that all data is used (an analysis requiring only alpha and beta would be excluded):
sdf.columns[sdf.reindex(fdf.index).all()
            & ~sdf.loc[sdf.index.difference(fdf.index)].any()]
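To make the difference between the two conditions concrete, here is a minimal sketch. It assumes an extra AB_analysis column, which is hypothetical and not part of the original data, that needs only alpha and beta. The variable names (missing, no_missing, uses_all) are also just illustrative.

```python
import pandas as pd

fdf = pd.DataFrame({'user_data': [10, 14, 1]}, index=['alpha', 'beta', 'gamma'])

# Same sdf as in the question, plus a hypothetical AB_analysis column
# (an assumption for illustration) that needs only alpha and beta.
sdf = pd.DataFrame(
    {'AD_analysis':  [1, 0, 0, 1],
     'BGD_analysis': [0, 1, 1, 1],
     'ABG_analysis': [1, 1, 1, 0],
     'AB_analysis':  [1, 1, 0, 0]},
    index=['alpha', 'beta', 'gamma', 'delta'])

# Parameters listed in sdf that the user did not provide
missing = sdf.index.difference(fdf.index)

# Condition 1: the analysis needs none of the missing parameters
no_missing = ~sdf.loc[missing].astype(bool).any()
# Condition 2: the analysis uses every parameter the user provided
uses_all = sdf.reindex(fdf.index).astype(bool).all()

runnable = no_missing[no_missing].index.tolist()
runnable_strict = (no_missing & uses_all).loc[lambda s: s].index.tolist()

print(runnable)         # ['ABG_analysis', 'AB_analysis']
print(runnable_strict)  # ['ABG_analysis']
```

With the first condition alone, AB_analysis counts as runnable because nothing it needs is missing. Adding the second condition drops it because gamma would go unused.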
Used inputs:
fdf = pd.DataFrame(columns=['user_data'],data=[10,14,1],index=['alpha','beta','gamma'])
sdf = pd.DataFrame(columns=['AD_analysis','BGD_analysis','ABG_analysis'],
data=[[1,0,1],[0,1,1],[0,1,1],[1,1,0]],index=['alpha','beta','gamma','delta'])
CodePudding user response:
Select the rows of your second table that match the indices provided by the user, then keep the columns where all of those values are equal to 1.
sdf.loc[fdf.index].eq(1).all(0).loc[lambda x:x].index
Index(['ABG_analysis'], dtype='object')