How to find subset(s) in a dataframe column and return <subsetOf

I just used .groupby and .agg to build my df as follows:

Name  inclusionId
A     1 , 2
B     1 , 3
C     5 , 7
D     5 , 2 , 9 , 7 , 1
E     2 , 1 , 9

Now I want to check whether each row's inclusionId set is a subset of another row's set. I need output like this:

Name  inclusionId          Subset of
A     1 , 2                E
B     1 , 3                No
C     5 , 7                D
D     5 , 2 , 9 , 7 , 1    No
E     2 , 1 , 9            D

Please help!

CodePudding user response：

With pandas you can select:

all rows and limited columns
all columns and limited rows
limited rows and limited columns

You can select columns like this:

dataframe['column']

dataframe[['column1', 'column2' ]]

To select rows, index the dataframe with a boolean condition on a column, so that only the rows meeting the condition are kept:

population_500 = housing[housing['population']>500]

Here we select the rows whose population is greater than 500.

You can also use dataframe.loc[...] (square brackets, not parentheses) to select rows by their index labels, for example:

dataframe.loc[[1,5,7]]

You can also select rows and columns together with .loc[]:

dataframe.loc[1:7,['column_1', 'column_2']]

where 1 and 7 are row index labels. Unlike ordinary Python slicing, a .loc slice includes both endpoints.

You can also use .iloc[] to select a subset of rows and columns by integer position:

dataframe.iloc[[2,3,6], [3, 5]]
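To make the difference between these selection styles concrete, here is a small self-contained sketch. The housing frame, its values, and the households column are made up for illustration; only the population column name comes from the example above.

```python
import pandas as pd

# Hypothetical data; the index labels deliberately differ from the positions.
housing = pd.DataFrame(
    {
        "population": [300, 800, 1200, 450],
        "households": [100, 250, 400, 150],
    },
    index=[10, 20, 30, 40],
)

# Column selection: a single label gives a Series, a list gives a DataFrame.
pop_series = housing["population"]
pop_frame = housing[["population", "households"]]

# Boolean-mask row selection.
population_500 = housing[housing["population"] > 500]

# .loc is label-based and its slices include both endpoints.
by_label = housing.loc[10:30, ["population"]]

# .iloc is position-based and its slices exclude the stop position.
by_position = housing.iloc[0:2, [0]]
```

With these labels, the .loc slice picks up three rows (10, 20, 30) while the .iloc slice picks up two (positions 0 and 1), which is the usual source of confusion between the two.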

Hope you find this helpful!

CodePudding user response：

A little bit complicated:

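import numpy as np
import pandas as pd

# One indicator column per id (assumes ids are separated by ',' with no surrounding spaces)
s = df.set_index('Name').inclusionId.str.get_dummies(',')
# s.dot(s.T)[i, j] is the number of ids shared by rows i and j
s = s.dot(s.T)
# The diagonal holds each row's own set size
diag = np.diag(s).copy()
# Zero the diagonal so a row never matches itself
np.fill_diagonal(s.values,0)
# Row j is a subset of row i when their shared count equals the size of row j
df['new'] = s.eq(diag).T.dot(s.columns + ',').str[:-1].values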
Out[74]: 
  Name inclusionId  new
0    A         1,2  D,E
1    B         1,3     
2    C         5,7    D
3    D   5,2,9,7,1     
4    E       2,1,9    D
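
For comparison, the same check can be sketched with plain Python sets, which may be easier to follow than the matrix product. This is an alternative approach, not the method above. The helper name supersets_of and the column name subset_of are made up for illustration, and the frame is rebuilt here from the question's data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D", "E"],
        "inclusionId": ["1,2", "1,3", "5,7", "5,2,9,7,1", "2,1,9"],
    }
)

# Parse each comma-separated string into a set, tolerating stray spaces.
id_sets = {
    name: {part.strip() for part in ids.split(",")}
    for name, ids in zip(df["Name"], df["inclusionId"])
}


def supersets_of(name):
    """Return the names of all other rows whose set contains this row's set."""
    own = id_sets[name]
    # `<=` is the non-strict subset test, so two identical sets list each other.
    return ",".join(
        other for other, ids in id_sets.items()
        if other != name and own <= ids
    )


df["subset_of"] = df["Name"].map(supersets_of)
```

This compares every pair of rows, so it is quadratic in the number of names. That should be fine for small frames, but it assumes Name values are unique, since they are used as dictionary keys.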