How to find subset(s) in a dataframe column and return <subsetOf - >?

Time:05-24

I just used .groupby and .agg to build my df, which looks like this:

Name  inclusionId
A     1 , 2
B     1 , 3
C     5 , 7
D     5 , 2 , 9 , 7 , 1
E     2 , 1 , 9

Now I want to check whether any of these are subsets of each other. I need output like this:

Name  inclusionId          Subset of
A     1 , 2                E
B     1 , 3                No
C     5 , 7                D
D     5 , 2 , 9 , 7 , 1    No
E     2 , 1 , 9            D

Please help!

CodePudding user response:

With pandas you can select:

  1. all rows and a limited set of columns
  2. all columns and a limited set of rows
  3. a limited set of rows and a limited set of columns

You can select columns like this:

dataframe['column']

or

dataframe[['column1', 'column2' ]]

To select rows, pick a column and apply a condition that only certain rows meet, for example:

population_500 = housing[housing['population']>500]

Here we select the rows whose population is greater than 500.
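As a rough illustration of how that condition works, here is a minimal sketch. The `housing` frame is not shown in the answer, so the data and the `city` column below are made up for the example.

```python
import pandas as pd

# Hypothetical data; the answer does not show the real `housing` frame.
housing = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "population": [120, 800, 500, 1500],
})

# The comparison yields a boolean Series with one True/False per row.
mask = housing["population"] > 500

# Indexing with the mask keeps only the rows where it is True.
population_500 = housing[mask]

print(population_500)
```

Note that the row with a population of exactly 500 is dropped, because the comparison is strict.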

You can also use dataframe.loc[] with one or more row labels to select specific rows, for example:

dataframe.loc[[1,5,7]]

You can also select rows and columns together with .loc[]:

dataframe.loc[1:7,['column_1', 'column_2']]

where 1 and 7 are row index labels. Unlike ordinary Python slicing, a .loc[] slice includes both endpoints.

You can also use .iloc[] to select a subset of rows and columns by integer position:

dataframe.iloc[[2,3,6], [3, 5]]
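The difference between label-based and position-based selection is easier to see when the index is not simply 0, 1, 2, and so on. The sketch below uses a made-up frame (the index values and column names are assumptions chosen for the example).

```python
import pandas as pd

# Hypothetical frame with a non-default index so labels and positions differ.
dataframe = pd.DataFrame(
    {
        "column_1": [10, 20, 30, 40],
        "column_2": [1, 2, 3, 4],
        "column_3": ["w", "x", "y", "z"],
    },
    index=[1, 3, 5, 7],
)

# .loc is label-based: the slice 1:5 keeps labels 1, 3 and 5 (both ends included).
by_label = dataframe.loc[1:5, ["column_1", "column_2"]]

# .iloc is position-based: rows at positions 0 and 2, columns at positions 0 and 2.
by_position = dataframe.iloc[[0, 2], [0, 2]]

print(by_label)
print(by_position)
```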

Hope you find this helpful!

CodePudding user response:

A slightly more complicated approach:

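import numpy as np

# Assumes inclusionId has no spaces around the commas (e.g. '1,2')
s = df.set_index('Name').inclusionId.str.get_dummies(',')
s = s.dot(s.T)                 # s[i, j] = number of ids shared by rows i and j
diag = np.diag(s).copy()       # each row's own set size
np.fill_diagonal(s.values, 0)  # ignore self-matches
# Row j is a subset of row i when their overlap equals the size of j
df['new'] = s.eq(diag).T.dot(s.columns + ',').str[:-1].values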
Out[74]: 
  Name inclusionId  new
0    A         1,2  D,E
1    B         1,3     
2    C         5,7    D
3    D   5,2,9,7,1     
4    E       2,1,9    D  
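
If the matrix trick above is hard to follow, the same check can be written with plain Python sets. This is only a sketch of an alternative, not the method used in the answer: it assumes the names are unique, strips the spaces around the commas as they appear in the question, and the helper name `supersets_of` and the column name `Subset of` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "inclusionId": ["1 , 2", "1 , 3", "5 , 7", "5 , 2 , 9 , 7 , 1", "2 , 1 , 9"],
})

# Parse each comma-separated string into a set of stripped tokens.
sets = {
    name: {tok.strip() for tok in ids.split(",")}
    for name, ids in zip(df["Name"], df["inclusionId"])
}

def supersets_of(name):
    """Return the names of the other rows whose set contains this row's set."""
    own = sets[name]
    return [other for other, ids in sets.items() if other != name and own <= ids]

# Join the matching names, or fall back to "No" when there are none.
df["Subset of"] = [",".join(supersets_of(n)) or "No" for n in df["Name"]]

print(df)
```

As in the output above, A is reported as a subset of both D and E, since both contain 1 and 2. Two rows with identical sets would list each other, because `<=` also accepts equal sets. The loop compares every pair of rows, so it suits small frames better than very large ones.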