I used .groupby and .agg to build my df as follows:
Name inclusionId
A 1 , 2
B 1 , 3
C 5 , 7
D 5 , 2 , 9 , 7 , 1
E 2 , 1 , 9
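For reference, here is a minimal sketch of how a frame like the one above might come out of .groupby and .agg. The long-form input (a hypothetical long_df with one Name/id pair per row) is an assumption, since the question does not show it. Keeping each group as a set instead of a joined string makes the later subset check a plain set comparison.

```python
import pandas as pd

# Hypothetical long-form input: one (Name, id) pair per row
long_df = pd.DataFrame({
    'Name': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D', 'D', 'D', 'D', 'E', 'E', 'E'],
    'id':   [1, 2, 1, 3, 5, 7, 5, 2, 9, 7, 1, 2, 1, 9],
})

# Joined strings, reproducing the table above
as_text = (long_df.groupby('Name')['id']
           .agg(lambda ids: ' , '.join(map(str, ids)))
           .reset_index(name='inclusionId'))

# The same groups kept as Python sets, which support <= (subset) directly
as_sets = long_df.groupby('Name')['id'].agg(set)
```

With the sets version, `as_sets['E'] <= as_sets['D']` answers "is E a subset of D?" without any string parsing.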
Now I want to check, for each row, whether its inclusionId values are a subset of another row's. I need output like this:
Name inclusionId Subset of -
A 1 , 2 E
B 1 , 3 No
C 5 , 7 D
D 5 , 2 , 9 , 7 , 1 No
E 2 , 1 , 9 D
Please help!
Answer:
With pandas you can select:
- all rows and some of the columns
- all columns and some of the rows
- some rows and some columns
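You can select a single column like this (the result is a Series):
dataframe['column']
or several columns by passing a list of names (the result is a DataFrame):
dataframe[['column1', 'column2']]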
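A small runnable sketch of the column selection described above. The frame and its column names are placeholders, not data from the question.

```python
import pandas as pd

# Small illustrative frame; the column names are placeholders
dataframe = pd.DataFrame({
    'column1': [1, 2, 3],
    'column2': ['a', 'b', 'c'],
    'column3': [0.1, 0.2, 0.3],
})

one = dataframe['column1']               # a single label returns a Series
two = dataframe[['column1', 'column2']]  # a list of labels returns a DataFrame
```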
To select rows, write a condition on a column that only some rows meet, and index the frame with it:
population_500 = housing[housing['population'] > 500]
Here we keep only the rows whose population is greater than 500.
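A minimal sketch of that boolean filter, assuming a made-up stand-in for the housing frame (its columns and values are hypothetical). The comparison produces a True/False Series, and indexing with it keeps the True rows.

```python
import pandas as pd

# Hypothetical stand-in for the `housing` frame mentioned above
housing = pd.DataFrame({
    'district': ['a', 'b', 'c', 'd'],
    'population': [320, 870, 500, 1200],
})

mask = housing['population'] > 500   # boolean Series, one value per row
population_500 = housing[mask]       # keeps only the rows where mask is True
```

Note that the row with exactly 500 is dropped, because the condition is strictly greater than.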
You can also use dataframe.loc[...] to select rows by their index labels (with the default RangeIndex the labels are just the row numbers), for example:
dataframe.loc[[1, 5, 7]]
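A sketch showing why it matters that .loc works on labels rather than positions. The reversed index is contrived on purpose so the two differ; with the default RangeIndex both calls would return the same rows.

```python
import pandas as pd

# Index labels run backwards (7, 6, ..., 0) so they differ from row positions
dataframe = pd.DataFrame({'value': list('abcdefgh')}, index=range(7, -1, -1))

by_label = dataframe.loc[[1, 5, 7]]      # rows whose index *label* is 1, 5, 7
by_position = dataframe.iloc[[1, 5, 7]]  # rows at *positions* 1, 5, 7
```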
You can select rows and columns together with .loc[]:
dataframe.loc[1:7, ['column_1', 'column_2']]
where 1 and 7 are index labels. Unlike ordinary Python slicing, a .loc label slice includes both ends, so this returns rows 1 through 7.
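A quick check of the inclusive label slice, using a placeholder frame with the default RangeIndex (so labels 0 to 9 coincide with row numbers).

```python
import pandas as pd

# Placeholder frame with the default RangeIndex: labels 0..9
dataframe = pd.DataFrame({
    'column_1': range(10),
    'column_2': range(10, 20),
    'column_3': range(20, 30),
})

picked = dataframe.loc[1:7, ['column_1', 'column_2']]  # label slice: includes label 7
```

The result has 7 rows. A positional slice such as dataframe.iloc[1:7] would follow normal Python slicing and stop at position 6.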
You can also use .iloc[] to select rows and columns by integer position:
dataframe.iloc[[2, 3, 6], [3, 5]]
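A sketch of positional selection on a placeholder frame whose cell values encode their own position (row * 10 + column), so it is easy to see which cells .iloc picked.

```python
import pandas as pd

# Cell value = row_position * 10 + column_position; columns named c0..c6
dataframe = pd.DataFrame([[r * 10 + c for c in range(7)] for r in range(8)],
                         columns=[f'c{c}' for c in range(7)])

subset = dataframe.iloc[[2, 3, 6], [3, 5]]  # rows at positions 2, 3, 6; columns at positions 3, 5
```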
Hope you find this helpful!
Answer:
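A little bit complicated, but it works (this assumes inclusionId holds strings such as '1,2', without spaces around the commas, as in the output below):
import numpy as np

s = df.set_index('Name').inclusionId.str.get_dummies(',')  # one 0/1 column per id
s = s.dot(s.T)                                              # s.loc[i, j] = number of ids rows i and j share
diag = np.diag(s).copy()                                    # diag[j] = number of ids in row j
s = s.mask(np.eye(len(s), dtype=bool), 0)                   # zero the diagonal so no row counts as its own subset
# for each row, list the other rows that contain all of its ids
df['new'] = s.eq(diag).T.dot(s.columns + ',').str[:-1].values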
Out[74]:
Name inclusionId new
0 A 1,2 D,E
1 B 1,3
2 C 5,7 D
3 D 5,2,9,7,1
4 E 2,1,9 D
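A self-contained sketch of the matrix approach above, run on the question's own data. Two assumptions: the ids are stored as strings like '1 , 2' (as in the question's table), so the spaces are stripped before get_dummies; and the names ids, shared, sizes, sets and check are illustrative. It ends with a plain-Python set comparison as a cross-check, which is a different technique from the answer's dot-product trick.

```python
import numpy as np
import pandas as pd

# The question's data, with the spaced separators shown in its table
df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E'],
    'inclusionId': ['1 , 2', '1 , 3', '5 , 7', '5 , 2 , 9 , 7 , 1', '2 , 1 , 9'],
})

# Strip the spaces so get_dummies sees clean tokens such as '1' and '2'
ids = df.set_index('Name')['inclusionId'].str.replace(' ', '', regex=False)

# One 0/1 indicator column per id
s = ids.str.get_dummies(',')

# shared.loc[i, j] = number of ids that rows i and j have in common
shared = s.dot(s.T)

# sizes[j] = number of ids in row j
sizes = np.diag(shared).copy()

# Zero the diagonal so no row is reported as a subset of itself
shared = shared.mask(np.eye(len(shared), dtype=bool), 0)

# shared.eq(sizes) compares each column j against sizes[j]: entry (i, j) is True
# when every id of row j is also in row i. After transposing, dotting with the
# names concatenates, for each row, the names of the rows that contain it.
df['new'] = shared.eq(sizes).T.dot(shared.columns + ',').str[:-1].values

# Cross-check with plain Python sets (a different technique from the answer)
sets = {name: set(v.split(',')) for name, v in ids.items()}
check = [','.join(o for o in sets if o != n and sets[n] <= sets[o])
         for n in df['Name']]
```

With this data A is reported as a subset of both D and E, because {1, 2} is also contained in {5, 2, 9, 7, 1}. Rows with no superset get an empty string.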