I’ll illustrate my problem with a drawing:
I have a pandas DataFrame with 13 columns of 6 different types. I want to pick one column of each type and build a new table from them for subsequent analyses, and do this for every possible pick. So in the end I want to create (3 choose 1) * 1 * (2 choose 1) * (2 choose 1) * (4 choose 1) * 1 = 48 new DataFrames out of the one original DataFrame.
The columns don't have specific names, but they could be, for example: A1, A2, A3, B1, C1, C2, D1, D2, E1, E2, E3, E4, F1
Does anyone have an idea how to implement this in Python?
CodePudding user response:
If you can separate the column names into lists according to their types, then your problem becomes a question of finding the Cartesian product of these lists. Once you have the Cartesian product, you can iterate over it and select from your DataFrame with each combination of column names (there are (3 choose 1) * 1 * (2 choose 1) * (2 choose 1) * (4 choose 1) * 1 = 48 of them).
import pandas as pd

# one list of column names per type; df is the original 13-column DataFrame
A_cols = ['A1','A2','A3']
B_cols = ['B1']
C_cols = ['C1','C2']
D_cols = ['D1','D2']
E_cols = ['E1','E2','E3','E4']
F_cols = ['F1']
# column_combos is length 48
column_combos = pd.MultiIndex.from_product([A_cols,B_cols,C_cols,D_cols,E_cols,F_cols])
# out is a dictionary of 48 DataFrames
out = {';'.join(cols): df[[*cols]] for cols in column_combos}
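
If you would rather not type the lists by hand, here is a minimal sketch of the same idea using itertools.product from the standard library. It assumes that the first character of each column name identifies its type (as in the A1, A2, ... example from the question) and it builds a small made-up DataFrame only so that the snippet runs on its own. With real data you would replace the grouping rule with whatever actually distinguishes your types.

```python
import itertools
from collections import defaultdict

import numpy as np
import pandas as pd

# Hypothetical example data: 13 columns named as in the question, 5 rows.
names = ["A1", "A2", "A3", "B1", "C1", "C2", "D1", "D2",
         "E1", "E2", "E3", "E4", "F1"]
df = pd.DataFrame(np.arange(5 * len(names)).reshape(5, len(names)),
                  columns=names)

# Assumption: the first character of a column name is its type.
groups = defaultdict(list)
for col in df.columns:
    groups[col[0]].append(col)

# One column per type: the Cartesian product of the per-type lists.
combos = list(itertools.product(*groups.values()))

# Map a readable key such as "A1;B1;C1;D1;E1;F1" to each sub-DataFrame.
out = {";".join(cols): df[list(cols)] for cols in combos}
```

Each value in out is a 6-column selection of df, so out["A1;B1;C1;D1;E1;F1"] gives the first combination. If you only need them one at a time, you can loop over combos and select df[list(cols)] inside the loop instead of holding all 48 in a dictionary.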