I have 8 columns. col_1 to col_5 can each hold either a valid value or -1, meaning "any value", and I have lists of the values these columns can take. The other 3 columns, col_6 to col_8, can take any valid value from their own lists. I want to create every combination of col_1 to col_5 having -1 and fill the remaining columns with random valid values. col_6 to col_8 should always get a random value from their lists.
Example with 2 columns not having -1 and 2 columns having -1:
Example with 2 columns not having -1 and 3 columns having -1:
Valid means any value sampled from that column's list of values.
In my case, I want one row for each of the column subsets below, with -1 in those columns and randomly sampled values in all the other columns:
[(),
('col_1',),
('col_2',),
('col_3',),
('col_4',),
('col_5',),
('col_1', 'col_2'),
('col_1', 'col_3'),
('col_1', 'col_4'),
('col_1', 'col_5'),
('col_2', 'col_3'),
('col_2', 'col_4'),
('col_2', 'col_5'),
('col_3', 'col_4'),
('col_3', 'col_5'),
('col_4', 'col_5'),
('col_1', 'col_2', 'col_3'),
('col_1', 'col_2', 'col_4'),
('col_1', 'col_2', 'col_5'),
('col_1', 'col_3', 'col_4'),
('col_1', 'col_3', 'col_5'),
('col_1', 'col_4', 'col_5'),
('col_2', 'col_3', 'col_4'),
('col_2', 'col_3', 'col_5'),
('col_2', 'col_4', 'col_5'),
('col_3', 'col_4', 'col_5'),
('col_1', 'col_2', 'col_3', 'col_4'),
('col_1', 'col_2', 'col_3', 'col_5'),
('col_1', 'col_2', 'col_4', 'col_5'),
('col_1', 'col_3', 'col_4', 'col_5'),
('col_2', 'col_3', 'col_4', 'col_5'),
('col_1', 'col_2', 'col_3', 'col_4', 'col_5')]
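The list above is every subset of col_1 to col_5, from the empty tuple up to all five columns, so it does not have to be typed by hand. A minimal sketch using only `itertools` (the column names are taken from the question) produces the same 32 tuples in the same order:

```python
from itertools import chain, combinations

# Columns that are allowed to hold -1
maskable = ['col_1', 'col_2', 'col_3', 'col_4', 'col_5']

# All subsets of size 0, 1, ..., 5, each size in combinations() order
subsets = list(chain.from_iterable(
    combinations(maskable, r) for r in range(len(maskable) + 1)
))

print(len(subsets))  # 32
```

There are 2 ** 5 = 32 subsets because each of the five columns is independently either -1 or not.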
Answer:
Use:
import numpy as np
import pandas as pd
from itertools import product
col_1_li = 'a,b,c'.split(',')
col_2_li = 'd,e,f'.split(',')
col_3_li = 'g,h,i'.split(',')
col_4_li = 'j,k,l'.split(',')
# columns that can be -1
L1 = [col_1_li, col_2_li]
# columns that are never -1
L2 = [col_3_li, col_4_li]
# all True/False masks for the -1 columns (2 ** len(L1) rows); False marks a -1 cell
c = list(product([True, False], repeat=len(L1)))
print(c)
# [(True, True), (True, False), (False, True), (False, False)]
# sample one random value per row for every column, one row per mask
L = [np.random.choice(x, size=len(c)) for x in L1 + L2]
# build the DataFrame; columns are named col_0, col_1, ... in L1 + L2 order
df = pd.DataFrame(dict(enumerate(L))).add_prefix('col_')
# replace values with -1 where the mask is False
df.iloc[:, :len(L1)] = df.iloc[:, :len(L1)].where(pd.DataFrame(c).add_prefix('col_'), -1)
print(df)
# sample output (values are random):
#   col_0 col_1 col_2 col_3
# 0     b     e     i     k
# 1     a    -1     g     k
# 2    -1     d     g     l
# 3    -1    -1     h     l
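Why `product([True, False], repeat=n)` is enough: each -1 column is independently kept or replaced, so the 2 ** n masks correspond one-to-one to the 2 ** n subsets of columns that end up as -1. The sketch below checks this for three maskable columns (the names are illustrative, chosen to match the question's columns); with five maskable columns the same code yields the 32 subsets listed in the question.

```python
from itertools import combinations, product

# Illustrative names for the columns that may hold -1
maskable = ['col_1', 'col_2', 'col_5']

# Same masks as in the answer: True keeps the sampled value, False becomes -1
masks = list(product([True, False], repeat=len(maskable)))

# For each mask, the tuple of columns that end up as -1
minus_one_cols = [
    tuple(col for col, keep in zip(maskable, mask) if not keep)
    for mask in masks
]

# Every subset of the maskable columns, for comparison
all_subsets = {s for r in range(len(maskable) + 1)
               for s in combinations(maskable, r)}

print(len(masks), set(minus_one_cols) == all_subsets)  # 8 True
```

The masks come out in a different order than `combinations` produces (the first mask has no -1 and the last has all -1), but the set of rows is the same.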
import numpy as np
import pandas as pd
from itertools import product
col_1_li = 'a,b,c'.split(',')
col_2_li = 'd,e,f'.split(',')
col_3_li = 'g,h,i'.split(',')
col_4_li = 'j,k,l'.split(',')
col_5_li = 'x,y,z'.split(',')
# columns that can be -1
L1 = [col_1_li, col_2_li, col_5_li]
# columns that are never -1
L2 = [col_3_li, col_4_li]
# all True/False masks for the -1 columns (2 ** 3 = 8 rows); False marks a -1 cell
c = list(product([True, False], repeat=len(L1)))
print(c)
# [(True, True, True), (True, True, False), (True, False, True),
#  (True, False, False), (False, True, True), (False, True, False),
#  (False, False, True), (False, False, False)]
# sample one random value per row for every column, one row per mask
L = [np.random.choice(x, size=len(c)) for x in L1 + L2]
# build the DataFrame; columns are named col_0, col_1, ... in L1 + L2 order
df = pd.DataFrame(dict(enumerate(L))).add_prefix('col_')
# replace values with -1 where the mask is False
df.iloc[:, :len(L1)] = df.iloc[:, :len(L1)].where(pd.DataFrame(c).add_prefix('col_'), -1)
print(df)
# sample output (values are random):
#   col_0 col_1 col_2 col_3 col_4
# 0     b     d     y     i     k
# 1     b     f    -1     h     l
# 2     c    -1     z     g     k
# 3     a    -1    -1     h     k
# 4    -1     e     x     h     l
# 5    -1     e    -1     g     k
# 6    -1    -1     y     g     j
# 7    -1    -1    -1     h     k
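The same approach can be applied to the full layout from the question: col_1 to col_5 may be -1 and col_6 to col_8 are always sampled, which gives 2 ** 5 = 32 rows. This sketch keeps the question's column names instead of renaming them to col_0, col_1, and so on. The value lists are hypothetical placeholders, the seeded `np.random.default_rng(0)` is only there to make the sample reproducible, and `dtype=object` is an assumption added so that -1 can sit next to string values without a dtype conflict.

```python
import numpy as np
import pandas as pd
from itertools import product

# Hypothetical value lists; replace with the real list for each column
values = {
    'col_1': ['a', 'b', 'c'],
    'col_2': ['d', 'e', 'f'],
    'col_3': ['g', 'h', 'i'],
    'col_4': ['j', 'k', 'l'],
    'col_5': ['m', 'n', 'o'],
    'col_6': ['p', 'q', 'r'],
    'col_7': ['s', 't', 'u'],
    'col_8': ['v', 'w', 'x'],
}
maskable = ['col_1', 'col_2', 'col_3', 'col_4', 'col_5']  # may hold -1
free = ['col_6', 'col_7', 'col_8']                         # never -1

# One row per True/False mask; False marks a cell that becomes -1
masks = pd.DataFrame(list(product([True, False], repeat=len(maskable))),
                     columns=maskable)

rng = np.random.default_rng(0)  # seeded only for a reproducible sample

# Sample one valid value per row for every column
df = pd.DataFrame({col: rng.choice(values[col], size=len(masks))
                   for col in maskable + free}, dtype=object)

# Keep sampled values where the mask is True, put -1 where it is False
df[maskable] = df[maskable].where(masks, -1)

print(df.shape)  # (32, 8)
```

Because `where` aligns the mask on both row index and column names, the mask DataFrame only needs the maskable column names; the free columns are left untouched.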