Trying to create a test dataframe with combination of some values

Time:08-10

I have 8 columns. col_1 to col_5 can each hold either a valid value or -1, where -1 means "any value"; the valid values for each column come from a list. The other 3 columns, col_6 to col_8, always hold a valid value taken from their own lists. I want to create every combination of -1 placements across col_1 to col_5 and fill the remaining cells with random valid values. col_6 to col_8 can take any random value from their lists.

Example with 2 columns not having -1 and 2 columns having -1 (image not available).

Example with 2 columns not having -1 and 3 columns having -1 (image not available).

Valid means any value that is sampled from the list of values.

In my case I want one row for each set of columns below, with -1 in those columns and randomly sampled values in the others:

[(),
 ('col_1',),
 ('col_2',),
 ('col_3',),
 ('col_4',),
 ('col_5',),
 ('col_1', 'col_2'),
 ('col_1', 'col_3'),
 ('col_1', 'col_4'),
 ('col_1', 'col_5'),
 ('col_2', 'col_3'),
 ('col_2', 'col_4'),
 ('col_2', 'col_5'),
 ('col_3', 'col_4'),
 ('col_3', 'col_5'),
 ('col_4', 'col_5'),
 ('col_1', 'col_2', 'col_3'),
 ('col_1', 'col_2', 'col_4'),
 ('col_1', 'col_2', 'col_5'),
 ('col_1', 'col_3', 'col_4'),
 ('col_1', 'col_3', 'col_5'),
 ('col_1', 'col_4', 'col_5'),
 ('col_2', 'col_3', 'col_4'),
 ('col_2', 'col_3', 'col_5'),
 ('col_2', 'col_4', 'col_5'),
 ('col_3', 'col_4', 'col_5'),
 ('col_1', 'col_2', 'col_3', 'col_4'),
 ('col_1', 'col_2', 'col_3', 'col_5'),
 ('col_1', 'col_2', 'col_4', 'col_5'),
 ('col_1', 'col_3', 'col_4', 'col_5'),
 ('col_2', 'col_3', 'col_4', 'col_5'),
 ('col_1', 'col_2', 'col_3', 'col_4', 'col_5')]
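
The list above is every subset of col_1 to col_5, ordered by size. A minimal sketch of one way it could be generated, using itertools.combinations (the names `cols` and `subsets` are illustrative and not from the original post):

```python
from itertools import combinations

# Columns that are allowed to hold -1 (names taken from the question).
cols = ['col_1', 'col_2', 'col_3', 'col_4', 'col_5']

# All subsets of size 0..5, smallest subsets first.
subsets = [combo
           for r in range(len(cols) + 1)
           for combo in combinations(cols, r)]

print(len(subsets))   # 32 subsets, i.e. 2**5
print(subsets[:7])
```

Each tuple names the columns that should receive -1 in one row, so the test dataframe needs at least 32 rows to cover every pattern once.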

CodePudding user response:

Use:

from itertools import product

import numpy as np
import pandas as pd

col_1_li = 'a,b,c'.split(',')
col_2_li = 'd,e,f'.split(',')
col_3_li = 'g,h,i'.split(',')
col_4_li = 'j,k,l'.split(',')

#columns with -1
L1 = [col_1_li, col_2_li]
#columns without -1
L2 = [col_3_li, col_4_li]

#get all combinations by length of -1 columns
c = list(product([True, False], repeat=len(L1)))
print (c)
[(True, True), (True, False), (False, True), (False, False)]

#generate random values - one per combination for each column
L = [np.random.choice(x, size=len(c)) for x in L1 + L2]

#generate DataFrame
df = pd.DataFrame(dict(enumerate(L))).add_prefix('col_')

#set -1 where the combination is False
df.iloc[:, :len(L1)] = df.iloc[:, :len(L1)].where(pd.DataFrame(c).add_prefix('col_'), -1)

print (df)
  col_0 col_1 col_2 col_3
0     b     e     i     k
1     a    -1     g     k
2    -1     d     g     l
3    -1    -1     h     l
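
The masking step relies on DataFrame.where, which keeps a value where the condition is True and substitutes the second argument elsewhere. It aligns the condition on index and column labels, which is why the answer adds the col_ prefix to the mask. A small deterministic sketch (the frame `demo` and the mask `keep` are made up for illustration):

```python
import pandas as pd

# Fixed values instead of random ones, so the result is predictable.
demo = pd.DataFrame({'col_0': ['a', 'b', 'c', 'a'],
                     'col_1': ['d', 'e', 'f', 'd']})

# True = keep the sampled value, False = replace it with -1.
keep = pd.DataFrame([(True, True), (True, False),
                     (False, True), (False, False)],
                    columns=['col_0', 'col_1'])

masked = demo.where(keep, -1)
print(masked)
```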

#same approach with three columns that can take -1
from itertools import product

import numpy as np
import pandas as pd

col_1_li = 'a,b,c'.split(',')
col_2_li = 'd,e,f'.split(',')
col_3_li = 'g,h,i'.split(',')
col_4_li = 'j,k,l'.split(',')
col_5_li = 'x,y,z'.split(',')

#columns with -1
L1 = [col_1_li, col_2_li, col_5_li]
#columns without -1
L2 = [col_3_li, col_4_li]

#get all combinations by length of -1 columns
c = list(product([True, False], repeat=len(L1)))
print (c)
[(True, True, True), (True, True, False), (True, False, True), 
 (True, False, False), (False, True, True), (False, True, False),
 (False, False, True), (False, False, False)]

#generate random values - one per combination for each column
L = [np.random.choice(x, size=len(c)) for x in L1 + L2]

#generate DataFrame
df = pd.DataFrame(dict(enumerate(L))).add_prefix('col_')

#set -1 where the combination is False
df.iloc[:, :len(L1)] = df.iloc[:, :len(L1)].where(pd.DataFrame(c).add_prefix('col_'), -1)

print (df)
  col_0 col_1 col_2 col_3 col_4
0     b     d     y     i     k
1     b     f    -1     h     l
2     c    -1     z     g     k
3     a    -1    -1     h     k
4    -1     e     x     h     l
5    -1     e    -1     g     k
6    -1    -1     y     g     j
7    -1    -1    -1     h     k
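
For the layout in the question (col_1 to col_5 may hold -1, col_6 to col_8 never do), the same idea could be wrapped in a function that keeps the question's column names. This is only a sketch under assumptions: `build_test_frame` is a hypothetical helper, and the value lists are placeholders because the question does not give the real ones.

```python
from itertools import product

import numpy as np
import pandas as pd


def build_test_frame(maskable, free, seed=None):
    """Return one row per -1 pattern over the maskable columns.

    maskable, free: dicts mapping column name -> list of valid values.
    (Hypothetical helper, not part of the original answer.)
    """
    rng = np.random.default_rng(seed)
    # One tuple of booleans per row; True keeps the sampled value.
    keep = list(product([True, False], repeat=len(maskable)))
    n_rows = len(keep)

    # Sample a valid value for every cell, column by column.
    # Object dtype lets a column hold both strings and the integer -1.
    data = {name: pd.Series(rng.choice(values, size=n_rows), dtype=object)
            for name, values in {**maskable, **free}.items()}
    df = pd.DataFrame(data)

    # Overwrite the masked cells with -1; free columns are left untouched.
    names = list(maskable)
    mask = pd.DataFrame(keep, columns=names)
    df[names] = df[names].where(mask, -1)
    return df


# Placeholder value lists; substitute the real ones.
maskable = {f'col_{i}': [f'v{i}{j}' for j in range(3)] for i in range(1, 6)}
free = {f'col_{i}': [f'w{i}{j}' for j in range(3)] for i in range(6, 9)}

df = build_test_frame(maskable, free, seed=0)
print(df.shape)   # (32, 8)
```

Passing a seed makes the sampled values reproducible, which may be convenient for a test fixture; leave it as None for a different sample on each run.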