I want to generate test data for my Bayesian network. This is my current code:
data = np.random.randint(2, size=(5, 6))
columns = ['p_1', 'p_2', 'OP1', 'OP2', 'OP3', 'OP4']
df = pd.DataFrame(data=data, columns=columns)
df.loc[(df['p_1'] == 1) & (df['p_2'] == 1), 'OP1'] = 1
df.loc[(df['p_1'] == 1) & (df['p_2'] == 0), 'OP2'] = 1
df.loc[(df['p_1'] == 0) & (df['p_2'] == 1), 'OP3'] = 1
df.loc[(df['p_1'] == 0) & (df['p_2'] == 0), 'OP4'] = 1
print(df)
So every time p_1 is 1 and p_2 is 1, for example, OP1 should be 1 as well, and all the other OP columns should be 0. When p_1 is 1 and p_2 is 0, OP2 should be 1 and all others 0, and so on.
But my current output is the following:
p_1 | p_2 | OP1 | OP2 | OP3 | OP4
---|---|---|---|---|---
0 | 0 | 0 | 0 | 0 | 1
1 | 0 | 1 | 1 | 1 | 1
0 | 0 | 1 | 1 | 0 | 1
0 | 1 | 1 | 1 | 1 | 1
1 | 0 | 0 | 1 | 1 | 0
Is there any way to fix it? What did I do wrong?
I didn't really understand the solutions to other people's questions, so I thought I'd ask here.
I hope that someone can help me.
CodePudding user response:
The problem is that when you instantiate df, the "OP" columns already contain random values:
data = np.random.randint(2, size=(5, 6))
columns = ['p_1', 'p_2', 'OP1', 'OP2', 'OP3', 'OP4']
df = pd.DataFrame(data=data, columns=columns)
df
p_1 p_2 OP1 OP2 OP3 OP4
0 1 1 0 1 0 0
1 0 0 1 1 0 1
2 0 1 1 1 0 0
3 1 1 1 1 0 1
4 0 1 1 0 1 0
One way of fixing it with your code is to force all "OP" columns to 0 before applying the conditions:
df["OP1"] = df["OP2"] = df["OP3"] = df["OP4"] = 0
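But then you are generating random numbers for the "OP" columns only to overwrite them. I'd generate just the two input columns and derive each output from them instead (shown here for OP1):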
data = np.random.randint(2, size=(5, 2))
columns = ['p_1', 'p_2']
df = pd.DataFrame(data=data, columns=columns)
df["OP1"] = ((df['p_1'] == 1) & (df['p_2'] == 1)).astype(int)
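For completeness, here is a minimal self-contained sketch of this approach extended to all four OP columns, using the mapping described in the question (OP1 for p_1=1, p_2=1; OP2 for 1, 0; OP3 for 0, 1; OP4 for 0, 0). The helper name `make_test_data`, its parameters, and the fixed seed are assumptions added for illustration only.

```python
import numpy as np
import pandas as pd


def make_test_data(n_rows=5, seed=0):
    """Hypothetical helper: random p_1/p_2 plus derived OP1..OP4 columns."""
    rng = np.random.default_rng(seed)  # seed is an assumption, for reproducibility
    # Only the two input columns are random; the outputs are derived below.
    data = rng.integers(0, 2, size=(n_rows, 2))
    df = pd.DataFrame(data, columns=["p_1", "p_2"])

    # Each boolean mask is cast to int, so True/False becomes 1/0.
    df["OP1"] = ((df["p_1"] == 1) & (df["p_2"] == 1)).astype(int)
    df["OP2"] = ((df["p_1"] == 1) & (df["p_2"] == 0)).astype(int)
    df["OP3"] = ((df["p_1"] == 0) & (df["p_2"] == 1)).astype(int)
    df["OP4"] = ((df["p_1"] == 0) & (df["p_2"] == 0)).astype(int)
    return df


df = make_test_data()
print(df)
```

Because the four conditions are mutually exclusive and cover every combination of p_1 and p_2, each row ends up with exactly one 1 among the OP columns.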
CodePudding user response:
You can define the test tuples and create the new columns by casting each boolean mask to integers, which maps True/False to 1/0:
vals = [(1,1),(1,0),(0,1),(0,0)]
for i, (a, b) in enumerate(vals, 1):
df[f'OP{i}'] = ((df['p_1'] == a) & (df['p_2'] == b)).astype(int)
print(df)
p_1 p_2 OP1 OP2 OP3 OP4
0 0 0 0 0 0 1
1 0 1 0 0 1 0
2 0 1 0 0 1 0
3 0 1 0 0 1 0
4 1 0 0 1 0 0
In your solution, set the OP columns to 0 first, because the original DataFrame already contains 1 values in them:
cols = ['OP1', 'OP2', 'OP3', 'OP4']
df[cols] = 0
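Put together with the code from the question, the fix might look like the sketch below. The `np.random.seed(0)` call and the final sanity check are additions for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed so the run is reproducible

data = np.random.randint(2, size=(5, 6))
columns = ['p_1', 'p_2', 'OP1', 'OP2', 'OP3', 'OP4']
df = pd.DataFrame(data=data, columns=columns)

# Reset the OP columns first: they were filled with random 0/1 values above.
cols = ['OP1', 'OP2', 'OP3', 'OP4']
df[cols] = 0

# Now set exactly one OP column per row, as in the question.
df.loc[(df['p_1'] == 1) & (df['p_2'] == 1), 'OP1'] = 1
df.loc[(df['p_1'] == 1) & (df['p_2'] == 0), 'OP2'] = 1
df.loc[(df['p_1'] == 0) & (df['p_2'] == 1), 'OP3'] = 1
df.loc[(df['p_1'] == 0) & (df['p_2'] == 0), 'OP4'] = 1

# Sanity check: every row has exactly one 1 among the OP columns.
assert (df[cols].sum(axis=1) == 1).all()
print(df)
```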