I am stuck on this problem. I have data consisting of email IDs and their respective values, 0 or 1 (the tag values used in logistic regression). The data is as follows:
import numpy as np
input_x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
input_y = np.array([0,1,0,0,1,1,1,0,0,0,0,1,0,1,0])
Now I want to split this data into two sets: one with all the 0 values and their corresponding "input_x" values, and the other with all the 1 values and their corresponding "input_x" values. For that I wrote this function:
def split_data(x,y):
    shpx = x.shape[0]
    shpy = y.shape[0]
    neg_data = 0
    pos_data = 0
    for i in range(shpy):
        if y[i] == 0:
            neg_data = neg_data + 1
        else:
            pos_data = pos_data + 1
    print(f"Number of negative (0) values = {neg_data}")
    print(f"Number of positive (1) values = {pos_data}")
    emp_neg_data_x = np.zeros(neg_data)
    emp_neg_data_y = np.zeros(neg_data)
    emp_pos_data_x = np.zeros(pos_data)
    emp_pos_data_y = np.zeros(pos_data)
    for j in range(neg_data):
        for k in range(shpx):
            if y[k] == 0:
                emp_neg_data_x[j] = x[j]
                emp_neg_data_y[j] = 0
            else:
                pass
    for m in range(pos_data):
        for n in range(shpx):
            if y[n] == 0:
                emp_pos_data_x[m] = x[m]
                emp_pos_data_y[m] = 1
            else:
                pass
    return emp_neg_data_x,emp_neg_data_y,emp_pos_data_x,emp_pos_data_y
Here the arguments x and y are the input arrays. Running this function gives the following result:
Number of negative (0) values = 9
Number of positive (1) values = 6
[1. 2. 3. 4. 5. 6. 7. 8. 9.]
[0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 2. 3. 4. 5. 6.]
[1. 1. 1. 1. 1. 1.]
The emp_neg_data_y and emp_pos_data_y arrays are correct, but the other two arrays just contain the sequence 1, 2, 3, ... instead of the email_idx/input_x values that correspond to 0 and 1. Can you help me out? (I suspect the problem is in the loops, but I am stuck.)
CodePudding user response:
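First build a dictionary that maps each x value to its y value (this assumes the x values are unique):
import numpy as np
input_x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
input_y = np.array([0,1,0,0,1,1,1,0,0,0,0,1,0,1,0])
# Pair the values by position; .tolist() gives plain Python ints for clean printing
y_dict = dict(zip(input_x.tolist(), input_y.tolist()))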
Create your lists and print:
emp_neg_data_x = [x for x, y in y_dict.items() if y == 0]
emp_neg_data_y = [y for x, y in y_dict.items() if y == 0]
emp_pos_data_x = [x for x, y in y_dict.items() if y == 1]
emp_pos_data_y = [y for x, y in y_dict.items() if y == 1]
print(emp_neg_data_x)
print(emp_neg_data_y)
print(emp_pos_data_x)
print(emp_pos_data_y)
Output:
[1, 3, 4, 8, 9, 10, 11, 13, 15]
[0, 0, 0, 0, 0, 0, 0, 0, 0]
[2, 5, 6, 7, 12, 14]
[1, 1, 1, 1, 1, 1]
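As for why the original function returns 1, 2, 3, ...: the inner loop scans every position k, but the assignment reads x[j] (the output slot) instead of x[k] (the source position), so each slot j just ends up holding x[j]. The second pair of loops also tests y[n] == 0 where it presumably should test for 1. Below is a minimal sketch of two possible fixes, one that keeps the loop and one that uses a NumPy boolean mask. The names split_data_loop and split_data_mask are made up for illustration, and both treat any nonzero label as positive, as the else branch in the question does.

```python
import numpy as np


def split_data_loop(x, y):
    """Loop version: one pass over the data, one write position per output."""
    n_neg = int(np.sum(y == 0))
    n_pos = y.shape[0] - n_neg
    neg_x = np.zeros(n_neg, dtype=x.dtype)
    pos_x = np.zeros(n_pos, dtype=x.dtype)
    j = 0  # next free slot in neg_x
    m = 0  # next free slot in pos_x
    for k in range(y.shape[0]):
        if y[k] == 0:
            neg_x[j] = x[k]  # read from source position k, not from j
            j += 1
        else:
            pos_x[m] = x[k]
            m += 1
    neg_y = np.zeros(n_neg, dtype=y.dtype)
    pos_y = np.ones(n_pos, dtype=y.dtype)
    return neg_x, neg_y, pos_x, pos_y


def split_data_mask(x, y):
    """Vectorised version: a boolean mask selects the matching elements."""
    neg = y == 0  # True where the label is 0
    return x[neg], y[neg], x[~neg], y[~neg]


input_x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
input_y = np.array([0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0])

neg_x, neg_y, pos_x, pos_y = split_data_mask(input_x, input_y)
print(neg_x.tolist())  # [1, 3, 4, 8, 9, 10, 11, 13, 15]
print(pos_x.tolist())  # [2, 5, 6, 7, 12, 14]
```

The mask version is usually the shorter choice with NumPy arrays, and unlike the dictionary approach it does not require the x values to be unique.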