I want to extract one specific column as y from a pandas DataFrame.
I found two ways to do this so far:
# The first way
y_df = df[specific_column]
y_array = np.array(y_df)
X_df = df.drop(columns=[specific_column])
X_array = np.array(X_df)
# The second way
features = ['some columns in my dataset']
y_df = np.array(df.loc[:, [specific_column]].values)
X_df = df.loc[:, features].values
But when I compare the values in each y array, I see they are not equal:
y[:4]==y_array[:4]
array([[ True,  True, False, False],
       [ True,  True, False, False],
       [False, False,  True,  True],
       [False, False,  True,  True]])
But I am sure that these two arrays contain the same elements:
y[:4], y_array[:4]
(array([[0],
        [0],
        [1],
        [1]], dtype=int64),
 array([0, 0, 1, 1], dtype=int64))
So, why do I see False values when I compare them together?
Answer:
If you select with double brackets [[...]], you get a one-column DataFrame, and converting it to an array gives a 2D array:
y_df = np.array(df.loc[:, [specific_column]].values)
The solution is to remove the inner [] so that you select a Series; converting that to an array gives a 1D array:
y_df = df[specific_column].to_numpy()
# or, keeping your .loc approach
y_df = np.array(df.loc[:, specific_column].values)
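As for why the comparison prints a 4x4 grid of True and False: comparing an array of shape (4, 1) with one of shape (4,) triggers NumPy broadcasting, so every element of the first is compared with every element of the second. Here is a minimal sketch of that effect. The toy DataFrame and the column name "target" are made up for illustration and stand in for your data and specific_column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "target" stands in for specific_column.
df = pd.DataFrame({"feature": [10, 20, 30, 40], "target": [0, 0, 1, 1]})

y_2d = df.loc[:, ["target"]].to_numpy()  # list selector -> DataFrame -> shape (4, 1)
y_1d = df["target"].to_numpy()           # scalar selector -> Series -> shape (4,)

# Broadcasting stretches (4, 1) against (4,) into a (4, 4) grid of pairwise comparisons.
pairwise = y_2d == y_1d

# Flattening the column vector makes the comparison element-wise again.
elementwise = y_2d.ravel() == y_1d

print(y_2d.shape, y_1d.shape, pairwise.shape)
print(elementwise.all())
```

If you only want to know whether two arrays hold the same values in the same shape, np.array_equal(a, b) is a stricter check: it returns False when the shapes differ instead of broadcasting.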