Best way to get a specific column as y in pandas DataFrame

Time:12-14

I want to extract one specific column as y from a pandas DataFrame.
I found two ways to do this so far:

# The First way
y_df = df[specific_column]
y_array = np.array(y_df)
X_df = df.drop(columns=[specific_column])
X_array = np.array(X_df)

# The second way
features = ['some columns in my dataset']
y = np.array(df.loc[:, [specific_column]].values)  # this is the array called y in the comparison below
X_df = df.loc[:, features].values

But when I compare the values in each y array, I see they are not equal:

y[:4]==y_array[:4]

array([[ True,  True, False, False],
       [ True,  True, False, False],
       [False, False,  True,  True],
       [False, False,  True,  True]])

But I am sure that these two arrays contain the same elements:

y[:4], y_array[:4]

(array([[0],
        [0],
        [1],
        [1]], dtype=int64),
 array([0, 0, 1, 1], dtype=int64))

So, why do I see False values when I compare them together?

CodePudding user response:

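If you select with a list, [specific_column] (double brackets), you get a one-column DataFrame, and converting it to an array gives a 2D array of shape (n, 1). Comparing that with a 1D array of shape (n,) broadcasts the two to an (n, n) matrix, which is why you see False values: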

y_df = np.array(df.loc[:, [specific_column]].values)
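To make the shape mismatch concrete, here is a minimal sketch on made-up data. The DataFrame and the column names "feature" and "target" are assumptions for illustration, not the asker's dataset. It reproduces the 4x4 result from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "target" stands in for specific_column.
df = pd.DataFrame({"feature": [10, 20, 30, 40], "target": [0, 0, 1, 1]})

# A list selector returns a DataFrame, so the array is 2D with shape (4, 1).
y_2d = df.loc[:, ["target"]].to_numpy()

# A scalar selector returns a Series, so the array is 1D with shape (4,).
y_1d = df["target"].to_numpy()

print(y_2d.shape, y_1d.shape)

# NumPy broadcasts (4, 1) against (4,) to (4, 4):
# entry [i, j] answers "is y_2d[i, 0] equal to y_1d[j]?"
comparison = y_2d == y_1d
print(comparison.shape)
print(comparison)
```

Each row of the result compares one element of the 2D array against every element of the 1D array, so the False entries are genuine mismatches between different positions, not a sign that the data differ.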

The solution is to drop the inner [] so that you select a Series; converting that to an array gives a 1D array:

y_df = df[specific_column].to_numpy()
# or, keeping your second approach, with a scalar selector instead of a list
y_df = np.array(df.loc[:, specific_column].values)
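As a rough check that the two routes then agree, the sketch below again uses hypothetical toy data and an assumed column name "target". It also shows that an existing (n, 1) array can be flattened with ravel() instead of being re-selected:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "target" stands in for specific_column.
df = pd.DataFrame({"feature": [10, 20, 30, 40], "target": [0, 0, 1, 1]})
specific_column = "target"

# Scalar selector -> Series -> 1D target array of shape (4,).
y = df[specific_column].to_numpy()

# Everything except the target column -> 2D feature matrix of shape (4, 1).
X = df.drop(columns=[specific_column]).to_numpy()

# If you already hold the (4, 1) version, ravel() flattens it to (4,).
y_from_2d = df[[specific_column]].to_numpy().ravel()

# With matching shapes the comparison is element by element.
elementwise = y == y_from_2d
same = np.array_equal(y, y_from_2d)
print(elementwise, same)
```

np.array_equal also compares shapes, so it returns False for a (4, 1) array against a (4,) array instead of silently broadcasting, which makes it a safer equality check than == in this situation.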