This question is probably very simple, but I am having trouble creating a new column in a DataFrame and filling it with a NumPy array. I have an array, e.g. [0,0,0,1,0,1,1], and a DataFrame with the same number of rows as the length of that array. To add the column, I have been doing this:
df['new_col'] = array
However, I get the following warning:
A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
I tried df.loc[:,'new_col'] = array
but got the same warning. I also tried:
df.loc['new_col'] = pd.Series(array, index = df.index)
based on an answer to a different user's question. Does anyone know a "better" way to code this? Or should I just ignore the warning?
CodePudding user response:
Code from https://www.geeksforgeeks.org/adding-new-column-to-existing-dataframe-in-pandas/
# Import pandas package
import pandas as pd
# Define a dictionary containing data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2]
        }
# Convert the dictionary into DataFrame
original_df = pd.DataFrame(data)
# Using 'Qualification' as the column name and equating it to the list
altered_df = original_df.assign(Qualification = ['Msc', 'MA', 'Msc', 'Msc'])
# Observe the result
altered_df
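Applied to the case in the question, the same assign() approach might look like the sketch below. The DataFrame contents and the "value" column are made up for illustration, since the question does not show the real data. assign() accepts a NumPy array directly and returns a new DataFrame rather than modifying the existing one.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's DataFrame and array
df = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60, 70]})
array = np.array([0, 0, 0, 1, 0, 1, 1])

# assign() returns a new DataFrame; df itself is left unchanged
df_with_col = df.assign(new_col=array)

print(df_with_col)
```

Because the result is a separate object, it has to be bound to a name (or reassigned to df) to keep the new column.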
CodePudding user response:
A DataFrame column can be assigned from a plain list (a DataFrame is like a dictionary with column names as keys and list-like values). Try calling the tolist() method on the NumPy array:
df['new_col'] = array.tolist()
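This warning usually does not depend on whether the value is a list or an array. It tends to appear when df was itself produced by slicing or filtering another DataFrame. The question does not show how df was created, so the sketch below assumes that is the case; the parent DataFrame and its "keep" column are hypothetical. Taking an explicit .copy() of the subset makes it an independent DataFrame, after which a plain column assignment with the NumPy array works.

```python
import numpy as np
import pandas as pd

# Hypothetical parent DataFrame; the question does not show where df comes from
parent = pd.DataFrame({
    "value": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "keep": [True, True, True, False, True, True, True, False, True],
})

# Filtering yields a subset of parent; .copy() makes it an independent DataFrame
df = parent[parent["keep"]].copy()

array = np.array([0, 0, 0, 1, 0, 1, 1])

# A NumPy array of matching length is assigned positionally to the new column
df["new_col"] = array

print(df)
```

With the copy in place, the new column exists only on df and parent is unaffected, which is normally what is wanted when working on a filtered subset.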