I have a DataFrame:
import pandas as pd
employees = [('Jack', 34, 'Sydney'),
             ('Riti', 31, 'Delhi'),
             ('Aadi', 16, 'London'),
             ('Mark', 18, 'Delhi')]
dataFrame = pd.DataFrame(employees,
                         columns=['Name', 'Age', 'City'])
I would like to add some new columns to this DataFrame. I did it with:
data = ['Height', 'Weight', 'Eyecolor']
duduFrame = pd.DataFrame(columns=data)
This results in:
   Name   Age    City Height Weight Eyecolor
0  Jack  34.0  Sydney    NaN    NaN      NaN
1  Riti  31.0   Delhi    NaN    NaN      NaN
2  Aadi  16.0  London    NaN    NaN      NaN
3  Mark  18.0   Delhi    NaN    NaN      NaN
So far so good.
Now I have new data about Height, Weight and Eyecolor for "Riti":
Riti_data = [(172, 74, 'Brown')]
I would like to add this to dataFrame.
I tried it with:
dataFrame.loc['Riti', [duduFrame]] = Riti_data
But I get the error:
ValueError: Buffer has wrong number of dimensions (expected 1, got 3)
What am I doing wrong?
CodePudding user response:
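Try this:
dataFrame.loc[dataFrame['Name'] == 'Riti', ['Height', 'Weight', 'Eyecolor']] = Riti_data
I think your mistake was passing duduFrame (a DataFrame) as the column selector instead of data, the list that contains the names of the columns you want to fill. Also, 'Riti' is a value in the Name column, not an index label, so the row has to be selected with a boolean mask on Name.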
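For reference, here is a minimal self-contained sketch of the mask-based assignment. It assumes the new columns are created up front as empty object-dtype columns (my choice, so they can hold both numbers and strings), and it assigns a flat list instead of Riti_data; neither detail comes from the question.

```python
import numpy as np
import pandas as pd

employees = [('Jack', 34, 'Sydney'),
             ('Riti', 31, 'Delhi'),
             ('Aadi', 16, 'London'),
             ('Mark', 18, 'Delhi')]
dataFrame = pd.DataFrame(employees, columns=['Name', 'Age', 'City'])

data = ['Height', 'Weight', 'Eyecolor']
# Assumption: add the new columns as empty object columns,
# so they can later hold both numbers and strings.
for col in data:
    dataFrame[col] = pd.Series(np.nan, index=dataFrame.index, dtype=object)

# Rows are picked with a boolean mask on 'Name', columns by their names.
is_riti = dataFrame['Name'] == 'Riti'
dataFrame.loc[is_riti, data] = [172, 74, 'Brown']

print(dataFrame)
```

Only the row where Name equals 'Riti' is changed; the other rows keep NaN in the new columns.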
CodePudding user response:
You can do this:
df = pd.concat([dataFrame, duduFrame])
df = df.set_index('Name')
df.loc['Riti',data] = [172,74,'Brown']
Resulting in:
       Age    City Height Weight Eyecolor
Name
Jack  34.0  Sydney    NaN    NaN      NaN
Riti  31.0   Delhi    172     74    Brown
Aadi  16.0  London    NaN    NaN      NaN
Mark  18.0   Delhi    NaN    NaN      NaN
CodePudding user response:
Pandas has a pd.concat function, whose role is to concatenate DataFrames, either vertically (axis=0) or, in your case, horizontally (axis=1).
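To illustrate the two axes, here is a small sketch. The frames people and extra are hypothetical, and the Height and Weight values for Jack are made up for the example.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Jack', 'Riti'], 'Age': [34, 31]})
# Hypothetical extra columns; they share the same 0..1 index as `people`.
extra = pd.DataFrame({'Height': [180, 172], 'Weight': [80, 74]})

# axis=1: columns are placed side by side, rows aligned on the index.
side_by_side = pd.concat([people, extra], axis=1)

# axis=0: rows are stacked on top of each other.
stacked = pd.concat([people, people], axis=0, ignore_index=True)

print(side_by_side)
print(stacked)
```

Note that axis=1 aligns rows by index, not by Name, so it only lines up correctly when both frames share the same index.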
However, I personally see merging horizontally as more of a pd.merge use case, which gives you more flexibility over exactly how the merge happens.
In your case, you want to match on the Name column, right?
So I would do it in 2 steps:
- Build both DataFrames, each with a Name column and its respective data
- Merge both DataFrames with pd.merge(df1, df2, on='Name', how='outer')
The how='outer' parameter makes sure that you don't lose any data from df1 or df2 in case some Name has data in only one of the two DataFrames. This makes it easier to catch errors in your data, and gets you thinking in terms of SQL JOINs, which is a necessary way of thinking :).
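The two steps could be sketched like this, using the data from the question. The names df1, df2 and merged are just placeholders, and df2 only holds a row for Riti because that is the only new data so far.

```python
import pandas as pd

# Step 1: build both DataFrames, each with a 'Name' column.
df1 = pd.DataFrame({'Name': ['Jack', 'Riti', 'Aadi', 'Mark'],
                    'Age': [34, 31, 16, 18],
                    'City': ['Sydney', 'Delhi', 'London', 'Delhi']})
df2 = pd.DataFrame({'Name': ['Riti'],
                    'Height': [172],
                    'Weight': [74],
                    'Eyecolor': ['Brown']})

# Step 2: outer merge on 'Name' keeps every name from either frame.
merged = pd.merge(df1, df2, on='Name', how='outer')

print(merged)
```

Names without a match in df2 get NaN in the new columns. An outer merge sorts the result by the join key, so the row order may differ from df1.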