Insert new data to dataframe


I have a DataFrame:

import pandas as pd

employees = [('Jack', 34, 'Sydney'),
             ('Riti', 31, 'Delhi'),
             ('Aadi', 16, 'London'),
             ('Mark', 18, 'Delhi')]
dataFrame = pd.DataFrame(employees,
                         columns=['Name', 'Age', 'City'])

I would like to add some new columns to this DataFrame. I did it with:

    data = ['Height', 'Weight', 'Eyecolor']
    duduFrame = pd.DataFrame(columns=data)

This results in:

    Name    Age     City    Height  Weight  Eyecolor
0   Jack    34.0    Sydney  NaN     NaN     NaN
1   Riti    31.0    Delhi   NaN     NaN     NaN
2   Aadi    16.0    London  NaN     NaN     NaN
3   Mark    18.0    Delhi   NaN     NaN     NaN

So far so good.

Now I have new data about Height, Weight and Eyecolor for "Riti":

Riti_data = [(172, 74, 'Brown')]

I would like to add this to dataFrame.

I tried it with:

dataFrame.loc['Riti', [duduFrame]] = Riti_data

But I get the error:

ValueError: Buffer has wrong number of dimensions (expected 1, got 3)

What am I doing wrong?

CodePudding user response:

Try this:

dataFrame.loc[dataFrame['Name']=='Riti', ['Height','Weight','Eyecolor']] = Riti_data

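I think your mistake was passing duduFrame (a DataFrame) as the column selector instead of data, the list holding the names of the columns you want to fill. Also, 'Riti' is a value in the Name column, not an index label, so the row has to be selected with a boolean mask.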
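For reference, here is a minimal self-contained sketch of that approach, using the data from the question. Creating the new columns with object dtype before assigning is my own assumption (it keeps numbers and strings side by side without dtype warnings); the question builds them through duduFrame instead.

```python
import pandas as pd

employees = [('Jack', 34, 'Sydney'),
             ('Riti', 31, 'Delhi'),
             ('Aadi', 16, 'London'),
             ('Mark', 18, 'Delhi')]
dataFrame = pd.DataFrame(employees, columns=['Name', 'Age', 'City'])

data = ['Height', 'Weight', 'Eyecolor']

# Assumption: add the new columns up front as object dtype, filled with NaN,
# so that .loc only has to fill cells that already exist.
dataFrame = dataFrame.reindex(columns=['Name', 'Age', 'City'] + data)
dataFrame = dataFrame.astype({col: object for col in data})

Riti_data = [(172, 74, 'Brown')]

# Boolean mask picks the row(s) where Name is 'Riti';
# the list of column names picks the cells to fill.
mask = dataFrame['Name'] == 'Riti'
dataFrame.loc[mask, data] = Riti_data

print(dataFrame)
```

The mask selects by value, so this works with the default integer index and does not require Name to be the index.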

CodePudding user response:

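You can do this:

# Add the empty Height/Weight/Eyecolor columns, then index by Name
df = pd.concat([dataFrame, duduFrame])
df = df.set_index('Name')
# With Name as the index, 'Riti' is a valid row label for .loc
df.loc['Riti', data] = [172, 74, 'Brown']

Resulting in: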

       Age    City Height Weight Eyecolor
Jack  34.0  Sydney    NaN    NaN      NaN
Riti  31.0   Delhi    172     74    Brown
Aadi  16.0  London    NaN    NaN      NaN
Mark  18.0   Delhi    NaN    NaN      NaN

CodePudding user response:

Pandas has a pd.concat function, which concatenates dataframes either vertically (axis=0) or, as in your case, horizontally (axis=1).

However, I personally see merging horizontally as more of a pd.merge use case, which gives you more flexibility over exactly how the merge happens.

In your case, you want to match on the Name column, right?

So I would do it in 2 steps:

  • Build both dataframes, each with a Name column and its respective data
  • Merge both dataframes with pd.merge(df1, df2, on='Name', how='outer')

The how='outer' parameter makes sure that you don't lose any data from df1 or df2 when a Name has data in only one of the two dataframes. This makes it easier to catch errors in your data, and gets you thinking in terms of SQL JOINs, which is a necessary way of thinking :).
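The two steps above could look roughly like this. It is only a sketch: df1 and df2 are the placeholder names from the pd.merge call, filled with the values from the question.

```python
import pandas as pd

# Step 1a: the original employee data, keyed by Name.
df1 = pd.DataFrame(
    [('Jack', 34, 'Sydney'),
     ('Riti', 31, 'Delhi'),
     ('Aadi', 16, 'London'),
     ('Mark', 18, 'Delhi')],
    columns=['Name', 'Age', 'City'])

# Step 1b: the new measurements, keyed by the same Name column.
df2 = pd.DataFrame(
    [('Riti', 172, 74, 'Brown')],
    columns=['Name', 'Height', 'Weight', 'Eyecolor'])

# Step 2: outer merge keeps every Name found in either dataframe;
# names without measurements get NaN in the new columns.
merged = pd.merge(df1, df2, on='Name', how='outer')

print(merged)
```

Two things to keep in mind: an outer merge may reorder the rows by the join key, and Height and Weight will typically come back as floats because the missing entries are stored as NaN.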
