How do I append to a Pandas DataFrame column?

Time:10-30

I have a DataFrame that I am adding data to, but I do not gather the data for each row at the same time. Some of the columns are assembled in pieces.

The first time I have some data for a column, I can simply assign it:

import pandas as pd

df = pd.DataFrame()
df["name"] = pd.Series(["A", "B", "C", "D", "E", "F", "G", "H", "I"])
df["data"] = pd.Series([1, 2, 3])

new_data = pd.Series([4, 5, 6])

How can I append new_data to the "data" column?

Expected output:

  name  data
0    A   1.0
1    B   2.0
2    C   3.0
3    D   4.0
4    E   5.0
5    F   6.0
6    G   NaN
7    H   NaN
8    I   NaN

Things I've tried:

df["data"] += new_data
# Adds instead of appending


df["data"] = df["data"].append(new_data)
# ValueError: cannot reindex on an axis with duplicate labels


df["data"] = pd.concat([df["data"], new_data])
# ValueError: cannot reindex on an axis with duplicate labels


column = df["data"]
df.drop(columns="data")
df["data"] = pd.concat([column, new_data])
# ValueError: cannot reindex on an axis with duplicate labels


df["data"] = pd.concat([df["data"], new_data], ignore_index=True)
# This doesn't appear to modify the column (but no errors)


df["data"] = pd.concat([df["data"], new_data], axis=1)
# ValueError: Columns must be same length as key

How can I accomplish this deceptively simple task? I believe I am not understanding how indexing works in Pandas.

CodePudding user response:

Before concatenating onto the original column, you need to drop its NaN values first:

df['data'] = pd.concat([df['data'].dropna(), new_data], ignore_index=True)
print(df)

  name  data
0    A   1.0
1    B   2.0
2    C   3.0
3    D   4.0
4    E   5.0
5    F   6.0
6    G   NaN
7    H   NaN
8    I   NaN
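
To see why the dropna() matters, here is a small sketch comparing the two concat calls on the question's data. The variable names without_drop and with_drop are just for illustration. The idea is that the "data" column already has 9 entries (3 values plus 6 NaN), so without dropna() the new values end up at labels 9, 10 and 11, which the DataFrame does not have, and they are silently discarded on assignment.

```python
import pandas as pd

df = pd.DataFrame()
df["name"] = pd.Series(["A", "B", "C", "D", "E", "F", "G", "H", "I"])
df["data"] = pd.Series([1, 2, 3])  # aligned to 9 rows: 3 values + 6 NaN
new_data = pd.Series([4, 5, 6])

# Without dropna(): 9 existing entries + 3 new ones, relabelled 0..11.
without_drop = pd.concat([df["data"], new_data], ignore_index=True)
print(len(without_drop))               # 12 entries in total
print(without_drop.iloc[9:].tolist())  # the new values sit at labels 9, 10, 11

# With dropna(): 3 existing + 3 new, relabelled 0..5, all of which df has.
with_drop = pd.concat([df["data"].dropna(), new_data], ignore_index=True)
df["data"] = with_drop  # labels 6..8 have no match and stay NaN
print(df)
```

This also explains the "no errors, but nothing changes" attempt in the question: the assignment keeps only labels 0 to 8, and those hold the old values.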

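If the concatenated Series is longer than the original DataFrame, you can join or concat the result back onto the original DataFrame (the output below evidently comes from a new_data with more values than the three in the question):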

out = (df.drop(columns='data')
       .join(pd.concat([df['data'].dropna(), new_data], ignore_index=True).to_frame('data'), how='outer'))
# or
out = (pd.concat([df.drop(columns='data'),
                  pd.concat([df['data'].dropna(), new_data], ignore_index=True).to_frame('data')], axis=1))
print(out)

   name  data
0     A   1.0
1     B   2.0
2     C   3.0
3     D   4.0
4     E   5.0
5     F   6.0
6     G   9.0
7     H  10.0
8     I  11.0
9   NaN  11.0
10  NaN  12.0
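
If you do this repeatedly, both cases can be wrapped in one function. The following is only a sketch: append_to_column is a hypothetical helper, not a pandas API, and it assumes the DataFrame has the default 0..n-1 index and that NaN in the column only marks cells that have not been filled yet. The eight new values are made up for the example.

```python
import pandas as pd

def append_to_column(df, column, new_values):
    """Hypothetical helper: place new_values after the existing non-NaN entries."""
    combined = pd.concat(
        [df[column].dropna(), pd.Series(new_values)], ignore_index=True
    )
    if len(combined) > len(df):
        # Grow the frame; other columns get NaN in the added rows.
        out = df.reindex(range(len(combined)))
    else:
        out = df.copy()
    out[column] = combined  # aligns on the 0..n-1 labels
    return out

df = pd.DataFrame()
df["name"] = pd.Series(["A", "B", "C", "D", "E", "F", "G", "H", "I"])
df["data"] = pd.Series([1, 2, 3])

short = append_to_column(df, "data", [4, 5, 6])
longer = append_to_column(df, "data", [4, 5, 6, 7, 8, 9, 10, 11])
print(short)
print(longer)
```

The helper returns a new DataFrame rather than modifying df in place, so the call has to be assigned back (df = append_to_column(...)) if you want to keep accumulating.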

CodePudding user response:

Each cell in a DataFrame has a row label (aka index) and a column label. This assigns the values 4, 5, 6 to the cells at row labels 3, 4, 5 and column label "data":

df.loc[3:5, "data"] = [4,5,6]

However, if your right hand side is a Series, the rows 3 to 5 will be NaN:

df.loc[3:5, "data"] = pd.Series([4,5,6]) # rows 3 to 5 end up NaN

This is because pandas tries to align the labels on the left-hand side with the labels on the right-hand side. The labels on the left are 3, 4, 5. The labels on the right are 0, 1, 2 (the default when you do not specify an index). Nothing matches, hence NaN.
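
Given that, there are two ways to assign a Series into those rows, sketched below on the question's data. Either strip the labels so the values go in by position, or relabel the Series so its labels match the target rows. The names a, b and start are only for illustration, and the start calculation assumes the default integer index.

```python
import pandas as pd

df = pd.DataFrame({"name": list("ABCDEFGHI")})
df["data"] = pd.Series([1, 2, 3])
new_data = pd.Series([4, 5, 6])  # labels 0, 1, 2

# Option 1: convert to a NumPy array so there are no labels to align.
a = df.copy()
start = a["data"].last_valid_index() + 1  # first unfilled row label, here 3
a.loc[start:start + len(new_data) - 1, "data"] = new_data.to_numpy()

# Option 2: relabel the Series so its labels are 3, 4, 5.
b = df.copy()
b.loc[3:5, "data"] = new_data.set_axis([3, 4, 5])

print(a)
print(b)
```

Note that .loc slices include both endpoints, which is why the end of the slice is start + len(new_data) - 1.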
