Why can one column of the pandas DataFrame not be filled?

Time:05-06

I'm having some problems iteratively filling a pandas DataFrame with two different types of values. As a simple example, please consider the following initialization:

IN:

df = pd.DataFrame(data=np.nan,
                  index=range(5),
                  columns=['date', 'price'])

df

OUT:

   date  price
0   NaN    NaN
1   NaN    NaN
2   NaN    NaN
3   NaN    NaN
4   NaN    NaN

When I try to fill one row of the DataFrame, the price is updated but the value in the date column is not. Example:

IN:

df.iloc[0]['date'] = '2022-05-06'
df.iloc[0]['price'] = 100
df

OUT:


   date  price
0   NaN  100.0
1   NaN    NaN
2   NaN    NaN
3   NaN    NaN
4   NaN    NaN

I suspect it has something to do with the fact that the default np.nan value cannot be replaced by a str value, but I'm not sure how to solve it. Note that changing the date column's type to str does not seem to make a difference.
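A quick way to check what the initialization actually produces is to inspect the dtypes right after it. The minimal sketch below reuses the question's own constructor call: because np.nan is a float, pandas infers float64 for both columns, so a string can only end up in date if pandas first converts that column (or an intermediate row Series) to object dtype.

```python
import numpy as np
import pandas as pd

# Same initialization as in the question: every cell is np.nan.
df = pd.DataFrame(data=np.nan, index=range(5), columns=["date", "price"])

# np.nan is a float, so pandas infers float64 for both columns.
print(df.dtypes)
```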

Answer:

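This doesn't work because df.iloc[0]['date'] = ... is chained indexing: df.iloc[0] first builds an intermediate Series for row 0, and the second [...] assignment updates that Series rather than the original DataFrame. The price update only appears to work because, when all columns share one dtype (here float64), that Series can share memory with the DataFrame; storing a string forces the Series to be upcast to object dtype, which copies its data and breaks the link. Neither behavior should be relied on: with Copy-on-Write, which pandas 3.0 makes the default, neither assignment reaches df.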
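The difference between chained and single-step indexing can be seen in a small sketch. It assumes a mixed-dtype frame (object for date, float64 for price), which differs from the question's all-float setup, so that df.iloc[0] is always a separate Series regardless of pandas version. Older pandas versions may also print a SettingWithCopyWarning for the chained step.

```python
import numpy as np
import pandas as pd

# Mixed dtypes: object for the dates, float64 for the prices.
df = pd.DataFrame({
    "date": pd.Series([None] * 5, dtype="object"),
    "price": np.full(5, np.nan),
})

# Chained indexing: df.iloc[0] builds an intermediate Series,
# and the assignment modifies that Series, not df.
row = df.iloc[0]
row["price"] = 100
price_after_chained = df.loc[0, "price"]  # still NaN in df

# Single-step indexing: one .loc call with row label and column name
# writes directly into df.
df.loc[0, "price"] = 100
price_after_loc = df.loc[0, "price"]  # now 100.0
```

Keeping the row selector and the column selector inside one indexing call is what lets pandas locate the cell in the DataFrame itself.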

If you need to mix positional and label indexing, use a single .loc call and convert the row position to its label with df.index:

df.loc[df.index[0], 'date'] = '2022-05-06'
df.loc[df.index[0], 'price'] = 100

output:

         date  price
0  2022-05-06  100.0
1         NaN    NaN
2         NaN    NaN
3         NaN    NaN
4         NaN    NaN
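Converting the position with df.index[0] matters once the index is not a plain range. The sketch below assumes a hypothetical string index ("a", "b", "c") and gives the date column object dtype up front. It shows the label-based form from this answer next to a purely positional one that converts the column name to a position with df.columns.get_loc instead.

```python
import numpy as np
import pandas as pd

# Hypothetical non-default index: position 0 has the label "a", not 0.
idx = ["a", "b", "c"]
df = pd.DataFrame({
    "date": pd.Series([None] * 3, index=idx, dtype="object"),
    "price": pd.Series(np.nan, index=idx, dtype="float64"),
})

# Label-based: turn the row position into its label, then use .loc.
df.loc[df.index[0], "date"] = "2022-05-06"

# Positional: turn the column label into its position, then use .iloc.
df.iloc[0, df.columns.get_loc("price")] = 100

print(df)
```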

Answer:

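Using .loc with the row label and the column name in a single call, as shown below, updates the DataFrame directly (here the label 0 is also the first row, because the index is range(5)):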

import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.nan,
                  index=range(5),
                  columns=['date', 'price'])
print(df)
df.loc[0, 'date'] = '2022-05-06'
df.loc[0, 'price'] = 100
print(df)

Output:

   date  price
0   NaN    NaN
1   NaN    NaN
2   NaN    NaN
3   NaN    NaN
4   NaN    NaN
         date  price
0  2022-05-06  100.0
1         NaN    NaN
2         NaN    NaN
3         NaN    NaN
4         NaN    NaN
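Both answers store a string in a column that starts as float64, so pandas has to upcast the date column to object dtype on assignment. Silent upcasting like this was deprecated in pandas 2.1, so depending on the version it may produce a FutureWarning. A minimal sketch that sidesteps the issue creates each column with the dtype it will hold. Using datetime64 with NaT placeholders for the dates is an assumption about what is wanted; object dtype works just as well if plain strings are preferred.

```python
import numpy as np
import pandas as pd

# Give each column the dtype of the values it will receive:
# datetime64 (NaT placeholders) for dates, float64 (NaN) for prices.
df = pd.DataFrame({
    "date": pd.Series(pd.NaT, index=range(5), dtype="datetime64[ns]"),
    "price": pd.Series(np.nan, index=range(5), dtype="float64"),
})

# No dtype change is needed, so each .loc assignment is a plain write.
df.loc[0, "date"] = pd.Timestamp("2022-05-06")
df.loc[0, "price"] = 100

print(df)
print(df.dtypes)
```

A real datetime column also makes later date arithmetic and filtering possible without converting strings first.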