Add data to a new column in pandas.DataFrame from existing columns using for-loop

Time:02-27

I have a large dataframe of temperature measurements in which some values are missing. The values are in two separate columns: one holds the actual measurements (TEMP), while the other holds only estimated temperatures (TEMP_ESTIMATED).

I'm trying to create a new column that combines these two values: it should hold the actual measurement wherever one exists (is not NaN) and the estimated value otherwise. Example of the dataframe and how I would like it to look after the for-loop.

I have tried many different ways to do this, but none of them have worked so far. I'm still new to programming, so I apologize if there are some obvious mistakes; I'm just trying to learn more!

This is what I tried last time, but the values were not added to the new column (I have already imported pandas, and all the temperature data is stored in a DataFrame called data):

for i in range(len(data)):
    if data.at[i, 'TEMP'] == 'NaN':
        data.at[i, 'TEMP_ALL'] = data.at[i, 'TEMP_ESTIMATED']
    else:
        data.at[i, 'TEMP_ALL'] = data.at[i, 'TEMP']
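One likely reason the loop above doesn't work is the test data.at[i, 'TEMP'] == 'NaN': a missing value in a numeric column is the float NaN, not the string 'NaN', and NaN never compares equal to anything, so the else branch always runs and TEMP_ALL just copies TEMP. Below is a minimal sketch of the same loop using pd.isna() instead, assuming the missing readings are real NaN values; the small sample frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the same column names as in the question
data = pd.DataFrame({
    'TEMP': [np.nan, np.nan, 15.0, 20.0],
    'TEMP_ESTIMATED': [10.0, 16.0, 17.0, 22.0],
})

# Create the target column up front so every .at assignment hits an existing column
data['TEMP_ALL'] = np.nan

for i in data.index:
    # pd.isna() recognizes a real NaN; `== 'NaN'` compares against a string and is never True here
    if pd.isna(data.at[i, 'TEMP']):
        data.at[i, 'TEMP_ALL'] = data.at[i, 'TEMP_ESTIMATED']
    else:
        data.at[i, 'TEMP_ALL'] = data.at[i, 'TEMP']

print(data)
```

Iterating over data.index rather than range(len(data)) keeps the .at lookups correct even if the index is not 0..n-1. On a large dataframe, a vectorized approach (no Python loop) will usually be much faster.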

I would greatly appreciate any feedback on this, or any alternative ways to achieve the desired result. Thank you!

Answer:

You can try using np.where, which takes TEMP_ESTIMATED wherever TEMP is NaN and TEMP everywhere else:

import pandas as pd
import numpy as np

df = pd.DataFrame(data={'DATE': ['20100101', '20100102', '20100103', '20100104', '20100105'],
                        'TEMP': [np.nan, np.nan, np.nan, 15, 20],
                        'TEMP_ESTIMATED': [10, 15, 16, 17, 22]})
df = df.rename_axis('index')

# Use TEMP_ESTIMATED where TEMP is NaN, otherwise keep TEMP
df['TEMP_ALL'] = np.where(np.isnan(df.TEMP), df.TEMP_ESTIMATED, df.TEMP)

index  DATE      TEMP  TEMP_ESTIMATED  TEMP_ALL
0      20100101  nan   10              10
1      20100102  nan   15              15
2      20100103  nan   16              16
3      20100104  15    17              15
4      20100105  20    22              20
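Pandas also has built-in methods for this "use A, fall back to B" pattern. The following sketch rebuilds the example frame from this answer and uses Series.fillna with another Series, which fills each missing TEMP from the row of TEMP_ESTIMATED with the same index label. It also uses the equivalent Series.combine_first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'DATE': ['20100101', '20100102', '20100103', '20100104', '20100105'],
                        'TEMP': [np.nan, np.nan, np.nan, 15, 20],
                        'TEMP_ESTIMATED': [10, 15, 16, 17, 22]})

# fillna() with a Series fills each missing TEMP from the row with the same index label
df['TEMP_ALL'] = df['TEMP'].fillna(df['TEMP_ESTIMATED'])

# combine_first() gives the same values: TEMP where present, TEMP_ESTIMATED elsewhere
alt = df['TEMP'].combine_first(df['TEMP_ESTIMATED'])

print(df)
```

Because TEMP contains NaN, pandas stores it as float, so TEMP_ALL comes out as float (10.0 rather than 10).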

If your missing values are stored as the string 'NaN' rather than as real NaN values, try:

df['TEMP_ALL'] = np.where(df.TEMP == 'NaN', df.TEMP_ESTIMATED, df.TEMP)
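Comparing against the string 'NaN' means working with an object column that mixes strings and numbers, and the resulting TEMP_ALL is an object column too. An alternative sketch is to convert the column to numbers first with pd.to_numeric(..., errors='coerce'), which turns any entry it cannot parse into a real NaN, and then fill as usual. This assumes the missing readings really are the text 'NaN'. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical data where missing readings were stored as the string 'NaN'
df = pd.DataFrame({'TEMP': ['NaN', 'NaN', 15, 20],
                   'TEMP_ESTIMATED': [10, 16, 17, 22]})

# errors='coerce' turns anything that cannot be parsed as a number into a real NaN
df['TEMP'] = pd.to_numeric(df['TEMP'], errors='coerce')

# Now TEMP is numeric, so missing values can be filled from TEMP_ESTIMATED
df['TEMP_ALL'] = df['TEMP'].fillna(df['TEMP_ESTIMATED'])

print(df)
```

Note that errors='coerce' also silently turns typos or other unexpected text into NaN, so it is worth checking df['TEMP'].isna().sum() after the conversion.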