Add data to a new column in pandas.DataFrame from existing columns using for-loop

I have a large dataframe that has temperature measurements where some of the values are missing. The values are in two separate columns, where one has the actual measurements (TEMP), while the other column has only estimated temperatures (TEMP_ESTIMATED).

I'm trying to create a new column that combines these two values: it should hold the actual measurement when one exists (is not NaN) and the estimated value otherwise. Example of dataframe and how I would want it to look after the for-loop.

I have tried many different ways to do this but none of them have worked so far. I'm still new to programming so I apologize if there are some obvious mistakes, just trying to learn more!

This is my latest attempt, which did not add the values to the new column (I have already imported pandas, and all the temperature data is stored in a DataFrame named data):

for i in range(len(data)):
    if data.at[i, 'TEMP'] == 'NaN':
        data.at[i, 'TEMP_ALL'] = data.at[i, 'TEMP_ESTIMATED']
    else:
        data.at[i, 'TEMP_ALL'] = data.at[i, 'TEMP']

I would greatly appreciate any feedback on this, or any alternative ways to achieve the desired result. Thank you!

CodePudding user response:

You can try using np.where:

import pandas as pd
import numpy as np

df = pd.DataFrame(data={'DATE': ['20100101', '20100102', '20100103', '20100104', '20100105'],
                        'TEMP': [np.nan, np.nan, np.nan, 15, 20],
                        'TEMP_ESTIMATED': [10, 15, 16, 17, 22]})
df = df.rename_axis('index')

df['TEMP_ALL'] = np.where(np.isnan(df.TEMP), df.TEMP_ESTIMATED, df.TEMP)

index	DATE	TEMP	TEMP_ESTIMATED	TEMP_ALL
0	20100101	nan	10	10
1	20100102	nan	15	15
2	20100103	nan	16	16
3	20100104	15	17	15
4	20100105	20	22	20
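
As for why the loop in the question left the new column without the estimated values: a missing value in a numeric column is the float NaN, not the string 'NaN', so the test `== 'NaN'` is never true and the `else` branch always runs. NaN does not even compare equal to itself, so `pd.isna` is the safer check. Here is a minimal sketch of a corrected loop, assuming the column names from the question and some made-up sample values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the question
data = pd.DataFrame({'TEMP': [np.nan, 15.0, np.nan],
                     'TEMP_ESTIMATED': [10.0, 17.0, 16.0]})

# NaN is neither equal to the string 'NaN' nor to itself
print(np.nan == 'NaN')   # False
print(np.nan == np.nan)  # False

# Create the target column up front, then fill it row by row
data['TEMP_ALL'] = np.nan
for i in data.index:
    if pd.isna(data.at[i, 'TEMP']):
        # No measurement: fall back to the estimate
        data.at[i, 'TEMP_ALL'] = data.at[i, 'TEMP_ESTIMATED']
    else:
        data.at[i, 'TEMP_ALL'] = data.at[i, 'TEMP']

print(data['TEMP_ALL'].tolist())  # [10.0, 15.0, 16.0]
```

On a large dataframe the vectorized np.where version above should be noticeably faster than a row-by-row loop.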

If your NaN values are strings, try:

df['TEMP_ALL'] = np.where(df.TEMP == 'NaN', df.TEMP_ESTIMATED, df.TEMP)
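
Another option, since the goal is simply "use TEMP and fall back to TEMP_ESTIMATED", is pandas' own `Series.fillna`, which accepts another Series as the fill source. The sketch below uses made-up sample values. The second part assumes the missing entries were read in as the text 'NaN' and converts the column to numbers first with `pd.to_numeric`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({'TEMP': [np.nan, np.nan, 15.0, 20.0],
                   'TEMP_ESTIMATED': [10.0, 15.0, 17.0, 22.0]})

# Take TEMP where present, otherwise the value from TEMP_ESTIMATED
df['TEMP_ALL'] = df['TEMP'].fillna(df['TEMP_ESTIMATED'])
print(df['TEMP_ALL'].tolist())  # [10.0, 15.0, 15.0, 20.0]

# If TEMP holds strings such as 'NaN', convert it to floats first;
# errors='coerce' turns anything unparseable into a real NaN
raw = pd.Series(['NaN', '15', 'NaN', '20'])
numeric = pd.to_numeric(raw, errors='coerce')
print(numeric.isna().tolist())  # [True, False, True, False]
```

`df['TEMP'].combine_first(df['TEMP_ESTIMATED'])` should give the same result here.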