I have a dataframe with two columns. How can I create a third column that:
- takes the value of col1 or col2 if only one of them exists
- takes the mean if both exist
- stays NaN if neither exists
I want to store the result in df['col3']. I tried this, but the values are wrong:
df['col3']=pd.concat([df['col2'], df['col1']]).groupby(level=0).mean()
How can I do this?
time col1 col2
2000-01-31 389.5400 NaN
2000-02-29 387.7700 NaN
2000-03-31 386.6600 250.2
2000-04-30 384.1850 NaN
2000-05-31 383.3600 267.2
... ... ...
2020-03-31 396.3755 NaN
2020-04-30 NaN 350.12
2020-05-31 395.0485 NaN
2020-06-30 394.9400 396.321
2020-07-31 395.3070 NaN
CodePudding user response:
The answer is surprisingly simple:
df['col3'] = df[['col1', 'col2']].mean(axis=1)
This works because mean ignores NaN by default (skipna=True): if a row has only one value, the mean is that value, and if a row has only NaNs, the result is NaN.
Output:
time col1 col2 col3
0 2000-01-31 389.5400 NaN 389.5400
1 2000-02-29 387.7700 NaN 387.7700
2 2000-03-31 386.6600 250.200 318.4300
3 2000-04-30 384.1850 NaN 384.1850
4 2000-05-31 383.3600 267.200 325.2800
5 2020-03-31 396.3755 NaN 396.3755
6 2020-04-30 NaN 350.120 350.1200
7 2020-05-31 395.0485 NaN 395.0485
8 2020-06-30 394.9400 396.321 395.6305
9 2020-07-31 395.3070 NaN 395.3070
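To see the three cases from the question in isolation, here is a minimal sketch on a small made-up frame (the four rows are illustrative, not the asker's full data). It spells out skipna=True even though that is already the default.

```python
import numpy as np
import pandas as pd

# Illustrative frame: only col1, only col2, both present, neither present
df = pd.DataFrame({
    "col1": [389.54, np.nan, 394.94, np.nan],
    "col2": [np.nan, 350.12, 396.321, np.nan],
})

# Row-wise mean over the two columns; NaNs are skipped (the default behaviour)
df["col3"] = df[["col1", "col2"]].mean(axis=1, skipna=True)

print(df)
```

The first two rows should simply carry over the single available value, the third should be the average of both, and the last should stay NaN.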
CodePudding user response:
You can also select the two columns with .loc and take the row-wise mean:
df['col3'] = df.loc[:, ["col1","col2"]].mean(axis = 1)
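As a quick sanity check, the sketch below compares the .loc selection with plain bracket selection on a small made-up frame (the values are illustrative only). Both should give the same row-wise mean, so which one to use is mostly a matter of taste.

```python
import numpy as np
import pandas as pd

# Illustrative frame with a mix of present and missing values
df = pd.DataFrame({
    "col1": [386.66, 384.185, np.nan],
    "col2": [250.2, np.nan, 350.12],
})

# Two ways of selecting the same pair of columns before averaging
via_brackets = df[["col1", "col2"]].mean(axis=1)
via_loc = df.loc[:, ["col1", "col2"]].mean(axis=1)

# Series.equals treats NaNs in the same positions as equal
same = via_brackets.equals(via_loc)
print(same)
```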