How to merge 2 columns in pandas dataframe by taking either value or mean and create a third column?

Time:11-17

I have a dataframe with 2 columns. How can I create a third column which:

  1. Takes the col1 or col2 value if only one of them exists
  2. Takes the mean if both exist
  3. Keeps NaN if neither exists

And finally I want to store it in df['col3']. I tried this, but the values are wrong.

df['col3']=pd.concat([df['col2'], df['col1']]).groupby(level=0).mean()

How can I do this?

time        col1        col2    
2000-01-31  389.5400    NaN
2000-02-29  387.7700    NaN
2000-03-31  386.6600    250.2
2000-04-30  384.1850    NaN
2000-05-31  383.3600    267.2
...     ...     ...
2020-03-31  396.3755    NaN
2020-04-30  NaN         350.12
2020-05-31  395.0485    NaN
2020-06-30  394.9400    396.321
2020-07-31  395.3070    NaN

CodePudding user response:

The answer is surprisingly simple:

df['col3'] = df[['col1', 'col2']].mean(axis=1)

This works because mean ignores NaN by default (skipna=True). If a row has only one value, the mean is that value; if both values are NaN, the result is NaN.

Output:

         time      col1     col2      col3
0  2000-01-31  389.5400      NaN  389.5400
1  2000-02-29  387.7700      NaN  387.7700
2  2000-03-31  386.6600  250.200  318.4300
3  2000-04-30  384.1850      NaN  384.1850
4  2000-05-31  383.3600  267.200  325.2800
5  2020-03-31  396.3755      NaN  396.3755
6  2020-04-30       NaN  350.120  350.1200
7  2020-05-31  395.0485      NaN  395.0485
8  2020-06-30  394.9400  396.321  395.6305
9  2020-07-31  395.3070      NaN  395.3070
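
For anyone who wants to check the three cases on their own machine, here is a minimal sketch. It assumes a small hand-built frame: the first three rows copy values from the question, while the last row (both values missing) is hypothetical and added only to show case 3. The skipna=False line is included purely as a contrast.

```python
import numpy as np
import pandas as pd

# Small sample: rows 0-2 use values from the question,
# row 3 is a hypothetical row where both columns are missing.
df = pd.DataFrame(
    {
        "col1": [389.54, 386.66, np.nan, np.nan],
        "col2": [np.nan, 250.2, 350.12, np.nan],
    }
)

# Row-wise mean; NaN values are skipped by default (skipna=True).
df["col3"] = df[["col1", "col2"]].mean(axis=1)

# For contrast only: with skipna=False, any missing value makes the row NaN.
strict = df[["col1", "col2"]].mean(axis=1, skipna=False)

print(df)
print(strict)
```

With the default, a row is NaN only when both inputs are NaN. With skipna=False, only rows where both values are present get a number.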

CodePudding user response:

You can use this:

df['col3'] = df.loc[:, ["col1","col2"]].mean(axis = 1)