I have the following two arrays: Data and Baseline
Data
Phase ENSO EJO
0 1 -1 2
1 1 0 2
2 1 1 2
3 2 -1 7
4 2 1 1
Baseline
Phase ENSO EJO
0 1 -1 0.0
1 1 0 0.0
2 1 1 0.0
3 2 -1 0.0
4 2 0 0.0
5 2 1 0.0
I want to alter the 'Data' data frame such that the "missing row" gets filled in by the Baseline data. Final result would look like this
Data
Phase ENSO EJO
0 1 -1 2
1 1 0 2
2 1 1 2
3 2 -1 7
4 2 0 0
5 2 1 1
CodePudding user response:
You could reindex
Data
by Baseline
indices and fill the missing values with Baseline
values using fillna
.
Data = Data.reindex(Baseline.index).fillna(Baseline)
Another option is to use combine_first
:
Data = Data.combine_first(Baseline)
Output:
Phase ENSO EJO
0 1.0 -1.0 2.0
1 1.0 0.0 2.0
2 1.0 1.0 2.0
3 2.0 -1.0 7.0
4 2.0 1.0 1.0
5 2.0 1.0 0.0
CodePudding user response:
enke's method is great - here's a solution that preserves the original datatypes:
Data = pd.concat([Data, Baseline[~Baseline.index.isin(Data.index)]])
Output:
>>> Data
Phase ENSO EJO
0 1 -1 2.0
1 1 0 2.0
2 1 1 2.0
3 2 -1 7.0
4 2 1 1.0
5 2 1 0.0
CodePudding user response:
Here's an ultra-short solution:
>>> tmp = baseline.copy()
>>> tmp.update(data)
>>> tmp
Phase ENSO EJO
0 1.0 -1.0 2.0
1 1.0 0.0 2.0
2 1.0 1.0 2.0
3 2.0 -1.0 7.0
4 2.0 1.0 1.0
5 2.0 1.0 0.0
Or (Python 3.8 )
>>> (tmp := baseline.copy()).update(data)
>>> tmp
Phase ENSO EJO
0 1.0 -1.0 2.0
1 1.0 0.0 2.0
2 1.0 1.0 2.0
3 2.0 -1.0 7.0
4 2.0 1.0 1.0
5 2.0 1.0 0.0
Simplest:
>>> baseline.update(data)
>>> baseline
Phase ENSO EJO
0 1.0 -1.0 2.0
1 1.0 0.0 2.0
2 1.0 1.0 2.0
3 2.0 -1.0 7.0
4 2.0 1.0 1.0
5 2.0 1.0 0.0
CodePudding user response:
I made the assumption that you want to decide which rows need to be filled based on the values of Phase
and ENSO
columns. Here is my solution based on that.
import os
import pandas as pd
import numpy as np
from io import StringIO
x_str = """Phase ENSO EJO
0 1 -1 2
1 1 0 2
2 1 1 2
3 2 -1 7
4 2 1 1"""
y_str = """Phase ENSO EJO
0 1 -1 0.0
1 1 0 0.0
2 1 1 0.0
3 2 -1 0.0
4 2 0 0.0
5 2 1 0.0"""
x = pd.read_csv(StringIO(x_str), sep="\s ")
y = pd.read_csv(StringIO(y_str), sep="\s ")
# Outer merge with indicator on
z = pd.merge(x, y, how='outer', on=['Phase', 'ENSO'], indicator=True)
rows_to_be_filled = z['_merge'].isin(['right_only'])
z.loc[rows_to_be_filled, 'EJO_x'] = z.loc[rows_to_be_filled, 'EJO_y']
# Cleanup
z = z.rename({'EJO_x': 'EJO'}, axis=1).drop(['EJO_y', '_merge'], axis=1)
z
# Phase ENSO EJO
# 0 1 -1 2.0
# 1 1 0 2.0
# 2 1 1 2.0
# 3 2 -1 7.0
# 4 2 1 1.0
# 5 2 0 0.0