Home > Software design >  Match, update and replace values from one dataset to another in Pandas
Match, update and replace values from one dataset to another in Pandas

Time:11-05

I have two datasets, one has old data and one has updated data. I'd like to create a new dataset by updating values based on if the area, date and column values match.

Data

df1

area    date        aa  bb  cc  
japan   10/1/2027   1   0   0   
us      1/1/2022    5   5   5   
fiji    11/2/2026   1   1   1   
                
                

df2

area        date        aa  bb  cc  stat
japan       10/1/2027   0   5   5   yes
fiji        11/2/2026   0   0   10  no
                

I have two datasets. I wish to replace the values in [aa], [bb], and [cc] columns of df2 with the updated values from df1 if we have the same date and area values. The aa, bb, and cc column are replaced with the updated values.

Desired

area        date        aa  bb  cc  stat
japan       10/1/2027   1   0   0   yes
fiji        11/2/2026   1   1   1   no

Doing

df['date'] = df.date.apply(lambda x: np.nan if x == ' ' else x)

I am not exactly sure how to set this up, however, I have an idea. Any suggestion is appreciated

CodePudding user response:

You can merge and combine_first:

cols = ['area', 'date']
out = (df2[cols].merge(df1, on=cols, how='left')
       .combine_first(df2)[df2.columns]
      )

output:

    area       date  aa  bb  cc stat
0  japan  10/1/2027   1   0   0  yes
1   fiji  11/2/2026   1   1   1   no

CodePudding user response:

Using .merge and making sure date columns in both dfs are set to datetime.

df1["date"] = pd.to_datetime(df1["date"])
df2["date"] = pd.to_datetime(df2["date"])

df3 = pd.merge(left=df1, right=df2, on=["area", "date"], how="right").filter(regex=r".*(?<!_y)$")
df3.columns = df3.columns.str.split("_").str[0]
print(df3)

    area       date  aa  bb  cc stat
0  japan 2027-10-01   1   0   0  yes
1   fiji 2026-11-02   1   1   1   no

CodePudding user response:

I think this can possibly be simplified to:

output = df1[df1['area'].isin(df2['area']) & df1['date'].isin(df2['date'])]

OUTPUT:

area        date        aa  bb  cc  stat
japan       10/1/2027   1   0   0   yes
fiji        11/2/2026   1   1   1   no


Even when df1 looks like this:

DF1:

    area       date  aa  bb  cc
0  japan  10/1/2027   1   0   0
1     us   1/1/2022   5   5   5
2   fiji  11/2/2026   1   1   1
3   fiji  12/5/2025   9   9   9
  • Related