How to subtract two columns with different length only if other column has the same value-CodePudding

I have two df.

df1 has two columns where the first one is an identifier and the second one is a value. The total length of df is 1071.

df2 has multiple columns but also have one column as identifier and other as value. The total length of df2 is 500000.

I want to write a program where I create a new column in df2 where in each record I have the result of df2['value']-df['value] if df2['identifier']==df1['identifier']. I have this but i know that it does not make sense:

if df1['identifier'] == df2['identifier']:
    df2['new_value']= df2['value']-df1['value']
else:
    pass

I got the following error: ValueError: Can only compare identically-labeled Series objects.

Thanks in advance

CodePudding user response：

Would something like this work for you?

df2 = df2.merge(df1, how="outer", on="identifier")
df2["new_value"] = df2["value_y"] - df2["value_x"]

First, merge the two data sets on the identifier column, then subtract. If the identifier is not present in df1, then NaN is returned.

CodePudding user response：

Please try this:

import numpy as np
import pandas as pd
df2['new_value']= np.nan
identifiers = df2['identifier'].unique()
for identifier in identifiers:
    if identifier in df1.identifier.values:
        df2.loc[df2['identifier']==identifier,'new_value'] = df2.loc[df2['identifier']==identifier]['value'].iloc[0]-df1.loc[df1['identifier']==identifier]['value'].iloc[0]

This works if for an identifier, there exist a unique value in a dataframe