Home > other >  How to subtract two columns with different length only if other column has the same value
How to subtract two columns with different length only if other column has the same value

Time:08-18

I have two df.

df1 has two columns where the first one is an identifier and the second one is a value. The total length of df is 1071.

df2 has multiple columns but also have one column as identifier and other as value. The total length of df2 is 500000.

I want to write a program where I create a new column in df2 where in each record I have the result of df2['value']-df['value] if df2['identifier']==df1['identifier']. I have this but i know that it does not make sense:

if df1['identifier'] == df2['identifier']:
    df2['new_value']= df2['value']-df1['value']
else:
    pass

I got the following error: ValueError: Can only compare identically-labeled Series objects.

Thanks in advance

CodePudding user response:

Would something like this work for you?

df2 = df2.merge(df1, how="outer", on="identifier")
df2["new_value"] = df2["value_y"] - df2["value_x"]

First, merge the two data sets on the identifier column, then subtract. If the identifier is not present in df1, then NaN is returned.

CodePudding user response:

Please try this:

import numpy as np
import pandas as pd
df2['new_value']= np.nan
identifiers = df2['identifier'].unique()
for identifier in identifiers:
    if identifier in df1.identifier.values:
        df2.loc[df2['identifier']==identifier,'new_value'] = df2.loc[df2['identifier']==identifier]['value'].iloc[0]-df1.loc[df1['identifier']==identifier]['value'].iloc[0]

This works if for an identifier, there exist a unique value in a dataframe

  • Related