I have two df.
df1 has two columns where the first one is an identifier and the second one is a value. The total length of df is 1071.
df2 has multiple columns but also have one column as identifier and other as value. The total length of df2 is 500000.
I want to write a program where I create a new column in df2 where in each record I have the result of df2['value']-df['value] if df2['identifier']==df1['identifier']. I have this but i know that it does not make sense:
if df1['identifier'] == df2['identifier']:
df2['new_value']= df2['value']-df1['value']
else:
pass
I got the following error: ValueError: Can only compare identically-labeled Series objects.
Thanks in advance
CodePudding user response:
Would something like this work for you?
df2 = df2.merge(df1, how="outer", on="identifier")
df2["new_value"] = df2["value_y"] - df2["value_x"]
First, merge the two data sets on the identifier
column, then subtract. If the identifier is not present in df1
, then NaN
is returned.
CodePudding user response:
Please try this:
import numpy as np
import pandas as pd
df2['new_value']= np.nan
identifiers = df2['identifier'].unique()
for identifier in identifiers:
if identifier in df1.identifier.values:
df2.loc[df2['identifier']==identifier,'new_value'] = df2.loc[df2['identifier']==identifier]['value'].iloc[0]-df1.loc[df1['identifier']==identifier]['value'].iloc[0]
This works if for an identifier, there exist a unique value in a dataframe