I fairly new to programming in python and using the pandas library and I am having problems with comparing 2 dataframes with different quantities i want to see if the quantity in data_1 is less than the quantity in data_2 for each item
import pandas as pd
data_1 = [['banana',10],['orange',2],['strawberry',3]]
data_2 = [['banana',1],['orange',2],['strawberry',5],['melon',8]]
df_1 = pd.Dataframe(data_1,columns = ['item','quantity'])
df_2 = pd.Dataframe(data_2,columns = ['item','quantity'])
i was trying to use pd.merge() the 2 dataframes to compare but it's not quite what i was looking for..... i needed a 3° dataframe with the differences only
CodePudding user response:
For such comparison item should be index and then you might concat quantity columns and compute desired column following way
import pandas as pd
data_1 = [['banana',10],['orange',2],['strawberry',3]]
data_2 = [['banana',1],['orange',2],['strawberry',5],['melon',8]]
df_1 = pd.DataFrame(data_1,columns = ['item','quantity'])
df_2 = pd.DataFrame(data_2,columns = ['item','quantity'])
df = pd.concat({"data1":df_1.set_index("item").quantity,"data2":df_2.set_index("item").quantity},axis=1,sort=False)
df['isless'] = df['data1'] < df['data2']
print(df)
output
data1 data2 isless
banana 10.0 1 False
orange 2.0 2 False
strawberry 3.0 5 True
melon NaN 8 False