Is there a way subtract the values of one column from another column, and get the results in a new c-CodePudding

So, I was messing around with operations in Pandas, and I reached conditional operations. For reference, I have two dataframes like this:

df_1:

Time	Coupons_Sold
First_Quarter-2021	1041
Second_Quarter-2021	2145
Third_Quarter-2021	1809
Fourth_Quarter-2021	1104

df_2:

Time	Coupons_Sold
First_Quarter-2022	861
Second_Quarter-2022	1024
Third_Quarter-2021	902
Fourth_Quarter-2021	1011

I wanted to do a conditional subtraction on these two datasets, such that the new column contains the absolute values from subtraction of the individual elements of the two columns, if and only if the time periods match.

I want something like:

Time	Coupons_Sold
Third_Quarter-2021	907
Fourth_Quarter-2021	93

because there are mappings for third and fourth quarters in both dataframes.

I tried this piece of code:

new_column = df_1['Coupons_Sold'] - df_2['Coupons_Sold']

But, this just gave me:

center
180
1121
907
93

Then I tried a few conditional statements like we do in python:

if df_1['Time'] == df_2['Time']:

df_1['Coupons_Sold'] - df_2['Coupons_Sold']

I tried the above code with in keyword, but got error.

but these conditional statements just gave me errors. Is there any way to do these kind of operations(py 2.7 or py3.7, both are okay)?

Thanks in advance.

If you need any more info, please ask and I will add the same.

CodePudding user response：

You could use merge diff for the specific columns:

cols = ['Time','Coupons_Sold']
out = df1[cols].merge(df2[cols], on='Time', suffixes=('_','')).set_index('Time').diff(axis=1).abs().dropna(axis=1).reset_index()

Output:

                  Time  Coupons_Sold
2   Third_Quarter-2021           907
3  Fourth_Quarter-2021            93

CodePudding user response：

The subtraction is done on the row index. By default its just 0, 1, 2, etc... You could make Time the index of both dataframes and then the subtraction will work. You'll get a Series with values and NaN. If you want the new column to match the shape of the original dataframes, you can just use it. Otherwise, apply .dropna() to collapse it.

>>> df_1.set_index("Time", inplace=True)
>>> df_2.set_index("Time", inplace=True) 
>>> df_1["Coupons_Sold"] - df_2["Coupons_Sold"]
Time
First_Quarter-2021       NaN
First_Quarter-2022       NaN
Fourth_Quarter-2021     93.0
Second_Quarter-2021      NaN
Second_Quarter-2022      NaN
Third_Quarter-2021     907.0
Name: Coupons_Sold, dtype: float64
>>> (df_1["Coupons_Sold"] - df_2["Coupons_Sold"]).dropna()
Time
Fourth_Quarter-2021     93.0
Third_Quarter-2021     907.0
Name: Coupons_Sold, dtype: float64

CodePudding user response：

You may try:

tset = set(df1['Time']).intersection(set(df2['Time']))
df3 = df1.loc[df1['Time'].isin(tset)].merge(df2.loc[df2['Time'].isin(tset)], on='Time')
df3['Coupons_Sold']=df3['Coupons_Sold_x']-df3['Coupons_Sold_y']
df3.drop(['Coupons_Sold_x','Coupons_Sold_y'], axis=1,inplace=True)

Output (df3):

                  Time  Coupons_Sold
0   Third_Quarter-2021           907
1   Fourth_Quarter-2021           93