Home > OS >  Is there a way subtract the values of one column from another column, and get the results in a new c
Is there a way subtract the values of one column from another column, and get the results in a new c

Time:04-24

So, I was messing around with operations in Pandas, and I reached conditional operations. For reference, I have two dataframes like this:

df_1:

Time Coupons_Sold
First_Quarter-2021 1041
Second_Quarter-2021 2145
Third_Quarter-2021 1809
Fourth_Quarter-2021 1104

df_2:

Time Coupons_Sold
First_Quarter-2022 861
Second_Quarter-2022 1024
Third_Quarter-2021 902
Fourth_Quarter-2021 1011

I wanted to do a conditional subtraction on these two datasets, such that the new column contains the absolute values from subtraction of the individual elements of the two columns, if and only if the time periods match.

I want something like:

Time Coupons_Sold
Third_Quarter-2021 907
Fourth_Quarter-2021 93

because there are mappings for third and fourth quarters in both dataframes.

I tried this piece of code:

new_column = df_1['Coupons_Sold'] - df_2['Coupons_Sold']

But, this just gave me:

center
180
1121
907
93

Then I tried a few conditional statements like we do in python:

if df_1['Time'] == df_2['Time']:

df_1['Coupons_Sold'] - df_2['Coupons_Sold']

I tried the above code with in keyword, but got error.

but these conditional statements just gave me errors. Is there any way to do these kind of operations(py 2.7 or py3.7, both are okay)?

Thanks in advance.

If you need any more info, please ask and I will add the same.

CodePudding user response:

You could use merge diff for the specific columns:

cols = ['Time','Coupons_Sold']
out = df1[cols].merge(df2[cols], on='Time', suffixes=('_','')).set_index('Time').diff(axis=1).abs().dropna(axis=1).reset_index()

Output:

                  Time  Coupons_Sold
2   Third_Quarter-2021           907
3  Fourth_Quarter-2021            93

CodePudding user response:

The subtraction is done on the row index. By default its just 0, 1, 2, etc... You could make Time the index of both dataframes and then the subtraction will work. You'll get a Series with values and NaN. If you want the new column to match the shape of the original dataframes, you can just use it. Otherwise, apply .dropna() to collapse it.

>>> df_1.set_index("Time", inplace=True)
>>> df_2.set_index("Time", inplace=True) 
>>> df_1["Coupons_Sold"] - df_2["Coupons_Sold"]
Time
First_Quarter-2021       NaN
First_Quarter-2022       NaN
Fourth_Quarter-2021     93.0
Second_Quarter-2021      NaN
Second_Quarter-2022      NaN
Third_Quarter-2021     907.0
Name: Coupons_Sold, dtype: float64
>>> (df_1["Coupons_Sold"] - df_2["Coupons_Sold"]).dropna()
Time
Fourth_Quarter-2021     93.0
Third_Quarter-2021     907.0
Name: Coupons_Sold, dtype: float64

CodePudding user response:

You may try:

tset = set(df1['Time']).intersection(set(df2['Time']))
df3 = df1.loc[df1['Time'].isin(tset)].merge(df2.loc[df2['Time'].isin(tset)], on='Time')
df3['Coupons_Sold']=df3['Coupons_Sold_x']-df3['Coupons_Sold_y']
df3.drop(['Coupons_Sold_x','Coupons_Sold_y'], axis=1,inplace=True)

Output (df3):

                  Time  Coupons_Sold
0   Third_Quarter-2021           907
1   Fourth_Quarter-2021           93
  • Related