So, I was messing around with operations in Pandas, and I reached conditional operations. For reference, I have two dataframes like this:
df_1:
Time | Coupons_Sold |
---|---|
First_Quarter-2021 | 1041 |
Second_Quarter-2021 | 2145 |
Third_Quarter-2021 | 1809 |
Fourth_Quarter-2021 | 1104 |
df_2:
Time | Coupons_Sold |
---|---|
First_Quarter-2022 | 861 |
Second_Quarter-2022 | 1024 |
Third_Quarter-2021 | 902 |
Fourth_Quarter-2021 | 1011 |
I wanted to do a conditional subtraction on these two datasets, such that the new column contains the absolute values from subtraction of the individual elements of the two columns, if and only if the time periods match.
I want something like:
Time | Coupons_Sold |
---|---|
Third_Quarter-2021 | 907 |
Fourth_Quarter-2021 | 93 |
because there are mappings for third and fourth quarters in both dataframes.
I tried this piece of code:
new_column = df_1['Coupons_Sold'] - df_2['Coupons_Sold']
But, this just gave me:
center |
---|
180 |
1121 |
907 |
93 |
Then I tried a few conditional statements like we do in python:
if df_1['Time'] == df_2['Time']:
df_1['Coupons_Sold'] - df_2['Coupons_Sold']
I tried the above code with in
keyword, but got error.
but these conditional statements just gave me errors. Is there any way to do these kind of operations(py 2.7 or py3.7, both are okay)?
Thanks in advance.
If you need any more info, please ask and I will add the same.
CodePudding user response:
You could use merge
diff
for the specific columns:
cols = ['Time','Coupons_Sold']
out = df1[cols].merge(df2[cols], on='Time', suffixes=('_','')).set_index('Time').diff(axis=1).abs().dropna(axis=1).reset_index()
Output:
Time Coupons_Sold
2 Third_Quarter-2021 907
3 Fourth_Quarter-2021 93
CodePudding user response:
The subtraction is done on the row index. By default its just 0, 1, 2, etc... You could make Time
the index of both dataframes and then the subtraction will work. You'll get a Series
with values and NaN
. If you want the new column to match the shape of the original dataframes, you can just use it. Otherwise, apply .dropna()
to collapse it.
>>> df_1.set_index("Time", inplace=True)
>>> df_2.set_index("Time", inplace=True)
>>> df_1["Coupons_Sold"] - df_2["Coupons_Sold"]
Time
First_Quarter-2021 NaN
First_Quarter-2022 NaN
Fourth_Quarter-2021 93.0
Second_Quarter-2021 NaN
Second_Quarter-2022 NaN
Third_Quarter-2021 907.0
Name: Coupons_Sold, dtype: float64
>>> (df_1["Coupons_Sold"] - df_2["Coupons_Sold"]).dropna()
Time
Fourth_Quarter-2021 93.0
Third_Quarter-2021 907.0
Name: Coupons_Sold, dtype: float64
CodePudding user response:
You may try:
tset = set(df1['Time']).intersection(set(df2['Time']))
df3 = df1.loc[df1['Time'].isin(tset)].merge(df2.loc[df2['Time'].isin(tset)], on='Time')
df3['Coupons_Sold']=df3['Coupons_Sold_x']-df3['Coupons_Sold_y']
df3.drop(['Coupons_Sold_x','Coupons_Sold_y'], axis=1,inplace=True)
Output (df3):
Time Coupons_Sold
0 Third_Quarter-2021 907
1 Fourth_Quarter-2021 93