Substract Two Dataframes by Index and Keep String Columns-CodePudding

I would like to subtract two data frames by indexes:


# importing pandas as pd
import pandas as pd

# Creating the second dataframe
df1 = pd.DataFrame({"Type":['T1', 'T2', 'T3', 'T4', 'T5'],
                    "A":[10, 11, 7, 8, 5],
                    "B":[21, 5, 32, 4, 6],
                    "C":[11, 21, 23, 7, 9],
                    "D":[1, 5, 3, 8, 6]},
                    index =["2001", "2002", "2003", "2004", "2005"])
df1
 
# Creating the first dataframe
df2 = pd.DataFrame({"A":[1, 2, 2, 2],
                    "B":[3, 2, 4, 3],
                    "C":[2, 2, 7, 3],
                    "D":[1, 3, 2, 1]},
                    index =["2000", "2002", "2003", "2004"])
df2

# Desired
df = pd.DataFrame({"Type":['T1', 'T2', 'T3', 'T4', 'T5'],
                    "A":[10, 9, 5, 6, 5],
                    "B":[21, 3, 28, 1, 6],
                    "C":[11, 19, 16, 4, 9],
                    "D":[1, 2, 1, 7, 5]},
                    index =["2001", "2002", "2003", "2004", "2005"])
df

df1.subtract(df2)

However, it returns in some cases NAs, I would like to keep values from the first df1 if not deductable.

CodePudding user response：

You could handle NaN using:

df1.subtract(df2).combine_first(df1).dropna(how='all')

output:

         A     B     C    D Type
2001  10.0  21.0  11.0  1.0   T1
2002   9.0   3.0  19.0  2.0   T2
2003   5.0  28.0  16.0  1.0   T3
2004   6.0   1.0   4.0  7.0   T4
2005   5.0   6.0   9.0  6.0   T5

CodePudding user response：

You can use select_dtypes to choose the correct data type, then subtract the reindex data:

(df1.select_dtypes(include='number')
    .sub(df2.reindex(df1.index, fill_value=0))
    .join(df1.select_dtypes(exclude='number'))
)

Output:

       A   B   C  D Type
2001  10  21  11  1   T1
2002   9   3  19  2   T2
2003   5  28  16  1   T3
2004   6   1   4  7   T4
2005   5   6   9  6   T5