Home > OS >  subtrack two rows in pandas df containing both str and numerical values
subtrack two rows in pandas df containing both str and numerical values

Time:03-19

This seems like a very simple case but have been struggling with this. My df looks like this:

        col1   col2  col3   col4   col5   col6
row1    str    str    1.5    2.5    3.5   4.5
row2    str1   str    2.5    5.5    4.5   5.5
row3    str2   str    3.5    4.5    7.5   6.5
row4    str    str    4.5    5.5    6.5   7.5

What is the best way to create row5 subtracting values from 2 different rows i.e. row3 and row2?

Expected outcome:

        col1    col2     col3   col4   col5   col6
row1    str      str      1.5    2.5    3.5   4.5
row2    str1     str      2.5    5.5    4.5   5.5
row3    str2     str      3.5    4.5    7.5   6.5
row4    str      str      4.5    5.5    6.5   7.5
row5   str/nan   str/nan  1.0   -1.0    3.0   1.0

This is my latest attempt:

df.loc["row5"] = df[(df["col1"] == "str2")].subtract(df[(df["col1" == "str1")], axis = 0)

But I am getting the following error:

cannot set a row with mismatched columns

Thanks!

CodePudding user response:

Use DataFrame.select_dtypes for numeric column and then subtract rows slected by indices:

df1 = df.select_dtypes(np.number)
df.loc["row5"] = df1.loc["row3"].sub(df1.loc["row2"])
print (df)
     col1 col2  col3  col4  col5  col6
row1  str  str   1.5   2.5   3.5   4.5
row2  str  str   2.5   5.5   4.5   5.5
row3  str  str   3.5   4.5   7.5   6.5
row4  str  str   4.5   5.5   6.5   7.5
row5  NaN  NaN   1.0  -1.0   3.0   1.0

EDIT: First select numeric columns to df1:

df1 = df.select_dtypes(np.number)

then filter by mask and select first row for Series:

s1 = df1[df["col1"] == "str2"].iloc[0]
print (s1)
col3    3.5
col4    4.5
col5    7.5
col6    6.5
Name: row3, dtype: float64

s2 = df1[df["col1"] == "str"].iloc[0]
print (s2)
col3    1.5
col4    2.5
col5    3.5
col6    4.5
Name: row1, dtype: float64

So possible subtract Series and assign to new row:

df.loc["row5"] = s1.sub(s2)
print (df)
      col1 col2  col3  col4  col5  col6
row1   str  str   1.5   2.5   3.5   4.5
row2  str1  str   2.5   5.5   4.5   5.5
row3  str2  str   3.5   4.5   7.5   6.5
row4   str  str   4.5   5.5   6.5   7.5
row5   NaN  NaN   2.0   2.0   4.0   2.0
  • Related