This seems like a very simple case but have been struggling with this. My df looks like this:
col1 col2 col3 col4 col5 col6
row1 str str 1.5 2.5 3.5 4.5
row2 str1 str 2.5 5.5 4.5 5.5
row3 str2 str 3.5 4.5 7.5 6.5
row4 str str 4.5 5.5 6.5 7.5
What is the best way to create row5 subtracting values from 2 different rows i.e. row3 and row2?
Expected outcome:
col1 col2 col3 col4 col5 col6
row1 str str 1.5 2.5 3.5 4.5
row2 str1 str 2.5 5.5 4.5 5.5
row3 str2 str 3.5 4.5 7.5 6.5
row4 str str 4.5 5.5 6.5 7.5
row5 str/nan str/nan 1.0 -1.0 3.0 1.0
This is my latest attempt:
df.loc["row5"] = df[(df["col1"] == "str2")].subtract(df[(df["col1" == "str1")], axis = 0)
But I am getting the following error:
cannot set a row with mismatched columns
Thanks!
CodePudding user response:
Use DataFrame.select_dtypes
for numeric column and then subtract rows slected by indices:
df1 = df.select_dtypes(np.number)
df.loc["row5"] = df1.loc["row3"].sub(df1.loc["row2"])
print (df)
col1 col2 col3 col4 col5 col6
row1 str str 1.5 2.5 3.5 4.5
row2 str str 2.5 5.5 4.5 5.5
row3 str str 3.5 4.5 7.5 6.5
row4 str str 4.5 5.5 6.5 7.5
row5 NaN NaN 1.0 -1.0 3.0 1.0
EDIT: First select numeric columns to df1
:
df1 = df.select_dtypes(np.number)
then filter by mask and select first row for Series:
s1 = df1[df["col1"] == "str2"].iloc[0]
print (s1)
col3 3.5
col4 4.5
col5 7.5
col6 6.5
Name: row3, dtype: float64
s2 = df1[df["col1"] == "str"].iloc[0]
print (s2)
col3 1.5
col4 2.5
col5 3.5
col6 4.5
Name: row1, dtype: float64
So possible subtract Series and assign to new row:
df.loc["row5"] = s1.sub(s2)
print (df)
col1 col2 col3 col4 col5 col6
row1 str str 1.5 2.5 3.5 4.5
row2 str1 str 2.5 5.5 4.5 5.5
row3 str2 str 3.5 4.5 7.5 6.5
row4 str str 4.5 5.5 6.5 7.5
row5 NaN NaN 2.0 2.0 4.0 2.0