Home > Back-end >  find difference between multiple columns in a dataframe
find difference between multiple columns in a dataframe

Time:06-26

I am working on a dataframe,

0 1 2 3 4 5 6 7
new_width new_height new_depth audited_Width audited_Height audited_Depth inf val
---------- ----------- ---------- -------------- --------------- -------------- --- ---
35.00 2.00 21.00 21.00 2.50 35.00 T
12.00 4.40 10.60 11.60 4.40 12.00 T
20.50 17.00 5.50 21.50 17.05 20.50 F
24.33 22.00 18.11 24.00 22.05 24.33 T
23.00 23.00 19.00 19.00 23.00 23.00 F

here i want to find difference between rows (0 , 3) and (1,4) and (2,5) and verify if the difference value(any one or all the three) falls in the range(0,1) and if yes then it should check the corresponding cell in row 6 and if it is 'T', then print 'YES' in corresponding cell in row 7!

I did,

a=df['new_width'] - df['audited_Width']
for i in a:
    if (i in range (0,1))==True:
        df['Value'] = 'Yes'        
print(df['Value'])

I know that 4th line is incorrect. kindly help me with this code and possibly suggest me a better alternate for this code!

CodePudding user response:

You shouldn't iterate over the rows of a DataFrame. Instead, you can create a mask to select the rows that meet your condition, and then use that to fill the "val" column:

mask = ((df["new_width"] - df["audited_Width"]).between(0, 1) \
    | (df["new_height"] - df["audited_Height"]).between(0, 1) \
    | (df["new_depth"] - df["audited_Depth"]).between(0, 1)) \
    & (df["inf"] == "T")
df["val"] = df["val"].where(~mask, "YES")

This outputs:

   new_width  new_height  new_depth  audited_Width  audited_Height  audited_Depth inf  val
0      35.00         2.0      21.00           21.0            2.50          35.00   T  NaN
1      12.00         4.4      10.60           11.6            4.40          12.00   T  YES
2      20.50        17.0       5.50           21.5           17.05          20.50   F  NaN
3      24.33        22.0      18.11           24.0           22.05          24.33   T  YES
4      23.00        23.0      19.00           19.0           23.00          23.00   F  NaN

CodePudding user response:

Custom for loops are basically never the best option when it comes to pandas.

This is a method that reshapes your dataframe to an arguably better shape, performs a simple check on the new shape, and then extracts indices that should be modified in the original dataframe.

df.columns = df.columns.str.lower()
df2 = pd.wide_to_long(df.reset_index(), ['new', 'audited'], ['index'], 'values', '_', '\w ')
mask = df2[df2.new.sub(df2.audited).between(0,1) & df2.inf.eq('T')]
idx = mask.reset_index('index')['index'].unique()
df.loc[idx, 'val'] = 'YES'
print(df)

Output:

   new_width  new_height  new_depth  audited_width  audited_height  audited_depth inf  val
0      35.00         2.0      21.00           21.0            2.50          35.00   T  NaN
1      12.00         4.4      10.60           11.6            4.40          12.00   T  YES
2      20.50        17.0       5.50           21.5           17.05          20.50   F  NaN
3      24.33        22.0      18.11           24.0           22.05          24.33   T  YES
4      23.00        23.0      19.00           19.0           23.00          23.00   F  NaN
  • Related