I am working on a dataframe,
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
new_width | new_height | new_depth | audited_Width | audited_Height | audited_Depth | inf | val |
---------- | ----------- | ---------- | -------------- | --------------- | -------------- | --- | --- |
35.00 | 2.00 | 21.00 | 21.00 | 2.50 | 35.00 | T | |
12.00 | 4.40 | 10.60 | 11.60 | 4.40 | 12.00 | T | |
20.50 | 17.00 | 5.50 | 21.50 | 17.05 | 20.50 | F | |
24.33 | 22.00 | 18.11 | 24.00 | 22.05 | 24.33 | T | |
23.00 | 23.00 | 19.00 | 19.00 | 23.00 | 23.00 | F |
here i want to find difference between rows (0 , 3) and (1,4) and (2,5) and verify if the difference value(any one or all the three) falls in the range(0,1) and if yes then it should check the corresponding cell in row 6 and if it is 'T', then print 'YES' in corresponding cell in row 7!
I did,
a=df['new_width'] - df['audited_Width']
for i in a:
if (i in range (0,1))==True:
df['Value'] = 'Yes'
print(df['Value'])
I know that 4th line is incorrect. kindly help me with this code and possibly suggest me a better alternate for this code!
CodePudding user response:
You shouldn't iterate over the rows of a DataFrame. Instead, you can create a mask to select the rows that meet your condition, and then use that to fill the "val"
column:
mask = ((df["new_width"] - df["audited_Width"]).between(0, 1) \
| (df["new_height"] - df["audited_Height"]).between(0, 1) \
| (df["new_depth"] - df["audited_Depth"]).between(0, 1)) \
& (df["inf"] == "T")
df["val"] = df["val"].where(~mask, "YES")
This outputs:
new_width new_height new_depth audited_Width audited_Height audited_Depth inf val
0 35.00 2.0 21.00 21.0 2.50 35.00 T NaN
1 12.00 4.4 10.60 11.6 4.40 12.00 T YES
2 20.50 17.0 5.50 21.5 17.05 20.50 F NaN
3 24.33 22.0 18.11 24.0 22.05 24.33 T YES
4 23.00 23.0 19.00 19.0 23.00 23.00 F NaN
CodePudding user response:
Custom for loops are basically never the best option when it comes to pandas.
This is a method that reshapes your dataframe to an arguably better shape, performs a simple check on the new shape, and then extracts indices that should be modified in the original dataframe.
df.columns = df.columns.str.lower()
df2 = pd.wide_to_long(df.reset_index(), ['new', 'audited'], ['index'], 'values', '_', '\w ')
mask = df2[df2.new.sub(df2.audited).between(0,1) & df2.inf.eq('T')]
idx = mask.reset_index('index')['index'].unique()
df.loc[idx, 'val'] = 'YES'
print(df)
Output:
new_width new_height new_depth audited_width audited_height audited_depth inf val
0 35.00 2.0 21.00 21.0 2.50 35.00 T NaN
1 12.00 4.4 10.60 11.6 4.40 12.00 T YES
2 20.50 17.0 5.50 21.5 17.05 20.50 F NaN
3 24.33 22.0 18.11 24.0 22.05 24.33 T YES
4 23.00 23.0 19.00 19.0 23.00 23.00 F NaN