I have a pandas dataframe that has incorrect values at certain rows and columns. An example dataframe would be something like this:
import pandas as pd

dict1 = {'Total Points': [10, 20, 30, 40, 50],
         'A pts': [8, 'X', 20, 30, 40],
         'A pts percentage': [0.80, 0.75, 0.67, 0.75, 0.80],
         'B pts': [2, 5, 'X', 'X', 10],
         'B pts percentage': [0.20, 0.25, 0.33, 0.25, 0.20]
         }
df = pd.DataFrame(dict1)
cols = ["A pts", "B pts"]
pct_cols = [2, 4]
bad_rows = df.index[df[cols].isin(['X']).any(axis=1)]
I can find which rows contain the value I want to remove. What I want to do is calculate the actual points scored by taking the next column (the percentage) and multiplying it by the total points scored for that row. Ideally, I would apply this only to the cells that hold the bad value. I know I can use
df.loc[bad_rows, cols] = value
to set all values at those rows and columns to a particular value, but I don't know how to apply different values. I tried doing
df.loc[bad_rows, cols] = df.iloc[bad_rows, pct_cols] * df.loc[bad_rows, "Total Points"]
but that doesn't work. Any input would be greatly appreciated.
Edit: I know I could simply apply a function to the entire dataframe for certain columns, but I'm trying to avoid doing that if possible, particularly because the dataset I'm working with is quite large and that would be incredibly inefficient. I know I could simply use for loops to iterate through everything and set each value one at a time, but that also seems inefficient.
CodePudding user response:
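Use mul along the index axis (axis=0) so that each row is multiplied by its own Total Points. The plain * operator aligns the Series index against the DataFrame's column labels instead, which produces NaN here. The trailing .values drops the percentage column labels so the result is assigned to cols by position. Note that this overwrites both columns on every bad row, including values that were already valid (row 2's A pts changes from 20 to 20.1):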
df.loc[bad_rows, cols] = df.iloc[bad_rows, pct_cols].mul(df.loc[bad_rows, 'Total Points'], axis=0).values
print(df)
# Output
   Total Points A pts  A pts percentage B pts  B pts percentage
0            10     8              0.80     2              0.20
1            20  15.0              0.75   5.0              0.25
2            30  20.1              0.67   9.9              0.33
3            40  30.0              0.75  10.0              0.25
4            50    40              0.80    10              0.20
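If only the cells holding 'X' should change, as the question asks, one possible approach is to build a cell-level boolean mask and combine it with DataFrame.mask. The following is a minimal sketch under that assumption, not the answerer's method; the names bad and estimated are illustrative, and pct_cols is given as column labels here rather than positions.

```python
import pandas as pd

df = pd.DataFrame({
    "Total Points": [10, 20, 30, 40, 50],
    "A pts": [8, "X", 20, 30, 40],
    "A pts percentage": [0.80, 0.75, 0.67, 0.75, 0.80],
    "B pts": [2, 5, "X", "X", 10],
    "B pts percentage": [0.20, 0.25, 0.33, 0.25, 0.20],
})

cols = ["A pts", "B pts"]
pct_cols = ["A pts percentage", "B pts percentage"]  # labels, not positions

# Cell-level mask: True exactly where a points cell holds 'X'.
bad = df[cols].isin(["X"])
# Row-level mask: rows with at least one bad cell.
bad_rows = bad.any(axis=1)

# Estimate points only for the bad rows: percentage * total, row by row.
estimated = df.loc[bad_rows, pct_cols].mul(df.loc[bad_rows, "Total Points"], axis=0)
# Relabel so the estimates line up with the points columns.
estimated.columns = cols

# Within the bad rows, replace only the bad cells; valid cells are kept.
df.loc[bad_rows, cols] = df.loc[bad_rows, cols].mask(bad.loc[bad_rows], estimated)

print(df)
```

With this variant, row 2's A pts keeps its original value of 20 because only the 'X' cells are replaced. The points columns will typically still have object dtype afterwards, so converting them with astype(float) may be worthwhile before doing further arithmetic.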