I'd like to replace multiple values of a column but only for selected rows (that is a subset of the data) defined by a condition on a different column in python. For example for the table below I'd like to replace 22000 with twentytwo and 23000 with twentythree but only if Courses is equal to Spark as shown below.
Here is code for the initial dataframe.
df = pd.DataFrame({
'Courses': ["Spark", "PySpark", "Spark", "Spark", "PySpark", "PySpark"],
'Fee': [22000, 25000, 23000, 24000, 22000, 23000],
})
Courses | Fee | |
---|---|---|
0 | Spark | 22000 |
1 | PySpark | 25000 |
2 | Spark | 23000 |
3 | Spark | 24000 |
4 | PySpark | 22000 |
5 | PySpark | 23000 |
Below is the desired outcome.
Courses | Fee | |
---|---|---|
0 | Spark | twentytwo |
1 | PySpark | 25000 |
2 | Spark | twentythree |
3 | Spark | 24000 |
4 | PySpark | 22000 |
5 | PySpark | 23000 |
I have tried using df["Fee"].replace({22000:'twentytwo', 23000:'twentythree'}, inplace=True)
but this results in replacement of all the specified values in the column and not just where Courses is equal to Spark.
CodePudding user response:
First, we need a more stringent selector in the form of df[x, y]
where x
selects rows (where Courses equals Spark) and y
selects columns ("Fee"):
df[df["Courses"]=="Spark", "Fee"]
We can then try using replace()
, but pandas will tell us that we are operating on a "slice" and not the real data frame. From this post, df.update()
may be useful.
Solution:
replacements = {22000:'twentytwo', 23000:'twentythree'}
df.update(df.loc[df["Courses"]=="Spark", "Fee"].replace(replacements))
Finally, since none of the values to be replaced are in "Courses", we can safely remove the column selector "Fee".
...df.loc[df["Courses"]=="Spark"]...