Home > database >  Replace multiple values in a column for a subset of the dataframe in python
Replace multiple values in a column for a subset of the dataframe in python

Time:10-18

I'd like to replace multiple values of a column but only for selected rows (that is a subset of the data) defined by a condition on a different column in python. For example for the table below I'd like to replace 22000 with twentytwo and 23000 with twentythree but only if Courses is equal to Spark as shown below.

Here is code for the initial dataframe.

df = pd.DataFrame({
   'Courses': ["Spark", "PySpark", "Spark", "Spark", "PySpark", "PySpark"],
   'Fee': [22000, 25000, 23000, 24000, 22000, 23000],
})

Courses Fee
0 Spark 22000
1 PySpark 25000
2 Spark 23000
3 Spark 24000
4 PySpark 22000
5 PySpark 23000

Below is the desired outcome.

Courses Fee
0 Spark twentytwo
1 PySpark 25000
2 Spark twentythree
3 Spark 24000
4 PySpark 22000
5 PySpark 23000

I have tried using df["Fee"].replace({22000:'twentytwo', 23000:'twentythree'}, inplace=True) but this results in replacement of all the specified values in the column and not just where Courses is equal to Spark.

CodePudding user response:

First, we need a more stringent selector in the form of df[x, y] where x selects rows (where Courses equals Spark) and y selects columns ("Fee"):

df[df["Courses"]=="Spark", "Fee"]

We can then try using replace(), but pandas will tell us that we are operating on a "slice" and not the real data frame. From this post, df.update() may be useful.

Solution:

replacements = {22000:'twentytwo', 23000:'twentythree'}
df.update(df.loc[df["Courses"]=="Spark", "Fee"].replace(replacements))

Finally, since none of the values to be replaced are in "Courses", we can safely remove the column selector "Fee".

...df.loc[df["Courses"]=="Spark"]...
  • Related