How to randomly choose rows of a pandas dataframe to update

Time:01-14

I am a beginner in Python and I have a pandas dataframe that I want to change as follows:

10% of the rows in column "review" must be changed by adding a prefix; the other 90% of the rows must stay unchanged.

To change all rows of "review" I can use: X_test["modified_review"] = " abc " + X_test["review"]

and to select 10% of the rows I can use: X_test.sample(frac=0.1)

But I don't know how to combine the two so that only the selected rows are modified.

Please help!

CodePudding user response:

You can sample 10% random indexes and update the corresponding locations only:

df["modified_review"] = df["review"]

rand_ids = df.index.to_series().sample(frac=0.1)
df.loc[rand_ids, "modified_review"] = " abc " + df.loc[rand_ids, "modified_review"]
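
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The toy dataframe (20 made-up reviews) and the random_state value are my own assumptions, added only so the result is reproducible; they are not from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for X_test (not from the question)
df = pd.DataFrame({"review": [f"review {i}" for i in range(20)]})

# Start with an unchanged copy of the column
df["modified_review"] = df["review"]

# Sample 10% of the rows and keep only their index labels.
# random_state is an assumption, used here only for reproducibility.
rand_ids = df.sample(frac=0.1, random_state=0).index

# Add the prefix to the sampled rows only; all other rows stay as they were
df.loc[rand_ids, "modified_review"] = " abc " + df.loc[rand_ids, "review"]

# Count how many rows actually differ from the original column
n_changed = (df["modified_review"] != df["review"]).sum()
print(n_changed, "of", len(df), "rows were modified")
```

Selecting by index label with .loc works as long as the index labels are unique; if your dataframe has duplicate labels, you would probably want to call reset_index(drop=True) first.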