Home > Mobile >  Replacing duplicate values in distinct columns
Replacing duplicate values in distinct columns

Time:11-02

I have the following dataframe:

Year-Week    Case    Team     Case Info
2022-42      540    Finance    OTHER
2022-42      540    IT         ACCEPTED
2022-42      480    Sales      OTHER
2022-42      480    Finance    CHECK CASE
2022-42      480    IT         OTHER
2022-42      465    IT         OTHER
2022-42      465    Sales      OTHER
2022-42      310    Legal      VALIDATED
2022-42      310    Marketing  OTHER

As you can see, there are duplicate values in the "Case" section, this is because each case can belong to a different team and each team can categorize the case differently in the "Case Info" section.

My problem is that I would like my python code to take the duplicate "Case" cases and when they have a different value in "Case Info", any other value other than "OTHER" prevails, I attach an example:

Year-Week    Case    Team     Case Info
2022-42      540    Finance    ACCEPTED
2022-42      540    IT         ACCEPTED
2022-42      480    Sales      CHECK CASE
2022-42      480    Finance    CHECK CASE
2022-42      480    IT         CHECK CASE
2022-42      465    IT         OTHER
2022-42      465    Sales      OTHER
2022-42      310    Legal      VALIDATED
2022-42      310    Marketing  VALIDATED

I've been trying using commands like:

df['Case Info'] = df.groupby('Case')['Case Info'].transform('max')

But that only leads me to replace one value over the other, leveraging the fact that the initial letter in the ASCII table is applied. Having said that, that only makes half of the job, since is only doing it for just one value vs 'OTHER' that comes at last.

CodePudding user response:

Not exactly, the most efficient way but this method chaining should work buddy

df.groupby('Case')['Case Info'].apply(lambda x : x.replace('OTHER',np.nan).bfill().ffill()) # This should return your wanted series
  • Related