Home > database >  How to replace <NA> values based on another column main value in pandas
How to replace <NA> values based on another column main value in pandas

Time:05-26

Problem with pandas

I have a following dataframe:

      product code trends_vs  trends_100_0
1   aaaaaaa   it       150           100
6   aaaaaaa   pl      <NA>            90
11  aaaaaaa   ro      <NA>            82
15  aaaaaaa   sk      <NA>            13
21   bbbbb   ro       260           100
25   bbbbb   cz      <NA>            62
29   bbbbb   sk      <NA>            42

I want to calculate values from trends_vs column the following way:

each product in a ‘product’ column have ONLY one trends_vs value which corresponds with trends_100_0 - 100 value.

And now to get trends_vs value for second row we have to do 150 * (90/100)

The output and logic should look like this:

      product code trends_vs  trends_100_0
1   aaaaaaa   it       150           100
6   aaaaaaa   pl      135            90      ---> 150 * (90/100)
11  aaaaaaa   ro      123            82    ---> 150 * (82/100)
15  aaaaaaa   sk      19.5            13  ---> 150 * (13/100)
21   bbbbb   ro       260           100
25   bbbbb   cz      161.2            62  ---> 260 * (62/100) # 260 instead of 150 because its a different product 'bbbbb'
29   bbbbb   sk      109.2            42 ---> 260 * (42/100)

I tried with groupby or map but something not work as expected. I have a solution when each column should corresponds with each other in a row but in this scenario where we should calculate based on one main value for each product?

CodePudding user response:

Another method without groupby:

df['trends_vs'] = (df.sort_values(['product', 'trends_100_0'])['trends_vs'].bfill()
                       * df['trends_100_0'] / 100)
print(df)

# Output
    product code  trends_vs  trends_100_0
1   aaaaaaa   it      150.0           100
6   aaaaaaa   pl      135.0            90
11  aaaaaaa   ro      123.0            82
15  aaaaaaa   sk       19.5            13
21    bbbbb   ro      260.0           100
25    bbbbb   cz      161.2            62
29    bbbbb   sk      109.2            42

CodePudding user response:

use .groupby.transform and then a simple .mul with .div

we can use .fillna to target your rows.

df["trends_vs"] = df["trends_vs"].fillna(
df.groupby("product")["trends_vs"]
.transform("min")
.mul((df["trends_100_0"].div(100)))
)


    product code  trends_vs  trends_100_0
1   aaaaaaa   it      150.0           100
6   aaaaaaa   pl      135.0            90
11  aaaaaaa   ro      123.0            82
15  aaaaaaa   sk       19.5            13
21    bbbbb   ro      260.0           100
25    bbbbb   cz      161.2            62
29    bbbbb   sk      109.2            4

CodePudding user response:

This will allow you to find the trends_vs based on product and the trends_100_0 based on the results

df['trends_vs'] = df.groupby('product')['trends_vs'].transform(max)
df['trends_vs'] = df['trends_vs'] * (df['trends_100_0']/100)
df
  • Related