Home > Back-end >  How to set the starting scores of value related scores to 100 in df
How to set the starting scores of value related scores to 100 in df

Time:10-06

This is such a simple question but I can't find it using the terms 'normalize' or the overused term 'index'. How do I set the start value of all my 'cars' to 100? In reality there are 52 cars and much more timestamps.

import pandas as pd

ExampleOfWhatIHave = {'Car':['A', 'B', 'A', 'B'],
                      'Hour':['1', '1', '2', '2'],
                      'Car-speed':[10, 20, 11, 19]}

ExampleOfWhatINeed = {'Car':['A', 'B', 'A', 'B'],
                      'Hour':['1', '1', '2', '2'],
                      'Car51-speed':[100, 100, 110, 95]}
# Create DataFrame
df = pd.DataFrame(ExampleOfWhatIHave)
df = pd.DataFrame(ExampleOfWhatINeed)

CodePudding user response:

How about this nice one liner?

df['Car51-speed'] = 100 * df['Car-speed'] / df.groupby('Car')['Car-speed'].transform('first')

CodePudding user response:

This is an explanation of Quixotic22's answer as it was easier than writing it in a comment

Starting with the Denominator

df.groupby('Car') group the dataframe by the values on the car column, the .transform('first') then selects the first value for each group so df_1.groupby('Car').transform('first')

gives:

  Hour  Car-speed  Car51-speed
0    1         10        100.0
1    1         20        100.0
2    1         10        100.0
3    1         20        100.0

the addition of ['Car-speed'] selects just that column as a pandas series

df_1.groupby('Car')['Car-speed'].transform('first') thus gives:

0    10
1    20
2    10
3    20
Name: Car-speed, dtype: int64

On the Numerator we have 100 * df_1['Car-speed']

this is simply the 'Car-speed' column as a series multiplied by the value 100 giving:

0    1000
1    2000
2    1100
3    1900
Name: Car-speed, dtype: int64

We divide these Series giving a Series where each value is the result of the division of the values at that index

0    100.0
1    100.0
2    110.0
3     95.0
Name: Car-speed, dtype: float64

this resulting series is set to the column 'Car51-speed' of the data frame creating a new column

Output:

  Car Hour  Car-speed  Car51-speed
0   A    1         10        100.0
1   B    1         20        100.0
2   A    2         11        110.0
3   B    2         19         95.0
  • Related