This is such a simple question but I can't find it using the terms 'normalize' or the overused term 'index'. How do I set the start value of all my 'cars' to 100? In reality there are 52 cars and much more timestamps.
import pandas as pd
ExampleOfWhatIHave = {'Car':['A', 'B', 'A', 'B'],
'Hour':['1', '1', '2', '2'],
'Car-speed':[10, 20, 11, 19]}
ExampleOfWhatINeed = {'Car':['A', 'B', 'A', 'B'],
'Hour':['1', '1', '2', '2'],
'Car51-speed':[100, 100, 110, 95]}
# Create DataFrame
df = pd.DataFrame(ExampleOfWhatIHave)
df = pd.DataFrame(ExampleOfWhatINeed)
CodePudding user response:
How about this nice one liner?
df['Car51-speed'] = 100 * df['Car-speed'] / df.groupby('Car')['Car-speed'].transform('first')
CodePudding user response:
This is an explanation of Quixotic22's answer as it was easier than writing it in a comment
Starting with the Denominator
df.groupby('Car') group the dataframe by the values on the car column, the .transform('first') then selects the first value for each group so
df_1.groupby('Car').transform('first')
gives:
Hour Car-speed Car51-speed
0 1 10 100.0
1 1 20 100.0
2 1 10 100.0
3 1 20 100.0
the addition of ['Car-speed'] selects just that column as a pandas series
df_1.groupby('Car')['Car-speed'].transform('first')
thus gives:
0 10
1 20
2 10
3 20
Name: Car-speed, dtype: int64
On the Numerator we have 100 * df_1['Car-speed']
this is simply the 'Car-speed' column as a series multiplied by the value 100 giving:
0 1000
1 2000
2 1100
3 1900
Name: Car-speed, dtype: int64
We divide these Series giving a Series where each value is the result of the division of the values at that index
0 100.0
1 100.0
2 110.0
3 95.0
Name: Car-speed, dtype: float64
this resulting series is set to the column 'Car51-speed' of the data frame creating a new column
Output:
Car Hour Car-speed Car51-speed
0 A 1 10 100.0
1 B 1 20 100.0
2 A 2 11 110.0
3 B 2 19 95.0