I have the following dataset:
#Load the required libraries
import pandas as pd
#Create dataset
data = {'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
        'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4],
        'Married': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No'],
        'Self_Employed': ['No', 'No', 'Yes', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'No'],
        'LoanAmount': [123, 128, 66, 120, 141, 52, 96, 15, 85, 36, 58, 89],
        }
#Convert to dataframe
df = pd.DataFrame(data)
print("df = \n", df)
I want to add a column 'Last_entry' that contains only 0s and 1s, marking the last run time of each team.
For team A, the last run time is 5, so that row gets Last_entry=1 and all other team A rows get 0.
For team B, the last run time is 3, so that row gets Last_entry=1 and all other team B rows get 0.
For team C, the last run time is 4, so that row gets Last_entry=1 and all other team C rows get 0.
The result should look like this:
[Image: new dataframe with the additional column]
Can somebody please let me know how to achieve this in Python?
I wish to add an additional column to an existing dataset using Python
CodePudding user response:
You can use groupby and tail to get the last entry for each team. Then make a new column of zeroes, and set the resulting rows to one:
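# Determine indices of the last row in each team
last_entry_idx = df.groupby('team').tail(1).index
# Create the new column (named 'Last_entry' as in the question), defaulting to 0
df['Last_entry'] = 0
# Flag the last row of each team
df.loc[last_entry_idx, 'Last_entry'] = 1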
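Note that tail(1) picks whichever row comes last within each team, so the approach above relies on the rows already being sorted by Run_time, as they are in the sample data. If that ordering is not guaranteed, one option is to compare each row against its team's maximum Run_time instead. A minimal sketch, assuming "last" means the largest Run_time and keeping only the two relevant columns from the question's data:

```python
import pandas as pd

# Same team and Run_time values as in the question (other columns omitted)
df = pd.DataFrame({
    'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4],
})

# Largest Run_time within each team, broadcast back to every row
max_run_time = df.groupby('team')['Run_time'].transform('max')

# 1 where the row holds its team's largest Run_time, 0 elsewhere
df['Last_entry'] = (df['Run_time'] == max_run_time).astype(int)

print(df)
```

One caveat with this variant: if a team has several rows sharing the same maximum Run_time, all of them are flagged with 1. If exactly one row per team is required, sorting by team and Run_time first and then using the tail(1) approach may be the better fit.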