Create an additional column in a dataframe based on a specific condition

I have a dataset given as such:

#Load the required libraries
import pandas as pd


#Create dataset
data = {'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
        'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4],
        'Married': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No'],
        'Self_Employed': ['No', 'No', 'Yes', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'No'],
        'LoanAmount': [123, 128, 66, 120, 141, 52,96,15,85,36,58,89],
        }

#Convert to dataframe
df = pd.DataFrame(data)
print("df = \n", df)

Here, I wish to add an additional column 'Last_entry' which will contain 0's and 1's.

The column should be filled as follows: for team-A, the last run-time is 5, so that row has Last_entry=1 and all other run-times for team-A should be 0.

For team-B, the last run-time is 3. So that row has Last_entry=1 and all other run-times for team-B should be 0.

For team-C, the last run-time is 4. So that row has Last_entry=1 and all other run-times for team-C should be 0.

The result should look like this:

[Image: new dataframe with the additional column]

Can somebody please let me know how to add this column to an existing dataframe in Python?

CodePudding user response:

You can use groupby and tail to get the last row for each team. This relies on the rows already being ordered by Run_time within each team, as they are in your data. Then make a new column of zeroes, and set the selected rows to one:

# Determine indices for last entries
last_entry_idx = df.groupby('team').tail(1).index

# Create new column
df['Last_entry'] = 0
df.loc[last_entry_idx, 'Last_entry'] = 1
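
If the rows might not be sorted within each team, one possible alternative is to compare each Run_time against the largest Run_time of its team. This is only a sketch, and it assumes that "last entry" means the row with the highest Run_time per team, which the question implies but does not state outright. The name max_run_time is introduced here for illustration, and the columns not needed for the calculation are left out of the sample data.

```python
import pandas as pd

# Same team and Run_time values as in the question (other columns omitted)
data = {'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
        'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4],
        }
df = pd.DataFrame(data)

# Largest Run_time within each team, broadcast back onto every row
max_run_time = df.groupby('team')['Run_time'].transform('max')

# 1 where the row holds its team's largest Run_time, 0 everywhere else
df['Last_entry'] = (df['Run_time'] == max_run_time).astype(int)

print(df)
```

Note that if a team has several rows sharing the same largest Run_time, all of them get Last_entry=1 with this approach, whereas tail(1) always marks exactly one row per team.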