Create an additional column in a dataframe based on a specific condition

Time:12-24

I have a dataset given as such:

#Load the required libraries
import pandas as pd


#Create dataset
data = {'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
        'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4],
        'Married': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No'],
        'Self_Employed': ['No', 'No', 'Yes', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'No'],
        'LoanAmount': [123, 128, 66, 120, 141, 52,96,15,85,36,58,89],
        }

#Convert to dataframe
df = pd.DataFrame(data)
print("df = \n", df)

Here, I wish to add an additional column 'Last_entry' which will contain 0's and 1's.

The column should be filled as follows: for team-A, the last run-time is 5, so that row has Last_entry=1 and all other run-times for team-A should be 0.

For team-B, the last run-time is 3. So that row has Last_entry=1 and all other run-times for team-B should be 0.

For team-C, the last run-time is 4. So that row has Last_entry=1 and all other run-times for team-C should be 0.

The net result needs to look as such:

New dataframe by adding additional column

Can somebody please let me know how to achieve this in Python?

I wish to add an additional column in an existing dataset by using python

CodePudding user response:

You can use groupby and tail to get the last entry for each team. Then make a new column of zeroes, and set the resulting rows to one:

# Determine the index of the last entry for each team
last_entry_idx = df.groupby('team').tail(1).index

# Create the new column, defaulting to 0
df['Last_entry'] = 0
# Mark the last row of each team with 1
df.loc[last_entry_idx, 'Last_entry'] = 1
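
Note that tail(1) picks the last row of each team in the dataframe's current row order. That matches the largest Run_time here only because the rows are already sorted within each team. If the row order is not guaranteed, one possible sketch is to compare each Run_time with its team's maximum. This assumes "last" means the highest Run_time, and it uses a trimmed copy of the question's data with only the two relevant columns:

```python
import pandas as pd

# Trimmed copy of the question's dataset (only the columns needed here)
data = {'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
        'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4]}
df = pd.DataFrame(data)

# Broadcast each team's maximum Run_time back onto every row of that team
max_run_time = df.groupby('team')['Run_time'].transform('max')

# 1 where the row holds its team's largest Run_time, otherwise 0
# (assumption: "last entry" means the highest Run_time within the team)
df['Last_entry'] = (df['Run_time'] == max_run_time).astype(int)
print("df = \n", df)
```

If a team has several rows tied for the maximum Run_time, all of them are marked with 1, whereas the tail(1) approach always marks exactly one row per team.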