Create Flag Column based off of multiple conditions

Time:09-29

I want to create a flag column based on two conditions: the Description column contains both the keywords 'Hello' and 'Name', and call_duration is 2 minutes or less. If both conditions are met, the flag should be 1; otherwise it should be 0.

Data example

Description      Call_Duration
Hello, my name   1
Contact error    42
Hello, my name   3

Output should be

Description      Call_Duration   Flag_Column
Hello, my name   1               1
Contact error    42              0
Hello, my name   3               0

Below is the code I'm currently using:

import numpy as np
import pandas as pd

df = pd.read_csv('file_1.txt', sep=',')

df['call_duration'] = df['call_duration_sec']/60

df.call_duration = df.call_duration.round()

# Adding flag column based off key words in Description column
df['Flag_Column'] = np.where(df['Description'].str.contains("hello&name", case=False, na=False) , 1, 0)

# print(df.head())


df.to_csv('some_file.csv')

CodePudding user response:

You could use a regular expression to check for hello followed by name, separated by any other text. Combine this with the duration check, and use np.where to generate the flag.

If hello and name could appear in the opposite order, I'd suggest reading up on how regular expressions work and experimenting with building your own pattern.

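import numpy as np

# Flag rows where the description has "hello ... name" and the call lasted 2 minutes or less.
# na=False treats missing descriptions as non-matches.
df['Flag_Column'] = np.where(df['Description'].str.contains('hello.*name', case=False, na=False) & df['Call_Duration'].le(2), 1, 0)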
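For the opposite-order case mentioned above, one possible approach is to skip the combined pattern and test for each word separately, then AND the results together. This is only a sketch: the sample rows below are made up for illustration (the reversed-order row and the missing description are not from the question), and the column names are assumed to match the question's example data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the third and fourth rows are invented for illustration.
df = pd.DataFrame({
    "Description": ["Hello, my name", "Contact error", "My name... hello", None],
    "Call_Duration": [1, 42, 2, 1],
})

# One boolean mask per keyword, so the order of the words does not matter.
has_hello = df["Description"].str.contains("hello", case=False, na=False)
has_name = df["Description"].str.contains("name", case=False, na=False)

# "2 min or less" includes exactly 2, hence le (<=).
short_call = df["Call_Duration"].le(2)

df["Flag_Column"] = np.where(has_hello & has_name & short_call, 1, 0)
print(df)
```

Separate masks are also easier to extend if more keywords are added later.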

CodePudding user response:

You can combine multiple conditions using & like this:

df['Flag_Column'] = np.where(df['Description'].str.contains("hello.*name", case=False, na=False, regex=True) & (df['Call_Duration'] <= 2), 1, 0)

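The pattern hello.*name finds any occurrence of hello<anyNumberOfCharactersHere>name. .contains treats the pattern as a regular expression by default, so regex=True is optional here and only makes that explicit. Use <= rather than < so that calls of exactly 2 minutes are flagged too.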
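As a rough check against the question's sample data, the sketch below rebuilds the three example rows in a DataFrame (instead of reading file_1.txt) and compares the original "hello&name" pattern with "hello.*name". It assumes the column names from the example table.

```python
import numpy as np
import pandas as pd

# The three rows from the question's example, built inline for illustration.
df = pd.DataFrame({
    "Description": ["Hello, my name", "Contact error", "Hello, my name"],
    "Call_Duration": [1, 42, 3],
})

# "&" has no special meaning in a regex, so this searches for the literal text "hello&name".
literal = df["Description"].str.contains("hello&name", case=False, na=False)

# ".*" matches any run of characters between the two words.
pattern = df["Description"].str.contains("hello.*name", case=False, na=False)

df["Flag_Column"] = np.where(pattern & (df["Call_Duration"] <= 2), 1, 0)
print(df)
```

The literal pattern matches none of the rows, which is why the original code flagged nothing, while the combined condition reproduces the expected 1, 0, 0 output.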
