I want to create a flag column based on whether the Description column contains both the keywords 'Hello' and 'Name' and whether call_duration is 2 minutes or less. If both conditions are met, the flag should be '1'; otherwise it should be '0'.
Data example
Description | Call_Duration |
---|---|
Hello, my name | 1 |
Contact error | 42 |
Hello, my name | 3 |
Output should be
Description | Call_Duration | Flag_Column |
---|---|---|
Hello, my name | 1 | 1 |
Contact error | 42 | 0 |
Hello, my name | 3 | 0 |
Below is the code I'm currently using:
import numpy as np
import pandas as pd
df = pd.read_csv('file_1.txt', sep=',')
df['call_duration'] = df['call_duration_sec']/60
df.call_duration = df.call_duration.round()
# Adding flag column based off key words in Description column
df['Flag_Column'] = np.where(df['Description'].str.contains("hello&name", case=False, na=False) , 1, 0)
# print(df.head())
df.to_csv('some_file.csv')
CodePudding user response:
You could use a regular expression to check for hello followed by name, separated by any other text. Combine this with the duration check, and use np.where to generate the flag.
Note that hello.*name only matches when hello comes first. If hello and name could appear in the opposite order, I'd suggest reading up on how regular expressions work and experimenting with building your own pattern.
import numpy as np
df['Flag_Column'] = np.where((df['Description'].str.contains('hello.*name', case=False) & df['Call_Duration'].le(2)), 1, 0)
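If the two keywords can appear in either order, one option that avoids a more complex regex is to test for each keyword separately and combine the results with &. The following is only a sketch on made-up data: the fourth row ("My name, hello") is a hypothetical case added to show the reversed order, and the column names follow the sample table in the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical row with the
# keywords in reverse order.
df = pd.DataFrame({
    "Description": ["Hello, my name", "Contact error", "Hello, my name", "My name, hello"],
    "Call_Duration": [1, 42, 3, 2],
})

desc = df["Description"]

# Each keyword is checked on its own, so their order does not matter.
# na=False treats missing descriptions as "no match".
has_both = (
    desc.str.contains("hello", case=False, na=False)
    & desc.str.contains("name", case=False, na=False)
)

# Flag rows that contain both keywords and last 2 minutes or less.
df["Flag_Column"] = np.where(has_both & df["Call_Duration"].le(2), 1, 0)
print(df)
```

Two separate checks are a bit more verbose than a single pattern, but they are easier to read and to extend with further keywords.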
CodePudding user response:
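You can combine multiple conditions using & like this:
df['Flag_Column'] = np.where(df['Description'].str.contains("hello.*name", case=False, na=False, regex=True) & (df['Call_Duration'] <= 2), 1, 0)
Use <= 2 rather than < 2 so that calls of exactly 2 minutes are flagged as well. regex=True is already the default for .contains, so passing it is optional; it just makes explicit that the pattern is treated as a regular expression, which finds any occurrence of hello<anyNumberOfCharactersHere>name.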
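To see why the "hello&name" pattern from the question never matches, here is a small sketch run on the question's sample data. The & character has no special meaning inside a regular expression, so that pattern searches for the literal text hello&name. The variable names below are illustrative only, and .astype(int) is used as one possible alternative to np.where.

```python
import pandas as pd

# The three sample rows from the question.
df = pd.DataFrame({
    "Description": ["Hello, my name", "Contact error", "Hello, my name"],
    "Call_Duration": [1, 42, 3],
})

# "&" is an ordinary character in a regex, so this looks for the
# literal string "hello&name" and matches nothing here.
literal_match = df["Description"].str.contains("hello&name", case=False, na=False)

# "hello.*name" matches "hello", then any characters, then "name".
pattern_match = df["Description"].str.contains("hello.*name", case=False, na=False)

# Calls lasting 2 minutes or less.
short_call = df["Call_Duration"] <= 2

# Combining the boolean Series with & and casting to int gives 1/0 flags.
df["Flag_Column"] = (pattern_match & short_call).astype(int)
print(df)
```

Building each condition as its own named boolean Series also sidesteps operator-precedence surprises, since & binds more tightly than comparisons such as <=.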