I'm not sure what the issue is in my flow.
This is a sample df:
import pandas as pd

df = pd.DataFrame({'customer': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'Traveled': [1, 1, 1, 0, 1, 0],
                   'Travel_count': [2, 3, 5, 0, 1, 0],
                   'country1': ['UK', 'Italy', 'CA', '0', 'UK', '0'],
                   'country2': ['JP', 'IN', 'CO', '0', 'EG', '0'],
                   'shopping': ['High', 'High', 'High', 'High', 'Medium', 'Medium']
                   })
gives:
  customer  Traveled  Travel_count country1 country2 shopping
0        A         1             2       UK       JP     High
1        B         1             3    Italy       IN     High
2        C         1             5       CA       CO     High
3        D         0             0        0        0     High
4        E         1             1       UK       EG   Medium
5        F         0             0        0        0   Medium
I want to create some functions that automate the filtering and then build a custom df. Here are two functions that check customers on two columns, Traveled == 1 and shopping == High:
def travel():
    if (df['Traveled'] == 1):
        return True
    else:
        return False

def shop_high():
    if (df['shopping'] == 'High'):
        return True
    else:
        return False
Here is the nested-if code. If the above conditions are True, it checks which customers traveled more or less than 3 times:
def select(df):
    if(travel and shop_high):
        if (df['Travel_count'] > 3):
            return (df['customer'], df['shopping'], ('Customer {} traveled more than 3 times').format(df['customer']))
        elif (df['Travel_count'] < 3):
            return (df['customer'], df['shopping'], ('Customer {} traveled less than 3 times').format(df['customer']))
If I apply this function to the original df to automate the filtering and the travel-count check, I get the wrong results:
pd.DataFrame(list(df.apply(select, axis = 1).dropna()))
results:
   0       1                                      2
0  A    High  Customer A traveled less than 3 times
1  C    High  Customer C traveled more than 3 times
2  D    High  Customer D traveled less than 3 times
3  E  Medium  Customer E traveled less than 3 times
4  F  Medium  Customer F traveled less than 3 times
Should be:
   0     1                                      2
0  A  High  Customer A traveled less than 3 times
1  C  High  Customer C traveled more than 3 times
CodePudding user response:
I would use boolean indexing and numpy.sign:
import numpy as np

# sign of (Travel_count - 3): 1 -> more, -1 -> less, 0 (exactly 3) -> NaN after map
travel = (np.sign(df['Travel_count'].sub(3))
            .map({1: ' traveled more than 3 times',
                  -1: ' traveled less than 3 times'})
         )

m1 = df['Traveled'].eq(1)
m2 = df['shopping'].eq('High')
m3 = travel.notna()

out = (df.loc[m1&m2&m3, ['customer', 'shopping']]
         .assign(new='Customer ' + df['customer'] + travel)
      )
output:
  customer shopping                                    new
0        A     High  Customer A traveled less than 3 times
2        C     High  Customer C traveled more than 3 times
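To see why m3 = travel.notna() drops a customer with exactly 3 trips (customer B in the sample), here is a minimal sketch of the mapping step on its own. The toy Series `counts` and the names `signs` and `labels` are made up for illustration and are not part of the code above.

```python
import numpy as np
import pandas as pd

# Toy travel counts (illustrative only): below, equal to, and above 3
counts = pd.Series([2, 3, 5], name='Travel_count')

# np.sign yields -1, 0 or 1 for counts below, equal to or above 3
signs = np.sign(counts.sub(3))

# Only -1 and 1 are keys in the mapping, so 0 (exactly 3 trips) becomes NaN
labels = signs.map({1: ' traveled more than 3 times',
                    -1: ' traveled less than 3 times'})

# notna() is False only for the row with exactly 3 trips
print(labels.notna().tolist())
```

Because the unmapped value turns into NaN, the same Series serves both as the text to append and as the third filter mask.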
CodePudding user response:
Using isin:
import numpy as np

# keep rows where Traveled == 1 and shopping == 'High', and Travel_count is not exactly 3
new_df = (df[df[['Traveled', 'shopping']].isin(['High', 1]).all(axis=1)
             & df['Travel_count'].ne(3)].reset_index(drop=True))
new_df['new'] = ('Customer ' + new_df['customer'] + ' traveled '
                 + pd.Series(np.where(new_df['Travel_count'].lt(3), 'less', 'more'))
                 + ' than 3 times')
CodePudding user response:
You can filter the dataframe by the 3 conditions and apply a simple function to build the description:
des = lambda row: f'Customer {row["customer"]} traveled {"more" if row["Travel_count"] > 3 else "less"} than 3 times'
# .copy() so that adding a column does not trigger SettingWithCopyWarning
df = df.loc[(df['Traveled'] == 1) & (df['shopping'] == 'High') & (df['Travel_count'] != 3)].copy()
df['description'] = df.apply(des, axis=1)
df = df[['customer', 'shopping', 'description']]
Output:
  customer shopping                            description
0        A     High  Customer A traveled less than 3 times
2        C     High  Customer C traveled more than 3 times
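As for why the original select returns every row: `if(travel and shop_high)` tests the function objects themselves, which are always truthy, so the two checks are never actually run. Calling them as written would not help either, because `df['Traveled'] == 1` on the whole column is a Series and cannot be used in an `if`. If you prefer to keep the row-wise apply, a minimal sketch could look like this, assuming the helpers are changed to take the row as a parameter (that signature is my assumption, not something from the question):

```python
import pandas as pd

df = pd.DataFrame({'customer': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'Traveled': [1, 1, 1, 0, 1, 0],
                   'Travel_count': [2, 3, 5, 0, 1, 0],
                   'shopping': ['High', 'High', 'High', 'High', 'Medium', 'Medium']})

def travel(row):
    # Compare one row's value, so the result is a single boolean
    return row['Traveled'] == 1

def shop_high(row):
    return row['shopping'] == 'High'

def select(row):
    # Call the helpers with the row; a bare function name is always truthy
    if travel(row) and shop_high(row):
        if row['Travel_count'] > 3:
            return (row['customer'], row['shopping'],
                    'Customer {} traveled more than 3 times'.format(row['customer']))
        elif row['Travel_count'] < 3:
            return (row['customer'], row['shopping'],
                    'Customer {} traveled less than 3 times'.format(row['customer']))
    # Rows that fail a check (or have exactly 3 trips) return None and are dropped below
    return None

result = pd.DataFrame(list(df.apply(select, axis=1).dropna()))
print(result)
```

This keeps the structure from the question, but the vectorized filtering above will generally be faster on large frames.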