Say we have the following code in R. What would be its equivalent pandas DataFrame syntax/method in Python?
network_tickets <- contains(comcast_data$CustomerComplaint, match = 'network', ignore.case = T)
internet_tickets <- contains(comcast_data$CustomerComplaint, match = 'internet', ignore.case = T)
billing_tickets <- contains(comcast_data$CustomerComplaint, match = 'bill', ignore.case = T)
email_tickets <- contains(comcast_data$CustomerComplaint, match = 'email', ignore.case = T)
charges_ticket <- contains(comcast_data$CustomerComplaint, match = 'charge', ignore.case = T)
comcast_data$ComplaintType[internet_tickets] <- "Internet"
comcast_data$ComplaintType[network_tickets] <- "Network"
comcast_data$ComplaintType[billing_tickets] <- "Billing"
comcast_data$ComplaintType[email_tickets] <- "Email"
comcast_data$ComplaintType[charges_ticket] <- "Charges"
comcast_data$ComplaintType[-c(internet_tickets, network_tickets, billing_tickets, charges_ticket, email_tickets)] <- "Others"
I was able to convert the first set of operations to Python like this:
network_tickets = df.ComplaintDescription.str.contains ('network', regex=True, case=False)
But I am finding it a challenge to use the variable network_tickets to assign the value "Internet" to a new pandas DataFrame column, i.e. ComplaintType. In R, it seems you can do that in a single line.
I am not sure how to do this in Python in a single line of code. I tried the approaches below, but they all raised errors:
a) df['ComplaintType'].apply(internet_tickets) = "Internet"
b) df['ComplaintType'] = df.apply(internet_tickets)
c) df['ComplaintType'] = internet_tickets.apply("Internet")
I think we could first create a new column in the DataFrame:
df['ComplaintType'] = internet_tickets
But I am not sure about the next steps.
CodePudding user response:
Use Series.str.contains with DataFrame.loc to set the values from a list:
import pandas as pd

df = pd.DataFrame(data = {"ComplaintDescription":["BiLLing is super","email","new"]})
L = [ "Internet","Network", "Billing", "Email", "Charges"]
# each label doubles as the case-insensitive search pattern
for val in L:
    df.loc[df['ComplaintDescription'].str.contains(val, case=False), 'ComplaintType'] = val
# rows that matched no label are still NaN here
df['ComplaintType'] = df['ComplaintType'].fillna('Others')
print (df)
  ComplaintDescription ComplaintType
0     BiLLing is super       Billing
1                email         Email
2                  new        Others
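Note that the loop above searches for the label itself ("Billing", "Charges"), while the R code searches for shorter patterns ('bill', 'charge'). If that difference matters, one possible variation is to keep a pattern-to-label mapping. The sketch below is only an illustration: the `patterns` dict and the fourth sample row are made up for the example, and the patterns are treated as literal substrings.

```python
import pandas as pd

df = pd.DataFrame({"ComplaintDescription": [
    "BiLLing is super", "email", "new", "extra charges on my bill"]})

# Hypothetical pattern -> label mapping, listed in the same order as the
# R assignments, so a later match overwrites an earlier one.
patterns = {
    "internet": "Internet",
    "network": "Network",
    "bill": "Billing",
    "email": "Email",
    "charge": "Charges",
}

df["ComplaintType"] = "Others"  # default for rows that match nothing
for pattern, label in patterns.items():
    # na=False treats missing descriptions as "no match"
    mask = df["ComplaintDescription"].str.contains(
        pattern, case=False, regex=False, na=False)
    df.loc[mask, "ComplaintType"] = label

print(df)
```

A row that matches several patterns ends up with the label assigned last, which mirrors how the sequential R assignments behave.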
EDIT:
If you need to set the values separately:
df.loc[df['ComplaintDescription'].str.contains('network', case=False), 'ComplaintType'] = "Internet"
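If a single assignment is preferred over several separate `df.loc` lines, `numpy.select` is one option. This is a sketch, not the only way to do it; the sample rows are invented for the example. `numpy.select` keeps the first matching condition, so the conditions are listed in reverse order of the R assignments to preserve the "last assignment wins" behaviour.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ComplaintDescription": [
    "Network is down", "wrong bill", "hello"]})
text = df["ComplaintDescription"]

# One boolean mask per complaint type; the first True condition wins.
conditions = [
    text.str.contains("charge", case=False, na=False),
    text.str.contains("email", case=False, na=False),
    text.str.contains("bill", case=False, na=False),
    text.str.contains("network", case=False, na=False),
    text.str.contains("internet", case=False, na=False),
]
labels = ["Charges", "Email", "Billing", "Network", "Internet"]

# Rows matching none of the conditions get the default label.
df["ComplaintType"] = np.select(conditions, labels, default="Others")

print(df)
```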