How to convert R code syntax into Python syntax using Pandas data frame?

Time:10-27

Let's say we have the following code in R. What would be its equivalent pandas DataFrame syntax/method in Python?

network_tickets <- contains(comcast_data$CustomerComplaint, match = 'network', ignore.case = T)
internet_tickets <- contains(comcast_data$CustomerComplaint, match = 'internet', ignore.case = T)
billing_tickets <- contains(comcast_data$CustomerComplaint, match = 'bill', ignore.case = T)
email_tickets <- contains(comcast_data$CustomerComplaint, match = 'email', ignore.case = T)
charges_ticket <- contains(comcast_data$CustomerComplaint, match = 'charge', ignore.case = T)
    
comcast_data$ComplaintType[internet_tickets] <- "Internet"
comcast_data$ComplaintType[network_tickets] <- "Network"
comcast_data$ComplaintType[billing_tickets] <- "Billing"
comcast_data$ComplaintType[email_tickets] <- "Email"
comcast_data$ComplaintType[charges_ticket] <- "Charges"
    
comcast_data$ComplaintType[-c(internet_tickets, network_tickets, billing_tickets,
                              charges_ticket, email_tickets)] <- "Others"

I was able to convert the first set of operations to Python like this:

network_tickets = df.ComplaintDescription.str.contains('network', regex=True, case=False)

However, I am having trouble using a mask such as network_tickets to write the matching label into a new pandas DataFrame column, ComplaintType. In R, it seems you can do that in a single line.

I am not sure how to do this in Python in a single line of code. I tried the following, but each attempt raised an error:

a) df['ComplaintType'].apply(internet_tickets) = "Internet"
b) df['ComplaintType'] = df.apply(internet_tickets)
c) df['ComplaintType'] = internet_tickets.apply("Internet")

I think we could first create a new column in the DataFrame:

df['ComplaintType'] = internet_tickets

But I am not sure about the next steps.

CodePudding user response:

Use Series.str.contains to build a boolean mask and DataFrame.loc to set the value on the matching rows, looping over a list of labels:

import pandas as pd

df = pd.DataFrame(data={"ComplaintDescription": ["BiLLing is super", "email", "new"]})

L = [ "Internet","Network", "Billing", "Email", "Charges"]
for val in L:
    df.loc[df['ComplaintDescription'].str.contains(val, case=False), 'ComplaintType'] = val

df['ComplaintType'] = df['ComplaintType'].fillna('Others')
print (df)
  ComplaintDescription ComplaintType
0     BiLLing is super       Billing
1                email         Email
2                  new        Others
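
Note that the loop above uses each label as its own search pattern, so a description that contains only "bill" or "charge" would not match "Billing" or "Charges". A minimal sketch that keeps the keywords from the R code separate from the labels, assuming the column is named ComplaintDescription as in the sample above (the name keyword_to_label and the extra sample rows are illustrative, not from the original post):

```python
import pandas as pd

df = pd.DataFrame({"ComplaintDescription": [
    "BiLLing is super", "email", "new", "Slow INTERNET", "Extra charge on my bill"]})

# Hypothetical mapping: keyword to search for -> label to assign.
# The order mirrors the R code; later entries overwrite earlier ones.
keyword_to_label = {
    "internet": "Internet",
    "network": "Network",
    "bill": "Billing",
    "email": "Email",
    "charge": "Charges",
}

# Start with the fallback label, then overwrite the rows that match a keyword.
df["ComplaintType"] = "Others"
for keyword, label in keyword_to_label.items():
    mask = df["ComplaintDescription"].str.contains(
        keyword, case=False, regex=False, na=False)
    df.loc[mask, "ComplaintType"] = label

print(df)
```

Passing na=False treats missing descriptions as non-matches, so those rows keep the "Others" label instead of raising an error when the mask is used for indexing.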

EDIT:

If you need to set each value separately:

df.loc[df['ComplaintDescription'].str.contains('network', case=False), 'ComplaintType'] = "Network"
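
If a single assignment is preferred over several .loc statements, one possible alternative (not part of the original answer) is numpy.select, which takes a list of boolean conditions and a parallel list of choices. With numpy.select the first matching condition wins, whereas with sequential .loc assignments the last one wins, so in this sketch the conditions are listed in the reverse of the R order. The sample rows are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ComplaintDescription": [
    "BiLLing is super", "email", "new", "Network bill"]})

desc = df["ComplaintDescription"]

# Conditions in reverse of the R assignment order, because
# np.select picks the FIRST condition that is True for each row.
conditions = [
    desc.str.contains("charge", case=False, na=False),
    desc.str.contains("email", case=False, na=False),
    desc.str.contains("bill", case=False, na=False),
    desc.str.contains("network", case=False, na=False),
    desc.str.contains("internet", case=False, na=False),
]
choices = ["Charges", "Email", "Billing", "Network", "Internet"]

# Rows matching no condition get the default label.
df["ComplaintType"] = np.select(conditions, choices, default="Others")

print(df)
```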