Home > Enterprise >  Changing object to string in python
Changing object to string in python

Time:10-13

I'm trying to use the re library to use regular expression to substitute text. However, even after I use astype(str), the column that I want to change is still being stored as an object.

import os
import re
import pandas as pd


#https://www.geeksforgeeks.org/update-column-value-of-csv-in-python/


# reading the csv file
df = pd.read_excel('Upload to dashboard - Untitled (1).xlsx', skiprows = 7)
df.drop(df.tail(7).index,inplace=True) # drop last n rows
print(df.head(3))

df['Publisher URL'] = df['Publisher URL'].fillna('')
df['Publisher URL'] = df['Publisher URL'].astype(str)
df['Publisher URL'] = df['Publisher URL'].str.replace('gerd#','')
print(df.dtypes)
df['Publisher URL2'] = df['Publisher URL'].str.replace('www.','')
trim = re.sub('(.*)(?:\bm\.)(.*)|(.*)','',df['Publisher URL'])
#https://docs.python.org/3/library/re.html#re.sub

print(df['trim'])
df.to_csv("C:/out.csv")

this is my output for print(df.dtypes)

[3 rows x 8 columns]
Unnamed: 0        object
Buying Agency     object
Advertiser        object
Publisher URL     object
Date              object
Buy Type          object
Ad Type           object
No. of Ads       float64

And my error is

  File ~\.spyder-py3\temp.py:30 in <module>
    trim = re.sub('(.*)(?:\bm\.)(.*)|(.*)','',df['Publisher URL'])

  File ~\Anaconda3\lib\re.py:210 in sub
    return _compile(pattern, flags).sub(repl, string, count)

TypeError: expected string or bytes-like object

How can I change my script so I can use the re.sub function?

The regex is correct, I had some help from this question:Regex Exclusions

CodePudding user response:

You can use

df['trim'] = df['Publisher URL'].str.replace(r'^(.*?)\bm\.', r'\1', regex=True)

Or, a better idea is to use

df['trim'] = df['Publisher URL'].str.replace(r'^(.*?[/.])m\.', r'\1', regex=True)

See regex demo #1 and regex demo #2

Details:

  • ^ - start of string
  • (.*?[/.]) - Group 1 (in Python, \1, not $1, refers to this group pattern match): any chars other than line break chars, as few as possible, and then a / or .
  • m\. - m. substring.
  • Related