I have a Python BOT that queries a database, saves the output to Pandas Dataframe and writes the data to an Excel template.
Yesterday the data did not saved to the Excel template because one of the fields in a record contain the following characters:
> " , *, /, (),:,\n
Pandas failed to save the data to the file.
This is the code that creates the dataframe:
upload_df = sql_df.copy()
This code prepares the template file with time/date stamp
src = file_name.format(val="")
date_str = " " str(datetime.today().strftime("%d%m%Y%H%M%S"))
dst_file = file_name.format(val=date_str)
copyfile(src, os.path.join(save_path, dst_file))
work_book = load_workbook(os.path.join(save_path, dst_file))
and this code saves the dataframe to the excel file
writer = pd.ExcelWriter(os.path.join(save_path, dst_file), engine='openpyxl')
writer.book = work_book
writer.sheets = {ws.title: ws for ws in work_book.worksheets}
upload_df.to_excel(writer, sheet_name=sheet_name, startrow = 1, index=False, header = False)
writer.save()
My question is how can I clean the special characters from an specific column [description]
in my dataframe before I write it to the Excel template?
I have tried:
upload_df['Name'] = upload_df['Name'].replace(to_replace= r'\W',value=' ',regex=True)
But this removes everything and not a certain type of special character. I guess we could use a list of items and iterate through the list and run the replace but is there a more Pythonic solution ?
CodePudding user response:
You could use the following (pass characters as a list to the method parameter):
upload_df['Name'] = upload_df['Name'].replace(
to_replace=['"', '*', '/', '()', ':', '\n'],
value=' '
)
CodePudding user response:
Use str.replace
:
>>> df
Name
0 (**Hello\nWorld:)
>>> df['Name'] = df['Name'].str.replace(r'''["*/():\\n]''', '', regex=True)
>>> df
Name
0 HelloWorld
Maybe you want to replace line breaks by whitespaces:
>>> df = df.replace({'Name': {r'["*/():]': '',
r'\\n': ' '}}, regex=True)
>>> df
Name
0 Hello World
CodePudding user response:
As some of the special characters to remove are regex meta-characters, we have to escape these characters before we can replace them to empty strings with regex.
You can automate escaping these special character by re.escape
, as follows:
import re
# put the special characters in a list
special_char = ['"', '*', '/', '(', ')', ':', '\n']
special_char_escaped = list(map(re.escape, special_char))
The resultant list of escaped special characters is as follows:
print(special_char_escaped)
['"', '\\*', '/', '\\(', '\\)', ':', '\\\n']
Then, we can remove the special characters with .replace()
as follows:
upload_df['Name'] = upload_df['Name'].replace(special_char_escaped, '', regex=True)
Demo
Data Setup
upload_df = pd.DataFrame({'Name': ['"abc*/(xyz):\npqr']})
Name
0 "abc*/(xyz):\npqr
Run codes:
import re
# put the special characters in a list
special_char = ['"', '*', '/', '(', ')', ':', '\n']
special_char_escaped = list(map(re.escape, special_char))
upload_df['Name'] = upload_df['Name'].replace(special_char_escaped, '', regex=True)
Output:
print(upload_df)
Name
0 abcxyzpqr