I need some help deleting non-letter characters from strings in Python (specifically, column names), but only the ones at the beginning and the end of the string.
Here is an example so you can better understand what I am dealing with:
column_names = ['Column_1', 'Column_2_', '_Column_3__', '__Column_4___']
I need the output to be like this:
column_names = ['Column_1', 'Column_2', 'Column_3', 'Column_4']
Can you help me please?
CodePudding user response:
You can do something like this:
for i in range(len(column_names)):
    column_names[i] = column_names[i].strip("_")
By default strip removes whitespace, but it also accepts an argument listing the characters you want to strip (see the str.strip documentation).
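To illustrate how the argument behaves, here is a minimal sketch. The argument is treated as a set of characters rather than as a literal prefix or suffix, and lstrip/rstrip trim one side only. The sample string with dashes and spaces is made up for illustration and is not from the question.

```python
# The argument to strip is a set of characters, not a literal prefix/suffix.
names = ['Column_1', 'Column_2_', '_Column_3__', '__Column_4___']

both = [n.strip('_') for n in names]         # trim both ends
left_only = [n.lstrip('_') for n in names]   # trim the beginning only
right_only = [n.rstrip('_') for n in names]  # trim the end only

# Hypothetical example: several different characters can be stripped at once.
mixed = '-_ Column_5 _-'.strip('_- ')

print(both)        # ['Column_1', 'Column_2', 'Column_3', 'Column_4']
print(left_only)   # ['Column_1', 'Column_2_', 'Column_3__', 'Column_4___']
print(right_only)  # ['Column_1', 'Column_2', '_Column_3', '__Column_4']
print(mixed)       # Column_5
```

Underscores inside the name are never touched, because stripping stops at the first character that is not in the set.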
CodePudding user response:
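You can use .strip('_') in a list comprehension. A single underscore is enough: the argument is a set of characters, so '__' behaves exactly the same as '_'.
column_names = [i.strip('_') for i in column_names]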
Output:
['Column_1', 'Column_2', 'Column_3', 'Column_4']
CodePudding user response:
Do you really need regex for this?
Non regex method:
column_names = ['Column_1', 'Column_2_', '_Column_3__', '__Column_4___']
column_names = [x.strip("_") for x in column_names]
print(column_names)
# ['Column_1', 'Column_2', 'Column_3', 'Column_4']
Regex method:
Use the regex: (?<=^)_+|_+(?=$)
This matches one or more underscores at the beginning or at the end of the string.
import re
column_names = ['Column_1', 'Column_2_', '_Column_3__', '__Column_4___']
column_names = [re.sub(r'(?<=^)_+|_+(?=$)', '', x) for x in column_names]
print(column_names)
# ['Column_1', 'Column_2', 'Column_3', 'Column_4']
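If the characters to remove are not only underscores, the same idea can be generalized. The sketch below assumes that "non letter" really means "anything other than ASCII letters and digits", since the expected output keeps the trailing digits. The names EDGE_JUNK and trim_edges and the extra sample strings are hypothetical, chosen only for illustration.

```python
import re

# Assumption: only ASCII letters and digits count as "keep" characters.
# One alternative anchors at the start of the string, the other at the end.
EDGE_JUNK = re.compile(r'^[^A-Za-z0-9]+|[^A-Za-z0-9]+$')

def trim_edges(name):
    """Remove runs of non-alphanumeric characters from both ends of name."""
    return EDGE_JUNK.sub('', name)

names = ['Column_1', 'Column_2_', '_Column_3__', '--Column_4!?', '  Column 5. ']
cleaned = [trim_edges(n) for n in names]
print(cleaned)
# ['Column_1', 'Column_2', 'Column_3', 'Column_4', 'Column 5']
```

Characters in the middle of a name (such as the underscore in Column_1 or the space in Column 5) are left alone, because each alternative is anchored to one end of the string.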