How to remove special characters in a Python dictionary?
output = [{'title': 'title 1\u200c',
           'subject': ['subject1\u200c', 'a']},
          {'title': 'title 1\u200c',
           'subject': ['subject1\u200c', 'a', 'b']}]
This is what I tried:
output['title'] = s.replace("\u200c", "") for s in output['title']
CodePudding user response:
Why are you iterating? You just need to remove the character from the string using str.replace().
output['title'] = output['title'].replace("\u200c", "")
This only changes the value of the 'title' key of output:
{'title': 'title 1', 'subject': 'subject1\u200c'}
If you want to remove the character from all items in output, you need a loop:
for key, value in output.items():
    output[key] = value.replace("\u200c", "")
Or, as a dict comprehension:
output = {key: value.replace("\u200c", "") for key, value in output.items()}
{'title': 'title 1', 'subject': 'subject1'}
Addressing your comments
I got this error for part one list indices must be integers or slices, not str
I got this error for second answer: 'list' object has no attribute 'items'
Its array of objects
Let's say output looks like this:
output = [{'title': 'title 1\u200c', 'subject': 'subject1\u200c'},
{'title': 'title 2\u200c', 'subject': 'subject2\u200c'}]
You want to do what I showed above to each dict in output. Just replace output from before with elem:
for elem in output:
    elem['title'] = elem['title'].replace("\u200c", "")
[{'title': 'title 1', 'subject': 'subject1\u200c'},
{'title': 'title 2', 'subject': 'subject2\u200c'}]
Or, using a list and dict comprehension:
output = [
    {key: value.replace("\u200c", "") for key, value in elem.items()}
    for elem in output
]
[{'title': 'title 1', 'subject': 'subject1'},
{'title': 'title 2', 'subject': 'subject2'}]
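Note that the data in the question also has a list under 'subject', and value.replace() would raise an AttributeError on a list. A minimal sketch of one way to handle that, assuming the values are only strings, lists, or dicts; the helper name strip_char is made up for this example:

```python
def strip_char(value, char="\u200c"):
    """Recursively remove `char` from strings nested inside dicts and lists."""
    if isinstance(value, str):
        return value.replace(char, "")
    if isinstance(value, list):
        return [strip_char(item, char) for item in value]
    if isinstance(value, dict):
        return {key: strip_char(item, char) for key, item in value.items()}
    return value  # leave numbers, None, etc. untouched


# Sample data modelled on the question: one string value, one list value
output = [
    {'title': 'title 1\u200c', 'subject': 'subject1\u200c'},
    {'title': 'title 1\u200c', 'subject': ['subject1\u200c', 'a', 'b']},
]

cleaned = strip_char(output)
print(cleaned)
```

This builds a new structure rather than modifying output in place, so the original data stays available if you need it.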
CodePudding user response:
Strictly speaking, every character in a Python string is a Unicode character; \u200c (ZERO WIDTH NON-JOINER) is simply a non-ASCII one. To strip non-ASCII characters you can use the str.encode() method with the "ignore" error handler. encode() returns a bytes object, which you can turn back into a string with the decode() method.
In [1]: title = "subject1\u200c"
In [2]: title.encode("ascii", "ignore")
Out[2]: b'subject1'
In [3]: title.encode("ascii", "ignore").decode()
Out[3]: 'subject1'
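One caveat with this approach: "ignore" drops every non-ASCII character, not just \u200c. A small sketch of the difference, using a made-up sample string with an accented letter:

```python
# Hypothetical sample: "café" followed by a zero-width non-joiner
text = "caf\u00e9\u200c"

# encode/decode with "ignore" drops every non-ASCII character, including "é"
ascii_only = text.encode("ascii", "ignore").decode()

# str.replace removes only the character you name
zwnj_removed = text.replace("\u200c", "")

print(repr(ascii_only))
print(repr(zwnj_removed))
```

If your titles may contain accented or non-Latin text, str.replace() is probably the safer choice.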
For your list of dicts, what you need is something like:
In [15]: output = [{'title': 'title 1\u200c',
...: 'subject': 'subject1\u200c'}, {'title': 'title 1\u200c',
...: 'subject': 'subject1\u200c'}]
In [16]: decoded_output = [value["title"].encode("ascii", "ignore").decode() for value in output]
In [17]: decoded_output
Out[17]: ['title 1', 'title 1']
EDIT:
In [20]: for i in output:
    ...:     for key, value in i.items():
    ...:         cleaned = value.encode("ascii", "ignore").decode()  # strings are immutable, so keep the result
    ...:         print(cleaned)
    ...:
title 1
subject1
title 1
subject1
Since you have a list of dicts, you have to iterate over the list, and for each item of the list (each one is a dict) iterate again using the dict's items() method.
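The loop above only prints the values. If you want the cleaned strings stored back in output, a minimal sketch (assuming every value is a string, as in the sample data here) could look like this:

```python
output = [
    {'title': 'title 1\u200c', 'subject': 'subject1\u200c'},
    {'title': 'title 1\u200c', 'subject': 'subject1\u200c'},
]

for item in output:                  # each item is a dict
    for key, value in item.items():  # each key/value pair in that dict
        # Reassigning an existing key while iterating is fine,
        # because the dict's size does not change.
        item[key] = value.encode("ascii", "ignore").decode()

print(output)
```

This modifies the dicts in place; use a list/dict comprehension instead if you would rather keep the original list unchanged.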