How to remove special characters in a Python dictionary?

Time:09-28

output = [{'title': 'title 1\u200c',
  'subject': ['subject1\u200c','a']},
{'title': 'title 1\u200c',
  'subject': ['subject1\u200c','a','b']}]

This is what I tried:

output['title'] = s.replace("\u200c", "") for s in output['title']

CodePudding user response:

What are you iterating for? You just need to replace the character from the string using str.replace().

output['title'] = output['title'].replace("\u200c", "")

This only changes the value of the 'title' key of output:

{'title': 'title 1', 'subject': 'subject1\u200c'}

If you want to remove the character from all items in output, you need a loop:

for key, value in output.items():
    output[key] = value.replace("\u200c", "")

Or, as a dict comprehension:

output = {key: value.replace("\u200c", "") for key, value in output.items()}
{'title': 'title 1', 'subject': 'subject1'}

Addressing your comments

I got this error for part one: list indices must be integers or slices, not str

I got this error for the second answer: 'list' object has no attribute 'items'

It's an array of objects

Let's say output looks like this:

output = [{'title': 'title 1\u200c', 'subject': 'subject1\u200c'},
          {'title': 'title 2\u200c', 'subject': 'subject2\u200c'}]

You want to do what I showed above to each dict in output. Just replace output from before with elem:

for elem in output:
    elem['title'] = elem['title'].replace("\u200c", "")
[{'title': 'title 1', 'subject': 'subject1\u200c'},
 {'title': 'title 2', 'subject': 'subject2\u200c'}]

Or, using a list and dict comprehension:

output = [
    {key: value.replace("\u200c", "") for key, value in elem.items()}
    for elem in output
    ]
[{'title': 'title 1', 'subject': 'subject1'},
 {'title': 'title 2', 'subject': 'subject2'}]
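
The data in the question also holds lists under 'subject', and str.replace() does not exist on a list. A minimal sketch of one way to handle that, assuming the values are only strings, lists, or dicts nested in each other; strip_char and ZWNJ are hypothetical names introduced here, not part of the original answer:

```python
ZWNJ = "\u200c"  # zero-width non-joiner, the character from the question


def strip_char(obj, char=ZWNJ):
    """Return a copy of obj with char removed from every string inside it."""
    if isinstance(obj, str):
        return obj.replace(char, "")
    if isinstance(obj, list):
        return [strip_char(item, char) for item in obj]
    if isinstance(obj, dict):
        return {key: strip_char(value, char) for key, value in obj.items()}
    return obj  # numbers, None, etc. are returned unchanged


# Sample data shaped like the question's input (assumed structure).
output = [
    {"title": "title 1\u200c", "subject": ["subject1\u200c", "a"]},
    {"title": "title 1\u200c", "subject": ["subject1\u200c", "a", "b"]},
]
cleaned = strip_char(output)
print(cleaned)
```

This builds a new structure instead of modifying output in place, so the original list stays available for comparison.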

CodePudding user response:

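\u200c is not just a "special character": it is a zero-width non-joiner (U+200C), a non-ASCII Unicode character. One way to remove non-ASCII characters is the str.encode() method with the "ignore" error handler. encode() returns a bytes object, which you can turn back into a string with decode(). Note that this drops every non-ASCII character, not only \u200c.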

In [1]: title = "subject1\u200c"

In [2]: title.encode("ascii", "ignore")
Out[2]: b'subject1'

In [3]: title.encode("ascii", "ignore").decode()
Out[3]: 'subject1'
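
A small comparison may help when the text can contain other non-ASCII characters that should survive. The sketch below contrasts the encode/decode approach with str.translate(), which deletes only the code points listed in its table; the sample string and variable names are made up for illustration:

```python
text = "caf\u00e9\u200c"  # "café" followed by a zero-width non-joiner

# Encoding to ASCII with errors="ignore" drops every non-ASCII character,
# including the accented letter.
ascii_only = text.encode("ascii", "ignore").decode()

# str.translate with a table mapping a code point to None deletes
# only that character.
remove_zwnj = {0x200C: None}
zwnj_removed = text.translate(remove_zwnj)

print(repr(ascii_only))    # 'caf'
print(repr(zwnj_removed))  # 'café'
```

If the input is known to be plain English text, both give the same result; otherwise the targeted version is the safer choice.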

For your list of dicts, what you need is something like:

In [15]: output = [{'title': 'title 1\u200c',
    ...:   'subject': 'subject1\u200c'}, {'title': 'title 1\u200c',
    ...:   'subject': 'subject1\u200c'}]

In [16]: decoded_output = [value["title"].encode("ascii", "ignore").decode() for value in output]

In [17]: decoded_output
Out[17]: ['title 1', 'title 1']

EDIT:

In [20]: for i in output:
    ...:     for key, value in i.items():
    ...:         i[key] = value.encode("ascii", "ignore").decode()
    ...:         print(repr(i[key]))
    ...: 
'title 1'
'subject1'
'title 1'
'subject1'

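Because you have a list of dicts, you have to iterate over the list, and for each item (a dict) iterate again using the dict's items() method. Strings are immutable, so the cleaned value must be assigned back to the key; calling encode().decode() without storing the result changes nothing.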
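
If the goal is to remove invisible formatting characters in general while keeping accented letters, one option is to filter on the Unicode general category instead of encoding to ASCII. This is only a sketch under that assumption: strip_format_chars is a hypothetical helper, and category "Cf" also covers other format characters such as the soft hyphen, which may or may not be wanted.

```python
import unicodedata


def strip_format_chars(text):
    """Drop characters whose Unicode general category is Cf (format)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Sample data for illustration; U+200C and U+200D are both category Cf.
output = [
    {"title": "title 1\u200c", "subject": "subject1\u200c"},
    {"title": "title 2\u200d", "subject": "caf\u00e9"},
]
cleaned = [
    {key: strip_format_chars(value) for key, value in elem.items()}
    for elem in output
]
print(cleaned)
```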
