How to remove special characters from a list using Python?

Time:06-14

I have a list like this.

z=[']\'What type of humans arrived on the Indian subcontinent from Africa?\', \'When did humans first arrive on the Indian subcontinent?\', \'What subcontinent did humans first arrive on?\', \'Between 73000 and what year ago did humans first arrive on the Indian subcontinent?\',\kingdoms were established in Southeast Asia?Indianized\']']

I want to convert it into a simple 2D list.

z= [['What type of humans arrived on the Indian subcontinent from Africa?', 'When did humans first arrive on the Indian subcontinent?', 'What subcontinent did humans first arrive on?', 'Between 73000 and what year ago did humans first arrive on the Indian subcontinent?','kingdoms were established in Southeast Asia?Indianized']]

How can I convert this list into a 2D list?

CodePudding user response:

The logic is not fully clear. I'd approach it with a regex that splits on runs of two or more characters that are not letters, digits, or question marks, then drops the empty pieces:

import re
[[x for x in re.split(r'[^a-z0-9\?]{2,}', s, flags=re.I) if x] for s in z]

output:

[['What type of humans arrived on the Indian subcontinent from Africa?',
  'When did humans first arrive on the Indian subcontinent?',
  'What subcontinent did humans first arrive on?',
  'Between 73000 and what year ago did humans first arrive on the Indian subcontinent?',
  'kingdoms were established in Southeast Asia?Indianized']]
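For reference, here is a self-contained sketch of the same split that can be run as-is. The shortened sample string and the helper name `split_questions` are made up for illustration and are not from the original post:

```python
import re

# Shortened, hypothetical sample in the same shape as the question's list:
# one string holding several quoted sentences plus stray brackets and a backslash.
z = ["]'What subcontinent did humans first arrive on?', 'When did humans first arrive?',\\kingdoms were established?Indianized']"]

def split_questions(strings):
    """Split each string on runs of 2+ characters that are not letters, digits, or '?'."""
    pattern = re.compile(r"[^a-z0-9?]{2,}", flags=re.I)
    # re.split yields empty pieces at the start and end of each string; drop them.
    return [[part for part in pattern.split(s) if part] for s in strings]

result = split_questions(z)
print(result)
```

Single spaces between words are left alone because the pattern needs at least two consecutive separator characters. The flip side is that a comma followed by a space inside a sentence would also be treated as a split point, so this approach only suits text shaped like the sample.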

CodePudding user response:

You can use the re library. re.sub replaces every character that matches the pattern, here anything that is not a letter, a digit, or a space. The space at the end of the character class (after the 9) keeps spaces in the result. If you don't want the spaces, remove it.

import re
# Remove every run of characters that are not letters, digits, or spaces
re.sub(r'[^A-Za-z0-9 ]+', '', mystring)
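To clean every string in a list rather than a single string, the substitution can be wrapped in a list comprehension. This is a minimal sketch; the sample list `items` and the name `cleaned` are assumptions for illustration:

```python
import re

# Hypothetical sample data, not from the original question.
items = ["What subcontinent?!", "[hello], world", "a_b-c 123"]

# Match runs of anything that is not an ASCII letter, a digit, or a space.
pattern = re.compile(r"[^A-Za-z0-9 ]+")

# Apply the substitution to each string in the list.
cleaned = [pattern.sub("", s) for s in items]
print(cleaned)  # ['What subcontinent', 'hello world', 'abc 123']
```

Note that this also strips question marks and underscores. To keep a character such as `?`, add it to the character class, for example `[^A-Za-z0-9 ?]+`.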