Python how to convert a string (containing enter) to a list without ascii characters

Time:03-18

I am trying to capture the output of a tool and turn it into a list. I have managed to extract the relevant part using a regex, but the result is a single string rather than a list. I have tried using splitlines() and split() to transform it into a proper list, but I can't get a clean result.

This is (a part of) the string that needs to be converted to a list:

www.nu.nl
2017.nu.nl
account.nu.nl
accounts.nu.nl
actie.nu.nl
admin.nu.nl
admin-2.nu.nl
admin-2-public.nu.nl
adverteren.nu.nl
www.adverteren.nu.nl
privacy.adverteren.nu.nl
advertorial.nu.nl
api.nu.nl
api2.nu.nl
autodiscover.nu.nl
beta.nu.nl
privacy.beta.nu.nl
brandedcontent.nu.nl
cdn.nu.nl
cmp.nu.nl
editorialinsights.nu.nl
f1.nu.nl
f1spel.nu.nl
facebook.nu.nl
foto.nu.nl

When I use split() or splitlines(), I get the following output:

['\x1b[0m', '\x1b[92mwww.nu.nl\x1b[0m', '\x1b[92m2017.nu.nl\x1b[0m', '\x1b[92maccount.nu.nl\x1b[0m', '\x1b[92maccounts.nu.nl\x1b[0m', '\x1b[92mactie.nu.nl\x1b[0m', '\x1b[92madmin.nu.nl\x1b[0m', '\x1b[92madmin-2.nu.nl\x1b[0m', '\x1b[92madmin-2-public.nu.nl\x1b[0m', '\x1b[92madverteren.nu.nl\x1b[0m', '\x1b[92mwww.adverteren.nu.nl\x1b[0m', '\x1b[92mprivacy.adverteren.nu.nl\x1b[0m', '\x1b[92madvertorial.nu.nl\x1b[0m', '\x1b[92mapi.nu.nl\x1b[0m', '\x1b[92mapi2.nu.nl\x1b[0m', '\x1b[92mautodiscover.nu.nl\x1b[0m', '\x1b[92mbeta.nu.nl\x1b[0m', '\x1b[92mprivacy.beta.nu.nl\x1b[0m', '\x1b[92mbrandedcontent.nu.nl\x1b[0m', '\x1b[92mcdn.nu.nl\x1b[0m', '\x1b[92mcmp.nu.nl\x1b[0m', '\x1b[92meditorialinsights.nu.nl\x1b[0m', '\x1b[92mf1.nu.nl\x1b[0m', '\x1b[92mf1spel.nu.nl\x1b[0m', '\x1b[92mfacebook.nu.nl\x1b[0m', '\x1b[92mfoto.nu.nl\x1b[0m', '\x1b[92mgadgets.nu.nl\x1b[0m', '\x1b[92mgraph.nu.nl\x1b[0m', '\x1b[92mi.nu.nl\x1b[0m', '\x1b[92miphone.nu.nl\x1b[0m', '\x1b[92mlink.nu.nl\x1b[0m', '\x1b[92mlive.nu.nl\x1b[0m', '\x1b[92mlogin.nu.nl\x1b[0m', '\x1b[92mlogin2.nu.nl\x1b[0m', '\x1b[92mm.nu.nl\x1b[0m', '\x1b[92mprivacy.m.nu.nl\x1b[0m', '\x1b[92mmedia.nu.nl\x1b[0m', '\x1b[92mmedia-staging.nu.nl\x1b[0m', '\x1b[92mmediatoolui.nu.nl\x1b[0m', '\x1b[92mmeedoen.nu.nl\x1b[0m', '\x1b[92mmessagent.nu.nl\x1b[0m', '\x1b[92mmetrics.nu.nl\x1b[0m', '\x1b[92mmijnomgeving.nu.nl\x1b[0m', '\x1b[92mmijnomgeving-acc.nu.nl\x1b[0m', '\x1b[92mmijnteam.nu.nl\x1b[0m', '\x1b[92mprivacy.mijnteam.nu.nl\x1b[0m', '\x1b[92mmobi.nu.nl\x1b[0m', '\x1b[92mmobiel.nu.nl\x1b[0m', '\x1b[92mprivacy.mobiel.nu.nl\x1b[0m', '\x1b[92mmsoid.nu.nl\x1b[0m', '\x1b[92mnewsquiz.nu.nl\x1b[0m', '\x1b[92mwww.nu.nu.nl\x1b[0m', '\x1b[92mnumobileapp.nu.nl\x1b[0m', '\x1b[92mold.nu.nl\x1b[0m', '\x1b[92mop.nu.nl\x1b[0m', '\x1b[92morange.nu.nl\x1b[0m', '\x1b[92mpreview.nu.nl\x1b[0m', '\x1b[92mprivacy.nu.nl\x1b[0m', '\x1b[92msecure.nu.nl\x1b[0m', '\x1b[92msentry.nu.nl\x1b[0m', '\x1b[92mservice.nu.nl\x1b[0m', '\x1b[92mshop.nu.nl\x1b[0m', 
'\x1b[92mwww.shop.nu.nl\x1b[0m', '\x1b[92msimonly-advertorial.nu.nl\x1b[0m', '\x1b[92mprivacy.simonly-advertorial.nu.nl\x1b[0m', '\x1b[92msip.nu.nl\x1b[0m', '\x1b[92mspecials.nu.nl\x1b[0m', '\x1b[92mstaging.nu.nl\x1b[0m', '\x1b[92mwww.staging.nu.nl\x1b[0m', '\x1b[92mapi.staging.nu.nl\x1b[0m', '\x1b[92mtalk-cdn.staging.nu.nl\x1b[0m', '\x1b[92mstaging-shop.nu.nl\x1b[0m', '\x1b[92mstatic.nu.nl\x1b[0m', '\x1b[92mstories.nu.nl\x1b[0m', '\x1b[92mtalk.nu.nl\x1b[0m', '\x1b[92mtalk-cdn.nu.nl\x1b[0m', '\x1b[92mtest.nu.nl\x1b[0m', '\x1b[92mwww.test.nu.nl\x1b[0m', '\x1b[92mapi.test.nu.nl\x1b[0m', '\x1b[92mapi-cms2test.test.nu.nl\x1b[0m', '\x1b[92mtalk-cdn.test.nu.nl\x1b[0m', '\x1b[92mtalk2022-cdn.test.nu.nl\x1b[0m', '\x1b[92mwww-cms2test.test.nu.nl\x1b[0m', '\x1b[92mwww1.test.nu.nl\x1b[0m', '\x1b[92mtest-shop.nu.nl\x1b[0m', '\x1b[92mtest-voordeel.nu.nl\x1b[0m', '\x1b[92mtools.nu.nl\x1b[0m', '\x1b[92mtourtopper.nu.nl\x1b[0m', '\x1b[92murl8180.nu.nl\x1b[0m', '\x1b[92mverkiezingen.nu.nl\x1b[0m', '\x1b[92mvoordeel.nu.nl\x1b[0m', '\x1b[92mwidgets.nu.nl\x1b[0m', '\x1b[92macceptatie.widgets.nu.nl\x1b[0m', '\x1b[92mwintickets.nu.nl\x1b[0m', '\x1b[92mprivacy.wintickets.nu.nl\x1b[0m', '\x1b[92mprivacy.www.nu.nl\x1b[0m', '\x1b[92mwww1.nu.nl\x1b[0m', '\x1b[92mzon.nu.nl\x1b[0m', '\x1b[92mbrandedcontent.oudersvannu.nl\x1b[0m', '\x1b[92mmedia.oudersvannu.nl\x1b[0m']

I figured these were non-ASCII characters, so I tried to filter them out using

.encode("ascii", "ignore")

followed by .decode(), but that makes no difference.

My code:

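import re
import subprocess

# Capture everything after "Total Unique Subdomains Found: " plus two more characters.
# DOTALL is passed as a flag because an inline (?s) in the middle of a pattern is rejected by newer Python versions.
pattern = r'(?<=Total Unique Subdomains Found: ..)(.*$)'
result = subprocess.run(['python3', '/opt/sublist3r/sublist3r.py', '-d', self.domain], stdout=subprocess.PIPE).stdout.decode('utf-8')
regexOutput = re.findall(pattern, result, flags=re.DOTALL)

print(regexOutput[0])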

This prints the string shown at the beginning of this post.

Could anyone help me on what to do?

CodePudding user response:

Use a regular expression to remove them:

import re
ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
a = ['\x1b[0m', '\x1b[92mwww.nu.nl\x1b[0m', '\x1b[92m2017.nu.nl\x1b[0m', '\x1b[92maccount.nu.nl\x1b[0m', '\x1b[92maccounts.nu.nl\x1b[0m', '\x1b[92mactie.nu.nl\x1b[0m', '\x1b[92madmin.nu.nl\x1b[0m', '\x1b[92madmin-2.nu.nl\x1b[0m', '\x1b[92madmin-2-public.nu.nl\x1b[0m', '\x1b[92madverteren.nu.nl\x1b[0m', '\x1b[92mwww.adverteren.nu.nl\x1b[0m', '\x1b[92mprivacy.adverteren.nu.nl\x1b[0m', '\x1b[92madvertorial.nu.nl\x1b[0m', '\x1b[92mapi.nu.nl\x1b[0m', '\x1b[92mapi2.nu.nl\x1b[0m', '\x1b[92mautodiscover.nu.nl\x1b[0m', '\x1b[92mbeta.nu.nl\x1b[0m', '\x1b[92mprivacy.beta.nu.nl\x1b[0m', '\x1b[92mbrandedcontent.nu.nl\x1b[0m', '\x1b[92mcdn.nu.nl\x1b[0m', '\x1b[92mcmp.nu.nl\x1b[0m', '\x1b[92meditorialinsights.nu.nl\x1b[0m', '\x1b[92mf1.nu.nl\x1b[0m', '\x1b[92mf1spel.nu.nl\x1b[0m', '\x1b[92mfacebook.nu.nl\x1b[0m', '\x1b[92mfoto.nu.nl\x1b[0m', '\x1b[92mgadgets.nu.nl\x1b[0m', '\x1b[92mgraph.nu.nl\x1b[0m', '\x1b[92mi.nu.nl\x1b[0m', '\x1b[92miphone.nu.nl\x1b[0m', '\x1b[92mlink.nu.nl\x1b[0m', '\x1b[92mlive.nu.nl\x1b[0m', '\x1b[92mlogin.nu.nl\x1b[0m', '\x1b[92mlogin2.nu.nl\x1b[0m', '\x1b[92mm.nu.nl\x1b[0m', '\x1b[92mprivacy.m.nu.nl\x1b[0m', '\x1b[92mmedia.nu.nl\x1b[0m', '\x1b[92mmedia-staging.nu.nl\x1b[0m', '\x1b[92mmediatoolui.nu.nl\x1b[0m', '\x1b[92mmeedoen.nu.nl\x1b[0m', '\x1b[92mmessagent.nu.nl\x1b[0m', '\x1b[92mmetrics.nu.nl\x1b[0m', '\x1b[92mmijnomgeving.nu.nl\x1b[0m', '\x1b[92mmijnomgeving-acc.nu.nl\x1b[0m', '\x1b[92mmijnteam.nu.nl\x1b[0m', '\x1b[92mprivacy.mijnteam.nu.nl\x1b[0m', '\x1b[92mmobi.nu.nl\x1b[0m', '\x1b[92mmobiel.nu.nl\x1b[0m', '\x1b[92mprivacy.mobiel.nu.nl\x1b[0m', '\x1b[92mmsoid.nu.nl\x1b[0m', '\x1b[92mnewsquiz.nu.nl\x1b[0m', '\x1b[92mwww.nu.nu.nl\x1b[0m', '\x1b[92mnumobileapp.nu.nl\x1b[0m', '\x1b[92mold.nu.nl\x1b[0m', '\x1b[92mop.nu.nl\x1b[0m', '\x1b[92morange.nu.nl\x1b[0m', '\x1b[92mpreview.nu.nl\x1b[0m', '\x1b[92mprivacy.nu.nl\x1b[0m', '\x1b[92msecure.nu.nl\x1b[0m', '\x1b[92msentry.nu.nl\x1b[0m', '\x1b[92mservice.nu.nl\x1b[0m', '\x1b[92mshop.nu.nl\x1b[0m', 
'\x1b[92mwww.shop.nu.nl\x1b[0m', '\x1b[92msimonly-advertorial.nu.nl\x1b[0m', '\x1b[92mprivacy.simonly-advertorial.nu.nl\x1b[0m', '\x1b[92msip.nu.nl\x1b[0m', '\x1b[92mspecials.nu.nl\x1b[0m', '\x1b[92mstaging.nu.nl\x1b[0m', '\x1b[92mwww.staging.nu.nl\x1b[0m', '\x1b[92mapi.staging.nu.nl\x1b[0m', '\x1b[92mtalk-cdn.staging.nu.nl\x1b[0m', '\x1b[92mstaging-shop.nu.nl\x1b[0m', '\x1b[92mstatic.nu.nl\x1b[0m', '\x1b[92mstories.nu.nl\x1b[0m', '\x1b[92mtalk.nu.nl\x1b[0m', '\x1b[92mtalk-cdn.nu.nl\x1b[0m', '\x1b[92mtest.nu.nl\x1b[0m', '\x1b[92mwww.test.nu.nl\x1b[0m', '\x1b[92mapi.test.nu.nl\x1b[0m', '\x1b[92mapi-cms2test.test.nu.nl\x1b[0m', '\x1b[92mtalk-cdn.test.nu.nl\x1b[0m', '\x1b[92mtalk2022-cdn.test.nu.nl\x1b[0m', '\x1b[92mwww-cms2test.test.nu.nl\x1b[0m', '\x1b[92mwww1.test.nu.nl\x1b[0m', '\x1b[92mtest-shop.nu.nl\x1b[0m', '\x1b[92mtest-voordeel.nu.nl\x1b[0m', '\x1b[92mtools.nu.nl\x1b[0m', '\x1b[92mtourtopper.nu.nl\x1b[0m', '\x1b[92murl8180.nu.nl\x1b[0m', '\x1b[92mverkiezingen.nu.nl\x1b[0m', '\x1b[92mvoordeel.nu.nl\x1b[0m', '\x1b[92mwidgets.nu.nl\x1b[0m', '\x1b[92macceptatie.widgets.nu.nl\x1b[0m', '\x1b[92mwintickets.nu.nl\x1b[0m', '\x1b[92mprivacy.wintickets.nu.nl\x1b[0m', '\x1b[92mprivacy.www.nu.nl\x1b[0m', '\x1b[92mwww1.nu.nl\x1b[0m', '\x1b[92mzon.nu.nl\x1b[0m', '\x1b[92mbrandedcontent.oudersvannu.nl\x1b[0m', '\x1b[92mmedia.oudersvannu.nl\x1b[0m']
# Remove the escape sequences from every entry; the leading '\x1b[0m' entry becomes an empty string
print([ansi_escape.sub('', i) for i in a])
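
The same pattern can also be applied to the whole string before splitting, so the list is clean from the start. This is only a minimal sketch, assuming the tool output is shaped like the sample in the question. The names `strip_ansi` and `raw` are illustrative and not part of the original code:

```python
import re

# Matches ANSI escape sequences (same pattern as in the answer above)
ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')

def strip_ansi(text):
    """Return text with ANSI escape sequences removed."""
    return ansi_escape.sub('', text)

# Hypothetical sample shaped like the output in the question
raw = '\x1b[0m\n\x1b[92mwww.nu.nl\x1b[0m\n\x1b[92m2017.nu.nl\x1b[0m\n'

# Clean first, then split; skip lines that are empty after cleaning
subdomains = [line.strip() for line in strip_ansi(raw).splitlines() if line.strip()]
print(subdomains)  # ['www.nu.nl', '2017.nu.nl']
```

In the question's code, `regexOutput[0]` would take the place of `raw`.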

CodePudding user response:

These are ANSI escape sequences, which are used to print in color in a terminal such as bash. \x1b[92m means light green, and \x1b[0m means turn everything off, i.e. stop writing in green.
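
This probably also explains why the .encode("ascii", "ignore") attempt in the question changes nothing: the escape character and the characters that follow it are all plain ASCII. A small sketch to illustrate this, where the names `ESC`, `GREEN`, `RESET` and `colored` are made up for the example:

```python
ESC = '\x1b'          # the escape character, code point 27
GREEN = ESC + '[92m'  # light (bright) green foreground
RESET = ESC + '[0m'   # turn all attributes off

colored = GREEN + 'www.nu.nl' + RESET

# ESC is itself an ASCII character, so an ASCII round-trip keeps it
round_trip = colored.encode('ascii', 'ignore').decode()

print(ord(ESC))               # 27
print(round_trip == colored)  # True
print(repr(colored))          # '\x1b[92mwww.nu.nl\x1b[0m'
```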
