Splitting string using word, expression, or pattern in python

Time:10-27

I am retrieving information with a GET request from a web client. I have a concatenated string built from that data, and I'd like to split it on this pattern: "\r\n". I basically want each piece of header info on its own line. I'd also like to exclude the body.

Here is a portion of a sample string I'd like to split:

'HTTP/1.1 400 Bad Request\\r\\nDate: Tue, 26 Oct 2021 11:26:46 GMT\\r\\nServer:

I have a function where I parse the information, and I've tried using regex and split, but I keep getting errors (I am new to Python and networking). Here are some examples of what I've tried (webinformation is the string to split):

header = webinformation.splitlines()

for x in range(len(header)):
    print(header[x])

Here is one example of the regular expressions I've tried:

print(re.split('\\r\\n', webinformation))

How could I print each piece of information on its own line? Could this be an issue with escape characters?

CodePudding user response:

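Your string uses the literal four-character sequence \r\n (backslash, r, backslash, n) as its line separator, not a real carriage return and line feed.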

You do not need a regex since it is a fixed text. Use str.split:

text = 'HTTP/1.1 400 Bad Request\\r\\nDate: Tue, 26 Oct 2021 11:26:46 GMT\\r\\nServer:'
for line in text.split(r'\r\n'):
    print(line)

Output:

HTTP/1.1 400 Bad Request
Date: Tue, 26 Oct 2021 11:26:46 GMT
Server:
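
For reference, here is a small sketch of why the two attempts in the question leave the string unsplit. It assumes the data really contains literal backslash characters, as in the sample above; the variable names are only illustrative.

```python
import re

# Sample from the question: each separator is the four characters \ r \ n,
# not a real carriage return + line feed.
text = 'HTTP/1.1 400 Bad Request\\r\\nDate: Tue, 26 Oct 2021 11:26:46 GMT\\r\\nServer:'

# splitlines() only looks for real line-break characters, so nothing is split.
unsplit_lines = text.splitlines()

# The pattern '\\r\\n' reaches the regex engine as \r\n, which matches a real
# CR LF pair, so it finds nothing in this string either.
unsplit_regex = re.split('\\r\\n', text)

# Escaping the literal separator makes the regex match the backslashes themselves.
split_regex = re.split(re.escape(r'\r\n'), text)

print(len(unsplit_lines), len(unsplit_regex), len(split_regex))
```

So a regex can work once the backslashes are escaped, but str.split with a raw string is simpler for a fixed separator.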

CodePudding user response:

Just like this:

➜  ~ ipython
Python 3.8.10 (default, Jun  2 2021, 10:49:15)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.28.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: s = 'HTTP/1.1 400 Bad Request\\r\\nDate: Tue, 26 Oct 2021 11:26:46 GMT\\r\\nServer:'

In [2]: s.replace('\\r\\n', '\n').splitlines()
Out[2]: ['HTTP/1.1 400 Bad Request', 'Date: Tue, 26 Oct 2021 11:26:46 GMT', 'Server:']
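
One common way to end up with literal \r\n text is calling str() on a bytes object instead of decoding it. Whether that is what happened here is only a guess, since the question does not show how webinformation was built. If it is, decoding the bytes gives real line breaks and splitlines() works directly. The raw value and the ISO-8859-1 encoding below are assumptions for illustration.

```python
# Hypothetical raw response, standing in for the bytes a socket might return.
raw = b'HTTP/1.1 400 Bad Request\r\nDate: Tue, 26 Oct 2021 11:26:46 GMT\r\nServer:'

# str(raw) gives the repr text: a b'...' wrapper with escaped line breaks.
as_repr = str(raw)

# decode() gives real text with real CR LF characters, so splitlines() works.
lines = raw.decode('iso-8859-1').splitlines()

print(as_repr)
print(lines)
```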

CodePudding user response:

You can replace the literal \r\n sequence with a real newline (\n) without using a regex:

a = 'HTTP/1.1 400 Bad Request\\r\\nDate: Tue, 26 Oct 2021 11:26:46 GMT\\r\\nServer:'
print(a.replace('\\r\\n', '\n'))

Output:

HTTP/1.1 400 Bad Request
Date: Tue, 26 Oct 2021 11:26:46 GMT
Server:
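
The question also asks to leave out the body. In an HTTP response the headers end at the first blank line, so one possible sketch is to cut the string at the first doubled separator before splitting. The function name header_lines and the sample body are made up for illustration, and the code assumes the separators are the literal four-character \r\n from the question.

```python
SEP = r'\r\n'  # the literal four-character separator from the sample


def header_lines(response_text):
    """Return the header lines, dropping everything after the first blank line."""
    # partition() keeps the whole string as the head if no blank line is found.
    head, _, _body = response_text.partition(SEP + SEP)
    return head.split(SEP)


# Hypothetical response with a body; the body text is invented for this example.
sample = r'HTTP/1.1 400 Bad Request\r\nDate: Tue, 26 Oct 2021 11:26:46 GMT\r\n\r\n<html>Bad Request</html>'

for line in header_lines(sample):
    print(line)
```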