Home > Back-end >  How do I remove line breaks and only return characters between two substrings in a list?
How do I remove line breaks and only return characters between two substrings in a list?

Time:11-07

If I have a string like:

query = """
SELECT
   alias.column_one,
   alias.column_two,
   alias.column_three
FROM
   table_name
"""

How do I just return the characters between "SELECT" and "FROM" in a list so I have a list like below which also accounts for line breaks:

columns = ['alias.column_one', 'alias.column_two', 'alias.column_three']

I've tried importing re but I'm still stuck

CodePudding user response:

You could try this:

query = """
SELECT
   alias.column_one,
   alias.column_two,
   alias.column_three
FROM
   table_name
"""

columns = query.split('\n') # split the string at every line break - '\n'
columns = [column.strip() for column in columns] # strip out the leading spaces

# take the slice of the list between 'SELECT' and 'FROM':
columns = columns[columns.index('SELECT') 1:columns.index('FROM')] 

columns.index('SELECT') returns the index of the list columns at which the word 'SELECT' appears. columns.index('FROM') does the same thing for 'FROM'. By taking a slice of columns between the first index (plus 1 - because the slice returns the value AT the first index) and the second index, you get only those elements between 'SELECT' and 'FROM'.

CodePudding user response:

You basically need the substring between 'SELECT' and 'FROM'. You could try something like this:

>>> s
'\nSELECT\n   alias.column_one,\n   alias.column_two,\n   
alias.column_three\nFROM\n   table_name\n'

>>> print((s.split('SELECT'))[1].split('FROM')[0])

alias.column_one,
alias.column_two,
alias.column_three

Since the question mentioned to get characters. Considering you wanted the result as a list, you could extend the above as:

>>> t = (s.split('SELECT'))[1].split('FROM')[0]
>>> [x.strip() for x in t.split(',\n')]
['alias.column_one', 'alias.column_two', 'alias.column_three']

CodePudding user response:

use re to get the string between SELECT and FROM, in this case is

   alias.column_one,
   alias.column_two,
   alias.column_three

then split the string by ',' and use strip() method to remove blank characters at the beginning and end

import re

query = """
SELECT
   alias.column_one,
   alias.column_two,
   alias.column_three
FROM
   table_name
"""

pattern = re.compile(r'SELECT(?P<column>[\S\s] ?)FROM')
result = pattern.search(query)
columns = result.group('column').split(',')
columns = list(map(lambda x:x.strip(),columns))
print(columns)
  • Related