Search and replace from a selected text using regex in Python

Time:10-07

I would like to select some text from a file in Python and replace it with only the part of the selection that comes before a certain character.

with open ('searchfile.txt', 'r' ) as f:
    content = f.read()
    content_new = re.sub('^\S*', '(.*?\/)', content, flags = re.M)
with open ('searchfile.txt', 'w') as f:
    f.write(content_new)

searchfile.txt contains the below text:

abc/def/efg 212 234 asjakj
hij/klm/mno 213 121 ashasj

My aim is to select everything from the start of each line up to the first space, and then replace it with the text up to the first occurrence of a forward slash /.

Example:

^\S* selects everything until the first space in my file which is "abc/def/efg".

I would like to replace this text with only "abc" on the first line and "hij" on the second.

My regexp (.*?\/) does not work for me here.

CodePudding user response:

You can split the content on whitespace, take the first item, split that on /, and take the first item again:

content_new = content.split()[0].split('/')[0]


If you plan to use a regex, you may use

match = re.search(r'^[^\s/]+', content, flags = re.M)
if match:
    content_new = match.group()

Details:

  • ^ - start of a line (due to re.M)
  • [^\s/]+ - one or more chars other than whitespace and /.
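
To apply the same idea to every line and keep the rest of each line, as the question asks, one option is re.sub with a capture group. This is a minimal sketch rather than part of the answer above: the pattern ^([^\s/]+)\S* and the sample string are assumptions modelled on the searchfile.txt contents shown in the question.

```python
import re

# Sample content mirroring searchfile.txt from the question (assumed).
content = "abc/def/efg 212 234 asjakj\nhij/klm/mno 213 121 ashasj\n"

# Group 1 captures the chars before the first slash or whitespace;
# \S* then consumes the rest of the first field so that it is dropped.
content_new = re.sub(r'^([^\s/]+)\S*', r'\1', content, flags=re.M)

print(content_new)
```

The second argument of re.sub is treated as replacement text, not as a pattern, which is why passing (.*?\/) there does not select anything. \1 refers back to the captured group.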

CodePudding user response:

Try this:

>>> s = 'abc/def/efg 212 234 asjakj'
>>> p = s.split(' ', maxsplit=1)
>>> p
['abc/def/efg', '212 234 asjakj']
>>> p[0] = p[0].split('/', maxsplit=1)[0]
>>> p
['abc', '212 234 asjakj']
>>> s = ' '.join(p)
>>> s
'abc 212 234 asjakj'

One-liner solution:

>>> s.replace(s[:s.index(' ')], s[:s.index('/')], 1)
'abc 212 234 asjakj'
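
Note that s.index raises ValueError when the line has no space or no slash, and the one-liner picks up a slash from later in the line if the first field has none. A minimal sketch that avoids both cases, assuming fields are separated by a single space (shorten_first_field is a hypothetical helper name, not from the answer):

```python
def shorten_first_field(line: str) -> str:
    """Keep only the part of the first space-separated field before its first slash."""
    # partition never raises: if there is no space, sep and rest are empty strings.
    head, sep, rest = line.partition(' ')
    return head.split('/', 1)[0] + sep + rest

lines = ["abc/def/efg 212 234 asjakj", "hij/klm/mno 213 121 ashasj", "plain 1/2"]
result = [shorten_first_field(line) for line in lines]
print(result)
```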

CodePudding user response:

Maybe this can help:

import re

s = "abc/def/efg 212 234 asjakj"
pattern = r"^(.*?\/)"  # lazily match everything up to and including the first slash
replace = "xyz/"
op = re.sub(pattern, replace, s)
print(op)  # xyz/def/efg 212 234 asjakj

CodePudding user response:

Rephrased expected behavior

  1. Given a string that has this pattern: <path><space>.
  2. If the first part of the given string (<path>) contains at least one slash / surrounded by word characters,
  3. then return the part of the string before the first slash.
  4. Otherwise return an empty string.

Here, a path is a sequence of words delimited by slashes, for example abc/de, but not any of these:

  • abc
  • /de
  • abc/file.txt
  • abc/

Solution

Matching lines

You could also test the line against the pattern first and, only if it matches, extract the first path element before the slash.

import re

line = "abc/def/efg 212 234 asjakj"

extracted = ''  # default
if re.match(r'^(\w+/\w+)+ ', line):
    extracted = line.split('/')[0]  # even simpler than Wiktor's split

print(extracted)
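
As a quick check of the rules listed earlier, the sketch below runs one valid path and the four excluded examples through the match-then-split approach. It assumes the pattern is ^(\w+/\w+)+ followed by a space. The trailing " 1" appended to each sample and the helper name first_element are made up for illustration.

```python
import re

# Assumed pattern: one or more word/word groups, then a space.
PATTERN = re.compile(r'^(\w+/\w+)+ ')

def first_element(line: str) -> str:
    """Return the text before the first slash if the line matches, else ''."""
    return line.split('/')[0] if PATTERN.match(line) else ''

samples = ["abc/de 1", "abc 1", "/de 1", "abc/file.txt 1", "abc/ 1"]
results = {s: first_element(s) for s in samples}
for sample, extracted in results.items():
    print(repr(sample), '->', repr(extracted))
```

Only the first sample yields a non-empty result here, because \w does not match the dot in file.txt and the other samples lack a word on one side of the slash.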

Extraction

The extraction can be done in two ways:

(1) Take just the first path element, as in Wiktor's answer:

first_path_element = "abc/def/efg 212 234 asjakj".split('/')[0]
print(first_path_element)

(2) Some may find a regex shorter and more expressive:

import re

first_path_element = re.findall(r'^(\w+)/', "abc/def/efg 212 234 asjakj")[0]
print(first_path_element)
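
Indexing the result of re.findall with [0] raises IndexError when nothing matches. A minimal sketch of a variant that falls back to an empty string instead, using re.match (first_path_element_or_empty is a hypothetical name):

```python
import re

def first_path_element_or_empty(line: str) -> str:
    """Return the word before the first slash at the start of the line, or ''."""
    # re.match only matches at the start of the string, so no ^ is needed.
    match = re.match(r'(\w+)/', line)
    return match.group(1) if match else ''

print(first_path_element_or_empty("abc/def/efg 212 234 asjakj"))
print(first_path_element_or_empty("no slash here"))
```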