How to extract some piece of word from a string?

Time:10-31

I have the following string in python:

datastring = """
Animals {
    idAnimal
    nameAnimal
    animalko5854hg[name="Jazz"]
    animal6ljkjh[name="Pinky"]
    animal595s422d1252g55[name="Steven"]
    animalko5854hg[name="David"]
}
"""

print(type(datastring))  # -> str

My string is data that I read earlier from a text file; that data is now in datastring. In datastring, the fourth line always has the following form: animalidAnimal[name="nameAnimal"]

So I would like to write a function that takes a string like the one above as a parameter and returns the idAnimal part of the first line that has the following form: animalidAnimal[name="nameAnimal"]. For example, for the first string my expected output would be:

ko5854hg

Another example:

datastring = """
Animals {
    idAnimal
    nameAnimal
    animal456jlk165ut[name="Dalty"]
    animal6ljkj[name="Moon"]

}
"""

Expected output:

456jlk165ut

Last example:

datastring = """
Animals {
    idAnimal
    nameAnimal
    animalk45lil69lhfr5942lk[name="Jazz"]
    animal6ljkjh[name="Pinky"]
    animal595s422d1252g55[name="Steven"]
    animalko5854hg[name="David"]
    animalko5854hg[name="Oty"]
    animalko5854hg[name="Dan"]
}
"""

Expected output:

k45lil69lhfr5942lk

I don't want to come across as lazy, but I really don't know how to start coding this. I read about the startswith and endswith functions, but those only return True/False values.

Thanks.

CodePudding user response:

Have you tried using regexes? re.findall(r"(?<=animal)(.*?)(?=\[)", datastring) returns the list of IDs, so if you want the first occurrence you can take the element at index 0. Good luck.
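A minimal sketch of the regex suggestion above, using the standard-library function re.findall. The sample text is the first datastring from the question; the variable name ids is only illustrative.

```python
import re

datastring = """
Animals {
    idAnimal
    nameAnimal
    animalko5854hg[name="Jazz"]
    animal6ljkjh[name="Pinky"]
    animal595s422d1252g55[name="Steven"]
    animalko5854hg[name="David"]
}
"""

# Lookbehind for the literal 'animal', capture lazily up to the next '['.
# The match is case-sensitive, so 'idAnimal' and 'nameAnimal' are skipped.
ids = re.findall(r"(?<=animal)(.*?)(?=\[)", datastring)

# ids is an empty list when nothing matches, so guard before indexing.
first_id = ids[0] if ids else None
print(first_id)
```

Because findall collects every match, this does a little more work than needed when only the first ID matters.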

Thanks for notifying me about that. Here's a simpler way; thanks again for letting me know:

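for line in datastring.splitlines():
    line = line.strip()  # the animal lines are indented
    if line.startswith("animal"):
        animal_id = line.replace("animal", "", 1).split("[")[0]
        break  # stop at the first matching line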

I think KillerRebooted's answer is more effective, but as I said, this one is simpler.

CodePudding user response:

You can start the match with { and use a capture group for the animalId:

{[^{}]*?\banimal(\w+)\[name="[^\s"*]*"]

The pattern matches:

  • { Match a { char
  • [^{}]*? Match any character except { and } as few as possible
  • \banimal Match animal with a leading word boundary
  • (\w+) Capture group 1, match 1 or more word characters
  • \[name="[^\s"*]*"] Match the [name="...."] part


Example code

import re

pattern = r"{[^{}]*?\banimal(\w+)\[name=\"[^\s\"*]*\"]"

s = ("Animals {\n"
            "    idAnimal\n"
            "    nameAnimal\n"
            "    animal456jlk165ut[name=\"Dalty\"]\n"
            "    animal6ljkj[name=\"Moon\"]\n\n"
            "}")

m = re.search(pattern, s)
if m:
    print(m.group(1))

Output

456jlk165ut
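The piece-by-piece breakdown of the pattern can also be kept inside the pattern itself with re.VERBOSE. This is only a sketch: it assumes the capture group is meant to be (\w+), and the name ANIMAL_PATTERN is illustrative.

```python
import re

# Verbose form of the pattern: whitespace is ignored and '#' starts a comment,
# so every piece can be documented in place.
ANIMAL_PATTERN = re.compile(
    r"""
    \{              # a literal opening brace
    [^{}]*?         # any characters except braces, as few as possible
    \banimal        # the word 'animal' preceded by a word boundary
    (\w+)           # group 1: one or more word characters (the ID)
    \[name="        # the literal text [name= followed by a double quote
    [^\s"*]*        # the name: no whitespace, double quotes, or asterisks
    "\]             # closing double quote and closing bracket
    """,
    re.VERBOSE,
)

sample = (
    "Animals {\n"
    "    idAnimal\n"
    "    nameAnimal\n"
    '    animal456jlk165ut[name="Dalty"]\n'
    '    animal6ljkj[name="Moon"]\n\n'
    "}"
)

m = ANIMAL_PATTERN.search(sample)
result = m.group(1) if m else None
print(result)
```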

CodePudding user response:

You should probably allow for the line starting with 'animal' not necessarily being the fourth line. This might be more robust:

datastring = """
Animals {
    idAnimal
    nameAnimal
    animalko5854hg[name="Jazz"]
    animal6ljkjh[name="Pinky"]
    animal595s422d1252g55[name="Steven"]
    animalko5854hg[name="David"]
}
"""
ANIMAL = 'animal'

def get_animal_id(ds):
    for line in map(str.lstrip, ds.splitlines()):
        if line.startswith(ANIMAL):
            return line[len(ANIMAL):line.index('[')]

print(get_animal_id(datastring))

Output:

ko5854hg

Note:

If the first line found starting with 'animal' does not contain '[', this will fail with a ValueError.
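If that ValueError is unwanted, one possible variant is sketched below. It is an assumption about the desired behaviour, not part of the answer above: the hypothetical function get_animal_id_safe skips 'animal' lines that have no '[' and returns None when no line qualifies.

```python
from typing import Optional

ANIMAL = "animal"

def get_animal_id_safe(ds: str) -> Optional[str]:
    """Return the ID from the first 'animal...[' line, or None if there is none."""
    for line in map(str.strip, ds.splitlines()):
        if line.startswith(ANIMAL):
            # partition never raises: sep is '' when '[' is absent
            head, sep, _ = line[len(ANIMAL):].partition("[")
            if sep:
                return head
    return None

example = """
Animals {
    idAnimal
    nameAnimal
    animalko5854hg[name="Jazz"]
    animal6ljkjh[name="Pinky"]
}
"""

print(get_animal_id_safe(example))
print(get_animal_id_safe("animal_without_bracket"))
```

Returning None instead of raising means the caller has to check the result before using it.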

You could also do this using a regular expression thus:

import re

print(re.search(r'(?<=animal)(.*?)(?=\[)', datastring).group(1))
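Note that re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError on input without an 'animal...[' line. A hedged sketch that wraps the same expression in a function (first_animal_id and ANIMAL_ID are illustrative names):

```python
import re
from typing import Optional

# Same expression as above, compiled once for reuse.
ANIMAL_ID = re.compile(r"(?<=animal)(.*?)(?=\[)")

def first_animal_id(ds: str) -> Optional[str]:
    """Return the first ID found in ds, or None when there is no match."""
    m = ANIMAL_ID.search(ds)
    return m.group(1) if m else None

text = 'Animals {\n    idAnimal\n    nameAnimal\n    animalko5854hg[name="Jazz"]\n}'
print(first_animal_id(text))
print(first_animal_id("no match here"))
```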