How do you find all instances of ISBN number using Python Regex

Time:11-23

I would really appreciate some assistance...

I'm trying to retrieve an ISBN number (13 digits) from pages, but the number appears in so many different formats that I can't retrieve all the different instances:

ISBN-13: 978 1 4310 0862 9
ISBN: 9781431008629
ISBN9781431008629
ISBN 9-78-1431-008-629
ISBN: 9781431008629 more text of the number
isbn : 9781431008629 

My output should be: ISBN: 9781431008629

myISBN = re.findall("ISBN" r'[\w\W]{1,17}', text)
myISBN = myISBN[0]
print(myISBN)

I appreciate your time

CodePudding user response:

You can use

(?i)ISBN(?:-13)?\D*(\d(?:\W*\d){12})

See the regex demo. Then remove all non-digits from the Group 1 value.

Regex details:

  • (?i) - case insensitive modifier, same as re.I
  • ISBN - an ISBN string
  • (?:-13)? - an optional -13 string
  • \D* - zero or more non-digits
  • (\d(?:\W*\d){12}) - Group 1: a digit, then twelve occurrences of zero or more non-word chars followed by a digit.
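If it helps to keep that breakdown next to the code, the same pattern can be written in verbose mode. This is only a sketch: the name ISBN13_RX and the sample string are my own choices, and the pattern itself is unchanged.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part carries a comment.
# ISBN13_RX is an illustrative name, not something from the question.
ISBN13_RX = re.compile(r"""
    ISBN              # the literal label (case-insensitive because of re.I)
    (?:-13)?          # an optional -13 suffix
    \D*               # zero or more non-digits (colon, spaces, ...)
    (                 # Group 1: the raw number, separators included
      \d              #   the first digit
      (?:\W*\d){12}   #   twelve more digits, each after optional non-word chars
    )
""", re.I | re.VERBOSE)

m = ISBN13_RX.search("ISBN 9-78-1431-008-629")
raw = m.group(1) if m else None
print(raw)  # 9-78-1431-008-629
```

Group 1 still contains the separators at this point, which is why the non-digits have to be removed afterwards.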

See the Python demo:

import re
texts = ['ISBN-13: 978 1 4310 0862 9',
    'ISBN: 9781431008629',
    'ISBN9781431008629',
    'ISBN 9-78-1431-008-629',
    'ISBN: 9781431008629 more text of the number',
    'isbn : 9781431008629']
rx = re.compile(r'ISBN(?:-13)?\D*(\d(?:\W*\d){12})', re.I)
for text in texts:
    m = rx.search(text)
    if m:
        print(text, '=> ISBN:', ''.join([d for d in m.group(1) if d.isdigit()]))

Output:

ISBN-13: 978 1 4310 0862 9 => ISBN: 9781431008629
ISBN: 9781431008629 => ISBN: 9781431008629
ISBN9781431008629 => ISBN: 9781431008629
ISBN 9-78-1431-008-629 => ISBN: 9781431008629
ISBN: 9781431008629 more text of the number => ISBN: 9781431008629
isbn : 9781431008629 => ISBN: 9781431008629
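Since the question is about finding all instances in a page, here is a rough sketch of the same regex applied to one block of text with finditer instead of search. The helper name find_isbns and the sample page string are assumptions for illustration only.

```python
import re

rx = re.compile(r'ISBN(?:-13)?\D*(\d(?:\W*\d){12})', re.I)

def find_isbns(page_text):
    """Return every ISBN-13 candidate in page_text as a bare 13-digit string."""
    # re.sub(r'\D', '', ...) strips the spaces and hyphens captured in Group 1.
    return [re.sub(r'\D', '', m.group(1)) for m in rx.finditer(page_text)]

# Hypothetical page content, built from the formats in the question.
page = """ISBN-13: 978 1 4310 0862 9
ISBN9781431008629
isbn : 9781431008629 more text"""

isbns = find_isbns(page)
for isbn in isbns:
    print('ISBN:', isbn)
```

One thing to keep in mind: \D* and \W* also match newlines, so a label with no number after it could pick up digits from a later line. If that matters for your pages, a tighter class such as [^\d\n]* may be worth trying.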

CodePudding user response:

import re

text = "ISBN-13: 978 1 4310 0862 9" \
    "ISBN: 9781431008629" \
    "ISBN9781431008629" \
    "ISBN 9-78-1431-008-629" \
    "ISBN: 9781431008629" \
    "isbn : 9781431008629 "

myISBN = re.findall(r"ISBN:\s\d{13}", text)
print(myISBN)

Output:

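['ISBN: 9781431008629', 'ISBN: 9781431008629']

Regex details:

  • \s - one whitespace character
  • \d{13} - exactly 13 digits

Note that this only matches the exact "ISBN: " form followed by 13 contiguous digits, so the other formats are missed.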

CodePudding user response:

I'd split the problem into two steps: first extract the potential ISBN, then check that it is well-formed (13 digits):

import re

text = """\
ISBN-13: 978 1 4310 0862 9
ISBN: 9781431008629
ISBN9781431008629
ISBN 9-78-1431-008-629
ISBN: 9781431008629 more text of the number
isbn : 9781431008629"""

pat1 = re.compile(r"(?i)ISBN(?:-13)?\s*:?([ \d-]+)")
pat2 = re.compile(r"\d+")

for m in pat1.findall(text):
    numbers = "".join(pat2.findall(m))
    if len(numbers) == 13:
        print("ISBN:", numbers)

Prints:

ISBN: 9781431008629
ISBN: 9781431008629
ISBN: 9781431008629
ISBN: 9781431008629
ISBN: 9781431008629
ISBN: 9781431008629
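The second step above only checks the length. If you want something stricter, ISBN-13 has a check digit: the digits are weighted alternately by 1 and 3 and the total must be divisible by 10. A minimal sketch of that check, where is_valid_isbn13 is a name I made up and this goes beyond what the code above does:

```python
import re

def is_valid_isbn13(digits):
    """Check that digits is 13 ASCII digits with a correct ISBN-13 check digit."""
    if not re.fullmatch(r'[0-9]{13}', digits):
        return False
    # Weights alternate 1, 3, 1, 3, ... from the first digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0

print(is_valid_isbn13("9781431008629"))  # True
print(is_valid_isbn13("9781431008620"))  # False, last digit changed
```

You could call this in place of the len(numbers) == 13 test to drop 13-digit strings that are not real ISBNs.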