Find and convert scripture reference to wiki link

Time:07-25

I use a Bible inside of Obsidian that works fine when linking to one verse at a time. However, I still need to be able to link to ranges of verses. I decided to try to write a Python script that finds scripture references and, when it sees a "-" or " " in one, writes out every verse of that range as its own link.

Example: An Obsidian file (let’s say Scripture reference test.md) would have text in it like: “In [[Gen 1:1-4]], we can draw out fundamental principles which can also be seen in the great battle between Christ and Satan, as well as the salvation of man. See more at [[The Great Controversy]].” The string “Gen” is part of a list which the script recognizes and acts upon, whereas “[[The Great Controversy]]” is not. The script makes a new file named after the reference (Gen 1:1-4), which matches the link in Scripture reference test.md. Then it takes the verse range 1-4 and writes an appropriate list of embed links to that new file (represented by srtemp.md in the script), so that all scriptures in the given range are seen when the link is previewed in Obsidian. This method also doesn’t touch the notes containing the actual scriptures being linked to.

![[Gen 1#1|Gen 1:1]]
![[Gen 1#2|Gen 1:2]]
![[Gen 1#3|Gen 1:3]]
![[Gen 1#4|Gen 1:4]]

Example 2: "Gen 1:1-2 5" would generate the following in the new file:

![[Gen 1#1|Gen 1:1]]
![[Gen 1#2|Gen 1:2]]
![[Gen 1#5|Gen 1:5]]

I’m new to Python (teaching myself in my spare time). Besides my code not doing anything (no errors, prints, or writing), I’m sure there is a more efficient way to go about writing this script. What am I missing or doing wrong?

import os
import re
from itertools import groupby

efile = "Scripture reference test.md"

tfile = "srtemp.md"

dirName = '/private/var/mobile/Library/Mobile Documents/iCloud~md~obsidian/Documents/Obsidian-vault/The Bible (KJV)';

chRefs = ["Gen ", "Exod ", "Lev ", "Num ", "Deut ", "Josh ", "Judg ", "Ruth ", "1Sam ", "2Sam ", "1Kings ", "2Kings ", "1Chron ", "2Chron ", "Ezr ", "Neh ", "Esth ", "Job ", "Ps ", "Prov ", "Eccless ", "Song ", "Isa ", "Jer ", "Lam ", "Ezek ", "Dan ", "Hos ", "Joel ", "Amos ", "Obad ", "Jonah ", "Micah ", "Nah ", "Hab ", "Zeph ", "Hag ", "Zech ", "Mal ", "Matt ", "Mark ", "Luke ", "John ", "Acts ", "Rom ", "1Cor ", "2Cor ", "Gal ", "Ephes ", "Phil ", "Col ", "1Thess ", "2Thess ", "1Tim ", "2Tim ", "Phil ", "Heb ", "James ", "1Pet ", "2Pet ", "1John ", "2John ", "3John ", "Jude ", "Rev "]
#for chRef in chRefs:
    #pass

#check for references:
def rangeInt():
    with open(tfile, 'w') as wf:
        with open(efile, 'r+') as ef:
            efl = ef.readlines()
            for line in efl:
                if any(chRef in line for chRef in chRefs):
                    m = re.search('\[\[(.+?)]]', line)
                    if m:
                        foundRef = m.group(1) + "\n"

                        ref = foundRef.split(' ')
                        book = ref[0]
                        rnumsL1 = ref[1]
                        rnumsL1.split(':')
                        chapter = rnumsL1[0]
                        rnumsL2 = rnumsL1[1]
                        if '-' in rnumsL2:
                            if rnumsL2.find('-') < rnumsL2.find(' '):
                                rnumsL2.split('-',1)
                                firstVerse = rnumsL2[0]
                                ad1 = rnumsL2[1]
                                if ' ' in rnumsL2:
                                    ad1.split(' ',1)
                                    aP1 = ad1[1]
                                    if '-' in aP1:
                                        aP1.split('-',1)
                                        ad2 = aP1[1]
                                        sfv = aP1[0]
                                    else:
                                        lastVerse = aP1
                                else: lastVerse = ad1

                            elif rnumsL2.find('-') > rnumsL2.find(' ') or ' ' in rnumsL2 and '-' not in rnumsL2:
                                #check this area if problem:
                                ad1.split(' ',1)
                                secfv = ad1[1]
                                if '-' in rnumsL3:
                                    ad1.split('-',1)
                                    aP1 = rnumsL3[1]
                                else:
                                    lastVerse = ad1[1]
                            else: lastVerse = rnumsL2[1]
                        else:
                            lastVerse = rnumsL2[0]

                def rangeConv():
                    if book != '' and chapter != '' and lastVerse > rnumsL2:
                        if rnumsL2[1] != '':
                            for n in foundRef:
                                if n < lastVerse:
                                    print('![[' + book + ' ' + chapter + '#' + n + '|' + book + ' ' + chapter + ':' + n + ']]\n')

                                    wf.write('[[' + book + ' ' + chapter + '#' + n + '|' + book + ' ' + chapter + ':' + n + ']]\n')
                        else:
                            print('only one verse given')
                rangeConv()

rangeInt()

Also, instead of only being able to handle a scripture reference with up to 2 "-" or 2 " ", I would prefer the script to be able to recognize and write verses for an unlimited number of "-" and " ".

Answer:

If your input files look like this:

[[Gen 1:1-5]] shows the first day of creation. Learn more about the symbolic significances [[here]].
Or have a quick read of [[Mark 5:6-13 16]] for something completely different.

And you're looking for output like this:

[[Gen 1:1]]
..             (continued, .. for brevity)
[[Gen 1:5]]
[[Mark 5:6]]
..
[[Mark 5:13]]
[[Mark 5:16]]

Then your script could be as simple as:

import re

ch_refs = ["Gen ", "Exod ", "Lev ", "Num ", "Deut ", "Josh ", "Judg ", "Ruth ", "1Sam ", "2Sam ", "1Kings ", "2Kings ", "1Chron ", "2Chron ", "Ezr ", "Neh ", "Esth ", "Job ", "Ps ", "Prov ", "Eccless ", "Song ", "Isa ", "Jer ", "Lam ", "Ezek ", "Dan ", "Hos ", "Joel ", "Amos ", "Obad ", "Jonah ", "Micah ", "Nah ", "Hab ", "Zeph ", "Hag ", "Zech ", "Mal ", "Matt ", "Mark ", "Luke ", "John ", "Acts ", "Rom ", "1Cor ", "2Cor ", "Gal ", "Ephes ", "Phil ", "Col ", "1Thess ", "2Thess ", "1Tim ", "2Tim ", "Phil ", "Heb ", "James ", "1Pet ", "2Pet ", "1John ", "2John ", "3John ", "Jude ", "Rev "]

with open('test.txt') as f_in:
    with open('out.txt', 'w') as f_out:
        books = "|".join(ch_refs)
        for match in re.findall(rf'\[\[({books})(\d+):(\d+)(?:-(\d+))?(?:\ (\d+))?]]', f_in.read()):
            print(match)  # just printing this to show on screen what's happening
            b, c, v1, v2, ve = match
            # b already ends with a space ('Gen '), so no extra space before the chapter
            for v in range(int(v1), int(v2) + 1 if v2 else int(v1) + 1):
                f_out.write(f'[[{b}{c}:{v}]]\n')
            if ve:
                f_out.write(f'[[{b}{c}:{ve}]]\n')

Writes the file as specified and shows on screen:

('Gen ', '1', '1', '5', '')
('Mark ', '5', '6', '13', '16')

Note that this doesn't deal with additional possible formats, like you asked about in the comments. I don't think "indefinitely long" is what you'd be going for here, but you might give a few grammar rules, like:

  • The format is always Book Chapter:Verses
  • Book is one from a select set of capitalised words
  • Chapter is a positive integer
  • Verses is Range, which may be followed by any number of additional Range
  • Range is either Verse or Verse-Verse
  • Verse is a positive integer
  • (space), - and : are symbols

That set of rules covers all the examples you gave, but also Mark 5:2-4 10 11-14, for example.

There are many ways to parse something like that out of a text, but a single regular expression can no longer do all the work in one go. It can still match the part of the string that holds all the verses, and once it matches we know that part is correctly structured, so we can use .split to loop over the pieces:

import re

ch_refs = ["Gen ", "Exod ", "Lev ", "Num ", "Deut ", "Josh ", "Judg ", "Ruth ", "1Sam ", "2Sam ", "1Kings ", "2Kings ", "1Chron ", "2Chron ", "Ezr ", "Neh ", "Esth ", "Job ", "Ps ", "Prov ", "Eccless ", "Song ", "Isa ", "Jer ", "Lam ", "Ezek ", "Dan ", "Hos ", "Joel ", "Amos ", "Obad ", "Jonah ", "Micah ", "Nah ", "Hab ", "Zeph ", "Hag ", "Zech ", "Mal ", "Matt ", "Mark ", "Luke ", "John ", "Acts ", "Rom ", "1Cor ", "2Cor ", "Gal ", "Ephes ", "Phil ", "Col ", "1Thess ", "2Thess ", "1Tim ", "2Tim ", "Phil ", "Heb ", "James ", "1Pet ", "2Pet ", "1John ", "2John ", "3John ", "Jude ", "Rev "]

with open('test.txt') as f_in:
    with open('out.txt', 'w') as f_out:
        books = "|".join(ch_refs)
        for match in re.findall(rf'\[\[({books})(\d+):(\d+(?:-\d+)?(?:\ \d+(?:-\d+)?)*)]]', f_in.read()):
            print(match)  # just printing this to show on screen what's happening
            b, c, all_verses = match
            for verses in all_verses.split(' '):
                vs = verses.split('-')
                # b already ends with a space ('Gen '), so no extra space before the chapter
                for v in range(int(vs[0]), int(vs[1]) + 1 if len(vs) > 1 else int(vs[0]) + 1):
                    f_out.write(f'[[{b}{c}:{v}]]\n')

Prints:

('Gen ', '1', '1-5')
('Mark ', '5', '6-13 16')
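
If you want the embed format from your question (![[Gen 1#1|Gen 1:1]]) rather than plain links, the same split logic can feed a different f-string. This is only a sketch: the helper names expand_verses and embed_links are made up for illustration, and it assumes each verse is a heading named after its number inside a note called "Gen 1", as your examples suggest.

```python
def expand_verses(all_verses):
    """Expand a string such as '6-13 16' into a list of verse numbers."""
    numbers = []
    for part in all_verses.split(' '):
        bounds = part.split('-')
        first = int(bounds[0])
        last = int(bounds[-1])  # same as first when the part has no '-'
        numbers.extend(range(first, last + 1))
    return numbers


def embed_links(book, chapter, all_verses):
    """Return one Obsidian embed line per verse, e.g. ![[Gen 1#1|Gen 1:1]]."""
    book = book.strip()  # the regex group keeps the trailing space of 'Gen '
    return [f'![[{book} {chapter}#{v}|{book} {chapter}:{v}]]'
            for v in expand_verses(all_verses)]


print('\n'.join(embed_links('Gen ', '1', '1-2 5')))
```

Writing those lines to a file named after the reference would then be a matter of opening that file inside the loop instead of a single out.txt.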

Have a look at https://regex101.com if you're wondering about those regular expressions. Just note that {books} is not part of the regex itself: the f-string replaces it with Gen |Exod |Lev (etc., the value of books), so that the first group matches one of the book names. Writing that list out literally in the regex would make it entirely unreadable.
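
To see what the regex looks like after the book list has been filled in (which is also the version you would paste into regex101), you can print the pattern before using it. The sketch below uses a shortened three-book list purely to keep the output readable.

```python
import re

# Shortened list for illustration; the real script uses the full ch_refs list.
ch_refs = ["Gen ", "Exod ", "Mark "]

books = "|".join(ch_refs)
pattern = rf'\[\[({books})(\d+):(\d+(?:-\d+)?(?:\ \d+(?:-\d+)?)*)]]'

# The {books} placeholder is gone; the alternation is now part of the pattern.
print(pattern)

text = "See [[Mark 5:2-4 10 11-14]] and [[here]]."
matches = re.findall(pattern, text)
print(matches)
```

Only the scripture link is matched; [[here]] is skipped because it does not start with one of the book names.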

Edit: note that the code above doesn't catch mistakes in the source text, like [[Gen 1:10-8]] (wrong order) or [[Gen 1:1-100]] (non-existent verse). It also doesn't account for something like [[Gen 1:1-5, 2:2-4]] - but from the solution above, you should be able to see that adding that is just more of the same.
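
As a rough sketch of what "more of the same" could look like: the function below takes the part of a link after the book name, splits it on ", " into chapter groups, and rejects ranges written in the wrong order. The name expand_reference and the choice to raise ValueError are assumptions for illustration. It does not check whether a verse actually exists, since that would need a table of verse counts per chapter.

```python
import re

# One 'chapter:verses' group, e.g. '1:1-5' or '2:2-4 7'.
CHAPTER_PART = re.compile(r'(\d+):(\d+(?:-\d+)?(?:\ \d+(?:-\d+)?)*)')


def expand_reference(body):
    """Expand '1:1-5, 2:2-4' into a list of (chapter, verse) pairs.

    Raises ValueError for a malformed part or a range written backwards.
    """
    pairs = []
    for part in body.split(', '):
        m = CHAPTER_PART.fullmatch(part)
        if m is None:
            raise ValueError(f'cannot parse {part!r}')
        chapter, all_verses = m.groups()
        for verses in all_verses.split(' '):
            bounds = [int(n) for n in verses.split('-')]
            first, last = bounds[0], bounds[-1]
            if last < first:
                raise ValueError(f'range {verses!r} is in the wrong order')
            pairs.extend((int(chapter), v) for v in range(first, last + 1))
    return pairs


print(expand_reference('1:1-2, 2:2-4'))
```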
