Python: Join two elements of an array at a specific character and then remove the character

Time:07-23

I'm writing a program to convert HTML to Markdown. Unfortunately, the exporter retains the newline characters ("\n") from the HTML, which breaks links in the Markdown output.

My markdown files are saved in an array like this...

array = ['I am a markdown file. I have l', 'inks contained in me that are ', 'broken like this:[](www.brok\n','enlink.org). This means that a', 'newline is contained in the li', 'nk and the link does not work.']

...where each element of the array is a line of text.

Using Python, I'd like to remove the newline character ("\n") within the link and join the two elements of the array (the element of the array that had the newline and the element directly following it).

I do NOT want to join elements of the array that do not end in a newline character.

There are a few similar answers on StackOverflow (see this one) but nothing I can find that's comparable to my problem. Any suggestions? I am very new to Python.

CodePudding user response:

You can use a for loop and iterate through the array elements, adding them into the correct spot in an output variable:

array = ['I am a markdown file. I have l', 'inks contained in me that are ', 'broken like this:[](www.brok\n','enlink.org). This means that a', 'newline is contained in the li', 'nk and the link does not work.']

# Initialize result as a list with its only value being the first value of array
result = [array[0]] 

for line in array[1:]:  # Iterate through the elements of array, starting from the second one

    # If the most recent element in result ends with "\n"
    if result[-1].endswith("\n"):

        # Then replace it with itself minus the trailing "\n", followed by the next value in the array.
        result[-1] = result[-1][:-1] + line

    # Otherwise just add the next value to the output.
    else:
        result.append(line)

    
print(result)

Output:

['I am a markdown file. I have l', 'inks contained in me that are ', 'broken like this:[](www.brokenlink.org). This means that a', 'newline is contained in the li', 'nk and the link does not work.']
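The same loop can be wrapped in a function so that it also copes with an empty list and with several broken lines in a row. This is only a sketch: the name join_broken_lines is made up for illustration and is not part of the original answer.

```python
def join_broken_lines(lines):
    # Hypothetical helper: merge every line that ends in "\n" with the line after it.
    result = []
    for line in lines:
        # "result and ..." skips the check while result is still empty.
        if result and result[-1].endswith("\n"):
            # Drop the trailing newline and glue the current line on.
            result[-1] = result[-1][:-1] + line
        else:
            result.append(line)
    return result


print(join_broken_lines(['broken like this:[](www.brok\n', 'enlink.org).']))
# ['broken like this:[](www.brokenlink.org).']
print(join_broken_lines([]))
# []
```

Because the check always looks at the last element of result, a run such as ['a\n', 'b\n', 'c'] collapses into a single element. A newline at the end of the very last line is left alone, since there is nothing to join it to.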

CodePudding user response:

Python's built-in string function 'replace' should fix the link, and join can do the rest.

Use it like so:

array = ['I am a markdown file. I have l', 'inks contained in me that are ', 'broken like this:[](www.brok\n','enlink.org). This means that a', 'newline is contained in the li', 'nk and the link does not work.']

# Join array
fixedText = "".join(array)

# Replace newline character
fixedText = fixedText.replace('\n','')
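Note that join followed by replace yields one long string, not a list. If the list structure should survive, one possible variation is to join on a sentinel character, remove each newline that sits directly before a sentinel, and split again. The sketch below assumes the sentinel ("\x00" here) never appears in the text; the function name is hypothetical.

```python
SENTINEL = "\x00"  # Assumption: this character never occurs in the text itself.


def join_with_sentinel(lines):
    # Hypothetical helper, not from the original answer.
    if not lines:
        return []  # "".split(SENTINEL) would return [''] instead of [].
    joined = SENTINEL.join(lines)
    # A newline directly before an element boundary marks a broken line: drop both.
    joined = joined.replace("\n" + SENTINEL, "")
    return joined.split(SENTINEL)


array = ['I am a markdown file. I have l', 'inks contained in me that are ', 'broken like this:[](www.brok\n', 'enlink.org). This means that a', 'newline is contained in the li', 'nk and the link does not work.']

fixed = join_with_sentinel(array)
print(len(fixed))  # 5
print(fixed[2])    # broken like this:[](www.brokenlink.org). This means that a
```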

Update

To only join the string broken by a newline character, you can iterate through the array and build a new array with if else statements:

array = ['I am a markdown file. I have l', 'inks contained in me that are ', 'broken like this:[](www.brok\n','enlink.org). This means that a', 'newline is contained in the li', 'nk and the link does not work.']

newArray = []

# 
# Iterate through the array and grab each string.
# check if there is a newline character in the string.
# 
# If so, replace it and join the next string, then delete the next string, 
# and finally, add it to the new array.
#
# Else, add to the new array
#

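for i, s in enumerate(array):
    # Only merge when a following string exists, so array[i + 1] is always valid.
    if "\n" in s and i + 1 < len(array):
        newArray.append(s.replace('\n', '') + array[i + 1])
        del array[i + 1]  # Note: this modifies the original array in place.
    else:
        newArray.append(s)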
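One caveat: the loop above tests "\n" in s, which also matches a newline in the middle of a string. Since the question only wants to join elements that end in a newline, str.endswith may be the safer test. A small comparison (the helper name is made up for illustration):

```python
def ends_with_newline(s):
    # Hypothetical helper: True only when the final character is "\n".
    return s.endswith("\n")


print("\n" in "first\nsecond")             # True
print(ends_with_newline("first\nsecond"))  # False
print(ends_with_newline("www.brok\n"))     # True
```

Likewise, s.replace('\n', '') removes every newline in the string, whereas s[:-1] removes only the trailing one.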


I hope this helps!
