Home > Back-end >  Find nth occurrence of a character in a row and replace it in Python, in order to fix/replace a wron
Find nth occurrence of a character in a row and replace it in Python, in order to fix/replace a wron

Time:05-07

I would like to import a text file, with 14 columns with the symbol ";" as a column separator. So, if we have 14 columns, it means we'll find 13 ";" separators in every line/row.

The problem is, as you can see below, there is a row, with a ";" character inside of one of the columns. It makes this line to have 14 ";" separators (translated to 15 columns). enter image description here

If we open the file in Excel, we can see this wrong "extra" column there: enter image description here

Trying to read this file with "read_csv" pandas function, it raises an error telling that you have 15 columns (instead of 14). enter image description here

So, before using the read_csv function I would like to open the file, find the rows with 14 ";" separators in the line, modify/replace the 13th ocurrence of the ";" symbol with a "." and writing back in another text file.

As an string is immutable in Python, theoretically, I don't know how to continue with my Python script. Doubting if the solution is appending in a list or pasting text. By the way, this code will be part of a final script, because this problem will happen more times in the future.

# Creating an empty list to store all the lines from the file will be readed.
listoflines = []

# Opening the file.txt in only "reading" model
with open(r"D:\file.txt", mode='r') as reader:
    for line in reader.readlines():
        listoflines.append(line)

# Creating an empty list to store all the lines including the modified lines.
finaldocument = []
# Detecting the row(s) where the count of the ";" symbol is equal to 14.
for row in listoflines:
    if row.count(';') == 14:
        # Creating a counter.
        n=0
        # Iterating over the letters of the row
        for letter in row:
            if letter == ';':
                n =1
                # We need to detect the 13rd ";" symbol of the row.
                if n==13:
                    print(letter)
                    letter = letter.replace(';','.')
                    print(letter)
                    
    ## HOW TO APPEND/REPLACE THE MODIFIED LETTER TO THE ROW???? ##
      
    # Storing, again, all the lines (including modified lines) to a final object.
    finaldocument.append(row)

print(finaldocument)

CodePudding user response:

You should not replace that semi-colon with another character, but instead wrap the string with double quotes (and escape any double quotes that might already be part of the string by doubling them). This is how in CSV syntax you can include such delimiters. In fact, it is not bad practice to wrap such text with double quotes always.

Here is how I'd adapt your loop:

for row in listoflines:
    if row.count(';') > 13:  # maybe there are even two or more of them
        cells = row.split(';')
        # Escape double quotes and wrap in double quotes
        s = '"'   ';'.join(cells[12:-1]).replace('"', '""')   '"'
        # Replace the involved parts with that single part
        cells[12:-1] = [s]
        # ...and convert back to the string format
        row = ';'.join(cells)
      
    finaldocument.append(row)

CodePudding user response:

Instead of doing this:

if row.count(';') == 14:
    # Creating a counter.
    n=0
    # Iterating over the letters of the row
    for letter in row:
        if letter == ';':
            n =1
            # We need to detect the 13rd ";" symbol of the row.
            if n==13:
                print(letter)
                letter = letter.replace(';','.')
                print(letter)

Do this : row = row.split(";") this will convert row to a list, if you only need len 14 you can add this: row = row[:14]

  • Related