Home > Software design >  Remove digits in CSV headers
Remove digits in CSV headers

Time:02-12

I have a CSV file that has been converted from a dict - witch has added Numbers to get the columns unique. Now I want to remove the numbers.

Example of CSV headers now:

User,Comment,Comment1,Comment2,Created

I would like to use something like this:

header = ''.join([i for i in header if not i.isdigit()])

I uses this code to create the CSV without any empty columns.

with open('input.csv') as fd, open('output.csv', 'w', newline='') as fdout:
    rd = csv.reader(fd)
    cols = set()
    _ = next(rd)  # skip header line
    for row in rd:
        for i, val in enumerate(row):
            if val != '':
                cols.add(i)
    _ = fd.seek(io.SEEK_SET)
    wr = csv.writer(fdout)
    for row in rd:
        _ = wr.writerow(val for i, val in enumerate(row) if i in cols)

And my desired output are the same file but headers like this:

User,Comment,Comment,Comment,Created

CodePudding user response:

Sounds like

import csv
import re

with open("input.csv") as fd, open("output.csv", "w", newline="") as fdout:
    rd = csv.reader(fd)
    wr = csv.writer(fdout)
    # read header line, remove trailing numbers from each column
    header = [re.sub(r"\d $", "", col) for col in next(rd)]
    # write processed header out
    wr.writerow(header)
    for row in rd:  # just copy other lines
        wr.writerow(row)

should do the trick, then.

CodePudding user response:

You can make use of Python's itemgetter() which can be used to extract just the required values from each row.

If your header only contains numbers at the start or end, a simple approach is to use .strip('0123456789') to remove them.

from operator import itemgetter
import csv
import io

with open('input.csv') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    header = next(csv_input)

    # Test the first row for empty values
    first_row = next(csv_input)
    required_cols = itemgetter(*(index for index, value in enumerate(first_row) if value))
    
    f_input.seek(io.SEEK_SET)   # restart
    header = next(csv_input)
    csv_output.writerow(value.strip('0123456789') for value in required_cols(header))
    
    for row in csv_input:
        csv_output.writerow(required_cols(row))

For example, if the CSV contained the following:

User,Comment,Comment1,Comment2,Created
Fred,a,b,,11-02-2022
Wilma,a,b,,11-02-2022

The Comment2 column would be completely removed and also the digits from the header giving:

User,Comment,Comment,Created
Fred,a,b,11-02-2022
Wilma,a,b,11-02-2022
  • Related