I have a CSV file that has been converted from a dict - witch has added Numbers to get the columns unique. Now I want to remove the numbers.
Example of CSV headers now:
User,Comment,Comment1,Comment2,Created
I would like to use something like this:
header = ''.join([i for i in header if not i.isdigit()])
I uses this code to create the CSV without any empty columns.
with open('input.csv') as fd, open('output.csv', 'w', newline='') as fdout:
rd = csv.reader(fd)
cols = set()
_ = next(rd) # skip header line
for row in rd:
for i, val in enumerate(row):
if val != '':
cols.add(i)
_ = fd.seek(io.SEEK_SET)
wr = csv.writer(fdout)
for row in rd:
_ = wr.writerow(val for i, val in enumerate(row) if i in cols)
And my desired output are the same file but headers like this:
User,Comment,Comment,Comment,Created
CodePudding user response:
Sounds like
import csv
import re
with open("input.csv") as fd, open("output.csv", "w", newline="") as fdout:
rd = csv.reader(fd)
wr = csv.writer(fdout)
# read header line, remove trailing numbers from each column
header = [re.sub(r"\d $", "", col) for col in next(rd)]
# write processed header out
wr.writerow(header)
for row in rd: # just copy other lines
wr.writerow(row)
should do the trick, then.
CodePudding user response:
You can make use of Python's itemgetter()
which can be used to extract just the required values from each row.
If your header only contains numbers at the
start or end, a simple approach is to use .strip('0123456789')
to remove them.
from operator import itemgetter
import csv
import io
with open('input.csv') as f_input, open('output.csv', 'w', newline='') as f_output:
csv_input = csv.reader(f_input)
csv_output = csv.writer(f_output)
header = next(csv_input)
# Test the first row for empty values
first_row = next(csv_input)
required_cols = itemgetter(*(index for index, value in enumerate(first_row) if value))
f_input.seek(io.SEEK_SET) # restart
header = next(csv_input)
csv_output.writerow(value.strip('0123456789') for value in required_cols(header))
for row in csv_input:
csv_output.writerow(required_cols(row))
For example, if the CSV contained the following:
User,Comment,Comment1,Comment2,Created
Fred,a,b,,11-02-2022
Wilma,a,b,,11-02-2022
The Comment2
column would be completely removed and also the digits from the header giving:
User,Comment,Comment,Created
Fred,a,b,11-02-2022
Wilma,a,b,11-02-2022