I'm trying to pivot a csv file while keeping a column (dont_pivot) that's not supposed to be pivoted. I have managed to pivot my two columns, but I'm struggling to retain the dont_pivot column. I would then like to output the result to a csv instead of a csv string (so unlike this example: Pivot a CSV string using python without using pandas or any similar library).
In a second step, I need to extract the number that sits between the two underscores from the dont_pivot column. This isn't a problem - it just means that these values are not unique.
The requirement is to only use the standard library.
Input:
dont_pivot,key,value
a_9_bc,x,1
a_9_bc,y,2
a_9_bc,z,3
a_9_bc,p,4
a_9_bc,q,5
b_9_bc,x,11
b_9_bc,y,21
b_9_bc,z,31
b_9_bc,p,41
b_9_bc,q,51
Desired output:
dont_pivot_num,x,y,z,p,q
a_9_bc,1,2,3,4,5
b_9_bc,11,21,31,41,51
I'm happy to then extract 9 and a/b in a second step instead of doing regex within my pivot code:
dont_pivot_letter,dont_pivot_num,x,y,z,p,q
a,9,1,2,3,4,5
a,9,11,21,31,41,51
Current output (as string, but I don't need one single string, but a csv file):
x,y,z,p,q
1,2,3,4,5
11,21,31,41,51
My code:
import csv
import re
with open("myfile.csv", "r") as f:
content = csv.reader(f)
next(content)
#### dont_pivot_num ####
dont_pivot_num = []
lines = []
for row in content:
dont_pivot_num.append(re.search(r"(\d)", row[0]).group(1)) # Can be an extra step once I have my desired csv format
dont_pivot_char.append(re.search(r"\b(\w)", row[0]).group(1)) # Can be an extra step once I have my desired csv format
lines.append(",".join(row[1:]))
lines = [l.replace(" ", "") for l in lines]
#### Pivot csv file ####
cols = ["x", "y", "z", "p", "q"]
csvdata = {k: [] for k in cols}
tempcols = list(cols)
for line in lines:
key, value = line.split(",")
try:
csvdata[key].append(value)
tempcols.remove(key)
except ValueError:
for c in tempcols: # now tempcols has only "missing" attributes
csvdata[c].append("")
tempcols = [c for c in cols if c != key]
for c in tempcols:
csvdata[c].append("")
# Instead of doing this, I'd like to combine dont_pivot_num with csvdata and write individual rows to a csv file
csvfile = ""
csvfile = ",".join(csvdata.keys()) "\n"
# print(csvfile)
for row in zip(*csvdata.values()):
csvfile = ",".join(row) "\n"
print(csv)
CodePudding user response:
I simply created a separate list with unique values from the dont_pivot column, extracted the desired values with regex and added it to the dict before writing it all to a csv file.
import csv
import re
with open("myfile.csv", "r") as f:
content = csv.reader(f)
next(content)
#### dont_pivot_num ####
lines = []
dont_pivot = []
for row in content:
dont_pivot.append(row[0])
lines.append(",".join(row[1:]))
lines = [l.replace(" ", "") for l in lines]
dont_pivot_unique = list(dict.fromkeys(dont_pivot))
dont_pivot_num = []
dont_pivot_letter = []
for a in dont_pivot_unique:
dont_pivot_num.append(re.search(r"(\d)", a).group(1))
dont_pivot_letter.append(re.search(r"\b(\w)", a).group())
#### Pivot csv file ####
cols = ["x", "y", "z", "p", "q"]
csvdata = {k: [] for k in cols}
tempcols = list(cols)
for line in lines:
key, value = line.split(",")
try:
csvdata[key].append(value)
tempcols.remove(key)
except ValueError:
for c in tempcols: # now tempcols has only "missing" attributes
csvdata[c].append("")
tempcols = [c for c in cols if c != key]
for c in tempcols:
csvdata[c].append("")
csvdata["dont_pivot_num"] = dont_pivot_num
csvdata["dont_pivot_letter"] = dont_pivot_letter
print(csvdata)
with open("csvfile_out.csv", "w") as f:
w = csv.writer(f)
w.writerow(csvdata.keys())
w.writerows((zip(*csvdata.values())))
CodePudding user response:
You could use a defaultdict
to build up the values for each entry:
from collections import defaultdict
import csv
entries = defaultdict(list)
keys = {}
with open('myfile.csv') as f_input:
csv_input = csv.reader(f_input)
header = next(csv_input)
for row in csv_input:
entries[row[0]].append(row[2])
keys[row[1]] = None
with open('output.csv', 'w', newline='') as f_output:
csv_output = csv.writer(f_output)
csv_output.writerow(['dont_pivot_num', *keys.keys()])
for key, values in entries.items():
csv_output.writerow([key, *values])