Pivot csv and keep key column without pandas

I'm trying to pivot a CSV file while keeping a column (dont_pivot) that should not be pivoted. I've managed to pivot the key and value columns, but I'm struggling to retain the dont_pivot column. I'd also like to write the result to a CSV file rather than build a CSV string (unlike this example: Pivot a CSV string using python without using pandas or any similar library).

In a second step, I need to extract the number that sits between the two underscores in the dont_pivot column. That part is easy; the catch is that the extracted numbers are not unique (both a_9_bc and b_9_bc give 9), so they can't serve as the pivot key on their own.
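For the number between the underscores, a plain `str.split` is enough if every label has the shape letter_number_suffix (an assumption based on the sample data). The regex used in the code further down, `r"(\d)"`, captures only a single digit; `_(\d+)_` also handles multi-digit numbers. A minimal sketch; `pivot_number`, `pivot_number_re` and the label `c_12_xy` are made up for illustration:

```python
import re

def pivot_number(label):
    """Return the text between the first and second underscore, e.g. 'a_9_bc' -> '9'."""
    # 'a_9_bc'.split('_') == ['a', '9', 'bc'], so index 1 is the number
    return label.split("_")[1]

def pivot_number_re(label):
    """Regex variant: capture one or more digits surrounded by underscores, or None."""
    match = re.search(r"_(\d+)_", label)
    return match.group(1) if match else None

labels = ["a_9_bc", "b_9_bc", "c_12_xy"]
print([pivot_number(s) for s in labels])     # ['9', '9', '12']
print([pivot_number_re(s) for s in labels])  # ['9', '9', '12']
```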

The requirement is to use only the standard library.

Input:

dont_pivot,key,value
a_9_bc,x,1
a_9_bc,y,2
a_9_bc,z,3
a_9_bc,p,4
a_9_bc,q,5
b_9_bc,x,11
b_9_bc,y,21
b_9_bc,z,31
b_9_bc,p,41
b_9_bc,q,51

Desired output:

dont_pivot_num,x,y,z,p,q
a_9_bc,1,2,3,4,5
b_9_bc,11,21,31,41,51

I'm happy to extract the 9 and the a/b in a second step rather than running the regex inside my pivot code:

dont_pivot_letter,dont_pivot_num,x,y,z,p,q
a,9,1,2,3,4,5
b,9,11,21,31,41,51
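A sketch of that second step, assuming the pivoted data has the shape of the first desired output (full label in the first column) and that every label looks like letter_number_suffix. The function name `split_label_column` is hypothetical, and `io.StringIO` stands in for real files:

```python
import csv
import io

# Pivoted data in the shape of the first desired output.
pivoted = """dont_pivot_num,x,y,z,p,q
a_9_bc,1,2,3,4,5
b_9_bc,11,21,31,41,51
"""

def split_label_column(src, dst):
    """Replace the first column ('a_9_bc') with a letter column and a number column."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader)
    writer.writerow(["dont_pivot_letter", "dont_pivot_num", *header[1:]])
    for row in reader:
        # maxsplit=2 keeps any further underscores inside the suffix
        letter, num, _suffix = row[0].split("_", 2)
        writer.writerow([letter, num, *row[1:]])

out = io.StringIO()
split_label_column(io.StringIO(pivoted), out)
print(out.getvalue())
```

With real files, open both with `newline=""`, as the `csv` documentation recommends.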

Current output (built as a single string, but I need a CSV file instead):

x,y,z,p,q
1,2,3,4,5
11,21,31,41,51

My code:

import csv
import re

with open("myfile.csv", "r") as f:
    content = csv.reader(f)
    next(content)

    #### dont_pivot_num ####
    dont_pivot_num = []
    dont_pivot_char = []
    lines = []

    for row in content:
        dont_pivot_num.append(re.search(r"(\d)", row[0]).group(1)) # Can be an extra step once I have my desired csv format
        dont_pivot_char.append(re.search(r"\b(\w)", row[0]).group(1)) # Can be an extra step once I have my desired csv format
        lines.append(",".join(row[1:]))

    lines = [l.replace(" ", "") for l in lines]


    #### Pivot csv file ####
    cols = ["x", "y", "z", "p", "q"]

    csvdata = {k: [] for k in cols}

    tempcols = list(cols)
    for line in lines:
        key, value = line.split(",")
        try:
            csvdata[key].append(value)
            tempcols.remove(key)
        except ValueError:
            for c in tempcols:  # now tempcols has only "missing" attributes
                csvdata[c].append("")
            tempcols = [c for c in cols if c != key]
    for c in tempcols:
        csvdata[c].append("")

    # Instead of doing this, I'd like to combine dont_pivot_num with csvdata and write individual rows to a csv file
    csvfile = ""
    csvfile += ",".join(csvdata.keys()) + "\n"
    # print(csvfile)
    for row in zip(*csvdata.values()):
        csvfile += ",".join(row) + "\n"
    print(csvfile)
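To combine the row labels with `csvdata` and let `csv.writer` produce the rows, `zip` can transpose the column lists. A minimal sketch with hard-coded values standing in for what the pivot loop produces; the `dont_pivot` header name and the `labels` list are assumptions. `labels` needs one entry per output row (the unique dont_pivot values in first-seen order), not one per input line:

```python
import csv
import io

# Hypothetical values in the shape the pivot loop produces:
# one list per output column, plus one label per output row.
csvdata = {"x": ["1", "11"], "y": ["2", "21"], "z": ["3", "31"],
           "p": ["4", "41"], "q": ["5", "51"]}
labels = ["a_9_bc", "b_9_bc"]

out = io.StringIO()  # for a real file: open("output.csv", "w", newline="")
writer = csv.writer(out)
writer.writerow(["dont_pivot", *csvdata])  # iterating a dict yields its keys in insertion order
# zip(labels, *columns) turns the column lists into rows, label first
writer.writerows(zip(labels, *csvdata.values()))
print(out.getvalue())
```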

Answer:

I created a separate list of the unique values in the dont_pivot column, extracted the letter and the number from each with a regex, and added both lists to the dict before writing everything to a CSV file.

import csv
import re

with open("myfile.csv", "r") as f:
    content = csv.reader(f)
    next(content)

    #### dont_pivot_num ####
    lines = []
    dont_pivot = []

    for row in content:
        dont_pivot.append(row[0])
        lines.append(",".join(row[1:]))

    lines = [l.replace(" ", "") for l in lines]
    dont_pivot_unique = list(dict.fromkeys(dont_pivot))
    dont_pivot_num = []
    dont_pivot_letter = []

    for a in dont_pivot_unique:
        dont_pivot_num.append(re.search(r"(\d)", a).group(1))
        dont_pivot_letter.append(re.search(r"\b(\w)", a).group())

    #### Pivot csv file ####
    cols = ["x", "y", "z", "p", "q"]

    csvdata = {k: [] for k in cols}

    tempcols = list(cols)
    for line in lines:
        key, value = line.split(",")
        try:
            csvdata[key].append(value)
            tempcols.remove(key)
        except ValueError:
            for c in tempcols:  # now tempcols has only "missing" attributes
                csvdata[c].append("")
            tempcols = [c for c in cols if c != key]
    for c in tempcols:
        csvdata[c].append("")

    csvdata["dont_pivot_num"] = dont_pivot_num
    csvdata["dont_pivot_letter"] = dont_pivot_letter
    print(csvdata)
    
    # newline="" stops the csv module from writing extra blank lines on Windows
    with open("csvfile_out.csv", "w", newline="") as f_out:
        w = csv.writer(f_out)
        w.writerow(csvdata.keys())
        w.writerows(zip(*csvdata.values()))
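The `dict.fromkeys` line is what keeps the labels in first-seen order: dict keys are unique and, since Python 3.7, preserve insertion order, so it works as an order-preserving "unique" for hashable items. A quick illustration with the labels from the sample input:

```python
# One label per input line, as in the sample data.
dont_pivot = ["a_9_bc"] * 5 + ["b_9_bc"] * 5

# Duplicates collapse onto one key; the first occurrence fixes the order.
dont_pivot_unique = list(dict.fromkeys(dont_pivot))
print(dont_pivot_unique)  # ['a_9_bc', 'b_9_bc']

# set(dont_pivot) would also de-duplicate, but its iteration order is not
# guaranteed, so labels could be paired with the wrong pivoted rows.
```

Pairing the i-th unique label with the i-th pivoted row assumes each label's rows are contiguous and every group lists the same keys, as in the sample.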

Answer:

You could use a defaultdict to build up the values for each entry:

from collections import defaultdict
import csv

entries = defaultdict(list)
keys = {}

with open('myfile.csv') as f_input:
    csv_input = csv.reader(f_input)
    header = next(csv_input)
    
    for row in csv_input:
        entries[row[0]].append(row[2])
        keys[row[1]] = None
        
with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['dont_pivot_num', *keys.keys()])

    for key, values in entries.items():
        csv_output.writerow([key, *values])
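The `defaultdict` version appends values in arrival order, so it relies on every group listing the keys in the same order with none missing. A sketch that stores each value under its column name instead, using plain nested dicts and `csv.DictWriter` (whose `restval` fills gaps) in place of `defaultdict`. The function name `pivot` is hypothetical, and the sample below is the question's data with the b_9_bc rows reordered and its q row dropped, made up to show the difference:

```python
import csv
import io

data = """dont_pivot,key,value
a_9_bc,x,1
a_9_bc,y,2
a_9_bc,z,3
a_9_bc,p,4
a_9_bc,q,5
b_9_bc,y,21
b_9_bc,x,11
b_9_bc,z,31
b_9_bc,p,41
"""

def pivot(src, dst, label_field="dont_pivot"):
    rows = {}  # label -> {key: value}; dicts keep insertion order
    keys = {}  # used as an ordered set of column names
    for record in csv.DictReader(src):
        rows.setdefault(record[label_field], {})[record["key"]] = record["value"]
        keys[record["key"]] = None
    # restval="" writes an empty field for any key a group is missing
    writer = csv.DictWriter(dst, fieldnames=[label_field, *keys], restval="")
    writer.writeheader()
    for label, values in rows.items():
        writer.writerow({label_field: label, **values})

out = io.StringIO()
pivot(io.StringIO(data), out)
print(out.getvalue())
```

The b_9_bc row still lines up under x, y, z, p and gets an empty q field.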