converting a key value text file into a CSV file-CodePudding

I have a text file that needs to be converted into CSV file using pandas. A piece of it is presented in the following:

time 00:15 min
    cod,10,1=0,2=2,3=2,4=1,5=6,6=4,7=2,8=7,9=1,10=9,11=7
    cod,18,1=27,2=18,3=19,4=20,5=47,6=2,7=2,8=0,9=33,10=61,11=13,12=2,13=3,14=0,15=0

Rows are cod,10, and cod,18 and the columns are 1, 2, 3,..., 15. Any idea? Regards, Ali

CodePudding user response：

I use pandas to deal with the conversion, but vanilla Python to deal with some of aspects of the data, I hope that is alright.

One issue we need to deal with is the fact that there are a different number of columns per row. So I just put NaN in columns that are missing for a row. For instance, row 1 is shorter than row 2, so the missing columns in row 1 are given values as "NaN".

Here is my idea:

import pandas as pd

lines = []
with open('/path/to/test.txt', 'r') as infile:
    for line in infile:
        if "," not in line:
            continue
        else:
            lines.append(line.strip().split(","))

row_names = []
column_data = {}

max_length = max(*[len(line) for line in lines])

for line in lines:
    while(len(line) < max_length):
        line.append(f'{len(line)-1}=NaN')

for line in lines:
    row_names.append(" ".join(line[:2]))
    for info in line[2:]:
        (k,v) = info.split("=")
        if k in column_data:
            column_data[k].append(v)
        else:
            column_data[k] = [v]

df = pd.DataFrame(column_data)
df.index = row_names
print(df)

df.to_csv('/path/to/test.csv')

Output (the printed DataFrame):

         1   2   3   4   5  6  7  8   9  10  11   12   13   14   15
cod 10   0   2   2   1   6  4  2  7   1   9   7  NaN  NaN  NaN  NaN
cod 18  27  18  19  20  47  2  2  0  33  61  13    2    3    0    0

CSV File Output:

,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
cod 10,0,2,2,1,6,4,2,7,1,9,7,NaN,NaN,NaN,NaN
cod 18,27,18,19,20,47,2,2,0,33,61,13,2,3,0,0

CodePudding user response：

You can use Python standard CSV module and its DictWriter class to handle the variability in the column names:

import csv

headers = {'Cod': None, 'Num': None}  # other column names will be added

all_rows = []
with open('input.txt') as f:
    next(f)  # discard first "time" line

    # Iterate text lines
    for line in f:
        line = line.strip()
        cod, num, *pairs = line.split(',')

        row = {'Cod': cod, 'Num': num}  # create new row, and fill it in...
        for pair in pairs:
            pair = pair.strip()
            col_name, value = pair.split('=')
            row[col_name] = value

            # Make sure headers has this column's name
            headers[col_name] = None

        all_rows.append(row)


with open('output.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=headers)
    writer.writeheader()
    writer.writerows(all_rows)

This gives me the following output:

Cod,Num,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
cod,10,0,2,2,1,6,4,2,7,1,9,7,,,,
cod,18,27,18,19,20,47,2,2,0,33,61,13,2,3,0,0

| Cod | Num | 1  | 2  | 3  | 4  | 5  | 6 | 7 | 8 | 9  | 10 | 11 | 12 | 13 | 14 | 15 |
|-----|-----|----|----|----|----|----|---|---|---|----|----|----|----|----|----|----|
| cod | 10  | 0  | 2  | 2  | 1  | 6  | 4 | 2 | 7 | 1  | 9  | 7  |    |    |    |    |
| cod | 18  | 27 | 18 | 19 | 20 | 47 | 2 | 2 | 0 | 33 | 61 | 13 | 2  | 3  | 0  | 0  |

If you need cod and num joined into a single column, say, id, make these changes:

headers = {'id': None}                      # <<< changed line
...

        cod, num, *pairs = line.split(',')
        id_ = cod   ' '   num               # <<< new line

        row = {'id': id_}                   # <<< changed line
        for pair in pairs:

and now I get:

id,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
cod 10,0,2,2,1,6,4,2,7,1,9,7,,,,
cod 18,27,18,19,20,47,2,2,0,33,61,13,2,3,0,0

| id     | 1  | 2  | 3  | 4  | 5  | 6 | 7 | 8 | 9  | 10 | 11 | 12 | 13 | 14 | 15 |
|--------|----|----|----|----|----|---|---|---|----|----|----|----|----|----|----|
| cod 10 | 0  | 2  | 2  | 1  | 6  | 4 | 2 | 7 | 1  | 9  | 7  |    |    |    |    |
| cod 18 | 27 | 18 | 19 | 20 | 47 | 2 | 2 | 0 | 33 | 61 | 13 | 2  | 3  | 0  | 0  |