Splitting rows in csv with comma separated values-CodePudding

I have csv file, which information (id and text) in column's looks like in example below:

1, Šildomos grindys
2, Šildomos grindys, Rekuperacinė sistema
3, 
4, Skalbimo mašina, Su baldais, Šaldytuvas, Šildomos grindys

My desired output is to transfer ID to one row and relate it to its text (it's for database). Because of csv file is really big, iam giving you just fraction to understand what i want:

| ID             | Features   
 ---------------- -------------
| 1              | Šildomos grindys
| 2              | Šildomos grindys
| 2              | Rekuperacinė sistema
| 3              | null
| 4              | Skalbimo mašina
| 4              | Su baldais
| 4              | Šaldytuvas
| 4              | Šildomos grindys

How can i do that via python ? Thanks !

CodePudding user response：

Here is a way to do what you've asked:

with open('infoo.txt', 'r', encoding="utf-8") as f:
    records = []
    rows = [[x.strip() for x in row.split(',')] for row in f.readlines()]
    for row in rows:
        for i in range(1, len(row)):
            records.append([row[0], row[i] if row[i] else 'null'])
    with open('outfoo.txt', 'w', encoding="utf-8") as g:
        g.write('ID,Features\n')
        for record in records:
            g.write(f'{",".join(field for field in record)}\n')

# check the output file:
with open('outfoo.txt', 'r', encoding="utf-8") as f:
    print('contents of output file:')
    [print(row.strip('\n')) for row in f.readlines()]

Output:

contents of output file:
ID,Features
1,Šildomos grindys
2,Šildomos grindys
2,Rekuperacinė sistema
3,null
4,Skalbimo mašina
4,Su baldais
4,Šaldytuvas
4,Šildomos grindys

UPDATE:

An alternative approach would be to look at using pandas (docs). Pandas provides many powerful ways to work with tabular data, but it also has a bit of a learning curve:

import pandas as pd
with open('infoo.txt', 'r', encoding="utf-8") as f:
    records = []
    rows = [[x.strip() for x in row.split(',')] for row in f.readlines()]
    df = pd.DataFrame([[row[0], row[1:]] for row in rows], columns=['ID', 'Feature'])
    print('Dataframe read from input file:'); print(df)
    df = df.explode('Feature').reset_index(drop=True)
    print('Dataframe with one Feature per row:'); print(df)
    df.to_csv('outfoo.txt', index = False)

    # check the output file:
    df2 = pd.read_csv('outfoo.txt')
    print('Dataframe re-read from output file:'); print(df2)

Output

Dataframe read from input file:
  ID                                            Feature
0  1                                 [Šildomos grindys]
1  2           [Šildomos grindys, Rekuperacinė sistema]
2  3                                                 []
3  4  [Skalbimo mašina, Su baldais, Šaldytuvas, Šild...
Dataframe with one Feature per row:
  ID               Feature
0  1      Šildomos grindys
1  2      Šildomos grindys
2  2  Rekuperacinė sistema
3  3
4  4       Skalbimo mašina
5  4            Su baldais
6  4            Šaldytuvas
7  4      Šildomos grindys
Dataframe re-read from output file:
   ID               Feature
0   1      Šildomos grindys
1   2      Šildomos grindys
2   2  Rekuperacinė sistema
3   3                   NaN
4   4       Skalbimo mašina
5   4            Su baldais
6   4            Šaldytuvas
7   4      Šildomos grindys

Links to docs for pandas are here:

CodePudding user response：

You can read the CSV and then append each row to a new list.

data.csv:

1, Šildomos grindys
2, Šildomos grindys, Rekuperacinė sistema
3,
4, Skalbimo mašina, Su baldais, Šaldytuvas, Šildomos grindys

Code:

import csv

data = []
with open('data.csv') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        for i in range(1, len(row)):
            value = row[i].strip()
            if value == "":
                value = "null"
            data.append([int(row[0]), value])
print(data)

Output:

[[1, 'Šildomos grindys'], [2, 'Šildomos grindys'], [2, 'Rekuperacinė sistema'], [3, 'null'], [4, 'Skalbimo mašina'], [4, 'Su baldais'], [4, 'Šaldytuvas'], [4, 'Šildomos grindys']]

References:

Python documentation on CSV module

CodePudding user response：

actually, I don't know what's the tool you used but if you work on power BI you can make it easily just right click on the row header and select split column then select the delimiter and finally click ok

I wish that's be helpful for you

good luck