Import CSV with every row containing the column headers


I'm dealing with a CSV file that repeats the header names within each row:

player: John Doe ; level: 45 ; last_login: 7854414174 ; coins: 7600
player: Anckx Uj ; level: 471 ; last_login: 7854418847 ; coins: 684111

I'd like to know how to select only the values when importing it with pandas, so that the output looks like this:

player      level   last_login   coins
John Doe    45      7854414174   7600
Anckx Uj    471     7854418847   684111

I tried setting the header parameter, thinking it would filter out the names repeated in each row, but without success:

import pandas as pd
df = pd.read_csv('base.txt', sep=';', header=None, names=['player', 'level', 'last_login', 'coins'])

This returns exactly the same content as the CSV (minus the delimiters).

Any help would be appreciated.

Answer:

One solution is to clean the values after loading, stripping everything up to and including the first colon in each cell:

df = df.apply(lambda x: x.str.replace(r"^[^:]+:", "", regex=True).str.strip())


     player level  last_login   coins
0  John Doe    45  7854414174    7600
1  Anckx Uj   471  7854418847  684111

You will probably also want to convert the level and coins columns to int:

df[["level", "coins"]] = df[["level", "coins"]].astype(int)
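A self-contained sketch of this approach, assuming the sample rows from the question; io.StringIO stands in for base.txt so it runs without the file. Note regex=True: since pandas 2.0, Series.str.replace treats the pattern as a literal string by default. Casting to "int64" rather than int (and also converting last_login here) is a choice made to avoid overflow where the default int is 32-bit.

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO is a stand-in for "base.txt"
raw = io.StringIO(
    "player: John Doe ; level: 45 ; last_login: 7854414174 ; coins: 7600\n"
    "player: Anckx Uj ; level: 471 ; last_login: 7854418847 ; coins: 684111\n"
)

df = pd.read_csv(raw, sep=";", header=None,
                 names=["player", "level", "last_login", "coins"])

# Remove the "name:" prefix from every cell, then trim surrounding spaces
df = df.apply(lambda col: col.str.replace(r"^[^:]+:", "", regex=True).str.strip())

# The numeric columns are still strings at this point; int64 avoids 32-bit overflow
df = df.astype({"level": "int64", "last_login": "int64", "coins": "int64"})

print(df)
```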

Answer:

Another option, using pandas.DataFrame.pivot:

df = pd.read_csv("base.txt", header=None, names=["col"])

# Extract every "name: value" pair, split it into (name, value),
# number each name's occurrences, then pivot names into columns
out = (
    df["col"].str.extractall(r"(\w+: \w+\s?\w+)")[0]
             .str.split(": ", expand=True)
             .assign(idx=lambda x: x.groupby(0).cumcount())
             .pivot(index="idx", columns=0)
)

out.columns = out.columns.get_level_values(1)

# Output:


0    coins   last_login level     player
0     7600   7854414174    45   John Doe
1   684111   7854418847   471   Anckx Uj
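A variant of the same idea, sketched under the assumption that values never contain ';': it uses named groups in extractall and keeps each match's original line number as the row key, instead of the cumcount counter. That way values of any length (including single characters or several words) are captured. io.StringIO again stands in for base.txt.

```python
import io

import pandas as pd

# Stand-in for "base.txt"
raw = io.StringIO(
    "player: John Doe ; level: 45 ; last_login: 7854414174 ; coins: 7600\n"
    "player: Anckx Uj ; level: 471 ; last_login: 7854418847 ; coins: 684111\n"
)

# Each line lands in a single column (the sample contains no commas)
df = pd.read_csv(raw, header=None, names=["col"])

# One row per "name: value" pair; the index is (line number, match number)
pairs = df["col"].str.extractall(r"(?P<name>\w+):\s*(?P<value>[^;]+?)\s*(?:;|$)")

# Drop the match number so rows from the same line share an index, then pivot
out = pairs.reset_index(level="match", drop=True).pivot(columns="name", values="value")

# Restore the original column order and drop the leftover axis name
out = out[["player", "level", "last_login", "coins"]]
out.columns.name = None

print(out)
```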

Answer:

This seems like a row iteration problem, and I think the csv module makes this easy to understand and execute.

  1. Read the input file with a plain csv.reader (delimiter ';'), which gives us a list of strings for each row.
  2. For each row:
    1. create an empty dict, new_row
    2. iterate over the columns and split each one on the first colon (':') to get the header name and its value
      1. add the name-value pair to new_row
    3. append new_row to the list all_rows
  3. Use csv.DictWriter to write all_rows out as the final CSV

Here's the reading part:

import csv

all_rows = []
with open("input.csv", newline="") as f:
    reader = csv.reader(f, delimiter=";")
    for row in reader:
        if not row:  # skip blank lines
            continue
        new_row = {}

        # row looks like: ['player: John Doe ', ' level: 45 ', ' last_login: 7854414174 ', ' coins: 7600']
        for col in row:
            name, val = col.split(":", 1)
            new_row[name.strip()] = val.strip()
        all_rows.append(new_row)
That gives us:

[
    {'player': 'John Doe', 'level': '45', 'last_login': '7854414174', 'coins': '7600'},
    {'player': 'Anckx Uj', 'level': '471', 'last_login': '7854418847', 'coins': '684111'},
]
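If the goal is a pandas DataFrame, as the question asks, rather than a new file, a list of dicts like the one above can be passed straight to pd.DataFrame. A minimal sketch, with the rows copied in literally so it runs on its own; the int64 conversion is an assumption about which columns are numeric.

```python
import pandas as pd

# all_rows as produced by the reading loop (copied in literally here)
all_rows = [
    {"player": "John Doe", "level": "45", "last_login": "7854414174", "coins": "7600"},
    {"player": "Anckx Uj", "level": "471", "last_login": "7854418847", "coins": "684111"},
]

# Column order follows the key order of the dicts
df = pd.DataFrame(all_rows)

# Every value was read as a string; convert the numeric columns
df = df.astype({"level": "int64", "last_login": "int64", "coins": "int64"})

print(df)
```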

From that, we can use csv.DictWriter, passing the first row as the fieldnames; iterating over a dict yields its keys, so the writer knows which columns to expect and in what order to write them:

with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, delimiter=";", fieldnames=all_rows[0])
    writer.writeheader()
    writer.writerows(all_rows)

Here's output.csv:

player;level;last_login;coins
John Doe;45;7854414174;7600
Anckx Uj;471;7854418847;684111
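Putting the reading and writing steps together, here is a self-contained sketch using only the standard library. It assumes io.StringIO objects in place of input.csv and output.csv so it runs without files, and sets lineterminator="\n" (the csv default is "\r\n") to keep the result easy to compare.

```python
import csv
import io

# Stand-in for input.csv
src = io.StringIO(
    "player: John Doe ; level: 45 ; last_login: 7854414174 ; coins: 7600\n"
    "player: Anckx Uj ; level: 471 ; last_login: 7854418847 ; coins: 684111\n"
)

all_rows = []
for row in csv.reader(src, delimiter=";"):
    if not row:  # skip blank lines
        continue
    new_row = {}
    for col in row:
        # Split on the first colon only, so values may contain colons
        name, val = col.split(":", 1)
        new_row[name.strip()] = val.strip()
    all_rows.append(new_row)

# Stand-in for output.csv
dst = io.StringIO()
writer = csv.DictWriter(dst, delimiter=";", fieldnames=list(all_rows[0]),
                        lineterminator="\n")
writer.writeheader()
writer.writerows(all_rows)

print(dst.getvalue(), end="")
```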