I have a log file that I want to read into a DataFrame, but there is no separator between consecutive records.
Country|ID|Item_IDCountry|ID|Item_IDCountry|ID|Item_IDCountry|ID|Item_ID
It is in the format shown above, where Country is always a two-character string.
I'm trying to figure out how to do this in Python, as I'm still a beginner. Any help would be much appreciated.
I tried read_csv, but it failed, and I looked for answers online but didn't find much.
CodePudding user response:
The separator in that format is |. Assuming the log file in question is named logs.csv:
import pandas

logs = []
with open("logs.csv") as f:
    lines = f.readlines()

# The first line holds the column names
column_names = lines[0].rstrip('\n').split("|")

# Every following line is one record
for l in lines[1:]:
    logs.append(l.rstrip('\n').split("|"))

df = pandas.DataFrame(logs, columns=column_names)
print(df)
lines[0].rstrip('\n').split("|") removes the newline character from the end of the first line and splits it on |, turning the column names (Country|ID|Item_IDCountry|ID|Item_IDCountry|ID|Item_IDCountry|ID|Item_ID) into a list.
for l in lines[1:]: iterates over all lines in the log file, starting from the second line.
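
Note that this approach assumes one record per line. If the records really are run together on a single line, as in the sample from the question, a minimal sketch could rely on the fact that Country is always exactly two characters: after splitting on |, every Item_ID except the last is fused with the next record's Country, so the last two characters of that token can be cut off. The function name parse_concatenated and the sample string are made up for illustration, and the sketch assumes that no field contains a | character.

```python
def parse_concatenated(text):
    """Split 'Country|ID|Item_IDCountry|ID|Item_ID...' into (Country, ID, Item_ID) rows.

    Assumes Country is always exactly 2 characters and that no field contains '|'.
    """
    tokens = text.strip().split("|")
    rows = []
    country = tokens[0]  # the very first token is a bare Country
    # After that, tokens come in pairs: ID, then Item_ID fused with the next Country
    for i in range(1, len(tokens) - 1, 2):
        record_id = tokens[i]
        fused = tokens[i + 1]
        is_last = i + 2 >= len(tokens)
        if is_last:
            # The final Item_ID has no Country appended to it
            rows.append((country, record_id, fused))
        else:
            # Cut the trailing 2 characters off: they are the next record's Country
            rows.append((country, record_id, fused[:-2]))
            country = fused[-2:]
    return rows


# Made-up sample data in the same shape as the log line in the question
sample = "US|1|A1DE|2|B2FR|3|C3"
rows = parse_concatenated(sample)
print(rows)
```

The resulting rows can then be passed to pandas.DataFrame(rows, columns=["Country", "ID", "Item_ID"]) to get the DataFrame.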