I have a dataset of energy consumption for several households. The data is stored in .txt files, which I can read easily in Python, but the header is stored in a separate file with the extension .HEAD.
So for each building, I have a pair of files like this:
processed-H01-Accounts-3-31-power.HEAD
processed-H01-Accounts-3-31-power-CLEAN.txt
Inside the .HEAD file it looks like this:
# Created by Octave 3.8.0, Tue Jul 29 13:44:58 2014 BST
# name: h
# type: sq_string
# elements: 1
# length: 3051
timestamp timestampWithDST "Loughborough03,LBORO-SMART-020,00-0D-6F-00-00-F8-5C-A1,Freezer(Kitchen/utility room,Downstairs) / No of plugs(Landing,Upstairs)" "Loughborough03,LBORO-SMART-032,00-0D-6F-00-00-F9-2C-9D,Fridge(Kitchen/utility room,Downstairs) / FridgeFreezer(Kitchen/utility room,Downstairs)" "Loughborough03,LBORO-SMART-033,00-0D-6F-00-00-F9-2D-31,Battery Charger(Garage/Shed,Downstairs)" "Loughborough03,LBORO-SMART-022,00-0D-6F-00-00-F9-2C-D5,Toaster(Kitchen/utility room,Downstairs)" "Loughborough03,LBORO-SMART-027,00-0D-6F-00-00-F8-9F-32,Lamp 1(Bedroom 2,Upstairs)" "Loughborough03,LBORO-SMART-035,00-0D-6F-00-00-F8-5C-07,Computing Equipment(Bedroom 4,Upstairs)" "Loughborough03,LBORO-SMART-021,00-0D-6F-00-00-F8-5B-FA,Microwave(Kitchen/utility room,Downstairs)" "Loughborough03,LBORO-SMART-029,00-0D-6F-00-00-F8-BE-1B,TV(Back Room,Downstairs) / Cable Decoder(Back Room,Downstairs) / Stereo(Back Room,Downstairs)" "Loughborough03,LBORO-SMART-016,00-0D-6F-00-00-F9-2B-C6,Computing Equipment / Laptop(Front Room,Downstairs)" "Loughborough03,LBORO-MET-010,00-0D-6F-00-00-C1-43-06,Small Power Down" "Loughborough03,LBORO-SMART-034,00-0D-6F-00-00-F8-BE-33,Dishwasher(Kitchen/utility room,Downstairs)" "Loughborough03,LBORO-MET-008,00-0D-6F-00-00-C1-35-E1,Mains 1"
The last row holds the column names of my dataset. I need to read these files and put them together in Python to do my modelling. Is there a way to convert this file format to a list in Python?
Thanks
CodePudding user response:
Convert the .HEAD file to a list by splitting the last row on the tab character '\t':
with open('processed-H01-Accounts-3-31-power.HEAD', 'r') as f:
    lines = f.readlines()
# The last line holds the column names; drop the trailing newline before splitting
column_names = lines[-1].rstrip('\n').split('\t')
print(column_names)
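One caveat with a plain split on tabs: in the header shown in the question the device names are wrapped in double quotes and contain spaces, so the quotes stay in the result, and if the separator turns out to be a space rather than a tab the names get cut apart. Below is a minimal sketch using the standard-library shlex.split, which splits on any whitespace while keeping quoted fields whole. The helper name parse_head_line is hypothetical, and the sketch assumes the names contain no stray quote characters or backslashes.

```python
import shlex

def parse_head_line(line):
    """Split a header line into column names, keeping quoted names whole."""
    # shlex.split treats spaces, tabs and newlines as separators and
    # removes the double quotes that surround a quoted field.
    return shlex.split(line)

# Shortened sample built from the header shown in the question
sample = ('timestamp timestampWithDST '
          '"Loughborough03,LBORO-SMART-022,00-0D-6F-00-00-F9-2C-D5,'
          'Toaster(Kitchen/utility room,Downstairs)" '
          '"Loughborough03,LBORO-MET-008,00-0D-6F-00-00-C1-35-E1,Mains 1"\n')

column_names = parse_head_line(sample)
print(len(column_names))   # 4
print(column_names[-1])    # Loughborough03,LBORO-MET-008,00-0D-6F-00-00-C1-35-E1,Mains 1
```

If the file really is tab-separated and you want to keep the quotes, the plain split('\t') is enough.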
CodePudding user response:
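In your case, because you are only interested in the last line of the file, you can avoid reading the whole file: seek to the end, step backwards to the last newline, read ONLY the final line, and then split it on the tab character. (Seeking to the end and calling readline() straight away returns an empty string, so the backwards step is needed.)
import os

with open('processed-H01-Accounts-3-31-power.HEAD', 'rb') as f:
    # Move the file pointer to just before the final byte of the file
    f.seek(-2, os.SEEK_END)
    # Step backwards until the previous newline (assumes the file has more than one line)
    while f.read(1) != b'\n':
        f.seek(-2, os.SEEK_CUR)
    # Read the last line of the file
    last_line = f.readline().decode()
# Split the line by the tab character, dropping the trailing newline first
columns = last_line.rstrip('\r\n').split('\t')
print(columns)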
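To put the names together with the readings, as the question asks, something along these lines might work. It is only a sketch under assumptions the question does not confirm: that the -CLEAN.txt file holds one whitespace-separated row per reading with exactly one value per column name, and that the column names are on the last non-comment line of the .HEAD file. The function name load_with_header is hypothetical, and values are left as strings so you can convert them as your data requires.

```python
import shlex

def load_with_header(head_path, data_path):
    """Return a list of {column name: value} dicts, one per data row."""
    with open(head_path) as f:
        # Skip blank lines and the '#' metadata lines written by Octave
        header_lines = [ln for ln in f if ln.strip() and not ln.startswith('#')]
    # Quote-aware split, so names such as "..., Mains 1" stay in one piece
    columns = shlex.split(header_lines[-1])

    rows = []
    with open(data_path) as f:
        for line_no, line in enumerate(f, start=1):
            values = line.split()  # ASSUMPTION: whitespace-separated values
            if not values:
                continue  # ignore empty lines
            if len(values) != len(columns):
                raise ValueError(
                    f"line {line_no}: expected {len(columns)} values, got {len(values)}"
                )
            rows.append(dict(zip(columns, values)))
    return rows

# Usage with the file names from the question:
# rows = load_with_header('processed-H01-Accounts-3-31-power.HEAD',
#                         'processed-H01-Accounts-3-31-power-CLEAN.txt')
```

The length check is there on purpose: if the header and the data disagree on the number of columns, it is better to find out immediately than to model with shifted columns.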