our electricity provider think it could be very fun to make difficult to read csv files they provide.
This is precise electric consumption, every 30 min but in the SAME column you have hours, and date, example :
19/11/2022
00:00:00 2098
23:30:00 218
23:00:00 606
[...]
01:30:00 216
01:00:00 2672
00:30:00 2816
18/11/2022
00:00:00 1994
23:30:00 260
23:00:00 732
[...]
01:30:00 200
01:00:00 1378
00:30:00 2520
17/11/2022
00:00:00 1830
23:30:00 96
23:00:00 122
[...]
01:30:00 694
01:00:00 2950
00:30:00 3062
16/11/2022
00:00:00 2420
23:30:00 678
23:00:00 644
[...]
How damn can I obtain this kind of lovely formated file :
2022-11-19 00:00:00 2098
2022-11-19 23:30:00 218
2022-11-19 23:00:00 606
etc.
CodePudding user response:
Try:
import pandas as pd
current_date = None
all_data = []
with open("your_file.txt", "r") as f_in:
for row in map(str.strip, f_in):
if row == "":
continue
if "/" in row:
current_date = row
else:
all_data.append([current_date, *row.split()])
df = pd.DataFrame(all_data, columns=["Date", "Time", "Value"])
print(df)
Prints:
Date Time Value
0 19/11/2022 00:00:00 2098
1 19/11/2022 23:30:00 218
2 19/11/2022 23:00:00 606
3 19/11/2022 01:30:00 216
4 19/11/2022 01:00:00 2672
5 19/11/2022 00:30:00 2816
6 18/11/2022 00:00:00 1994
7 18/11/2022 23:30:00 260
8 18/11/2022 23:00:00 732
9 18/11/2022 01:30:00 200
10 18/11/2022 01:00:00 1378
11 18/11/2022 00:30:00 2520
12 17/11/2022 00:00:00 1830
13 17/11/2022 23:30:00 96
14 17/11/2022 23:00:00 122
15 17/11/2022 01:30:00 694
16 17/11/2022 01:00:00 2950
17 17/11/2022 00:30:00 3062
18 16/11/2022 00:00:00 2420
19 16/11/2022 23:30:00 678
20 16/11/2022 23:00:00 644
CodePudding user response:
Okay I have an idiotic brutforce solution for you, so dont take that as coding recommondation but just something that gets the job done:
import itertools
dList = [f"{f}/{s}/2022" for f, s in itertools.product(range(1, 32), range(1, 13))]
i assume you have a text file with that so im just gonna use that:
file = 'yourfilename.txt'
#make sure youre running the program in the same directory as the .txt file
with open(file, "r") as f:
global lines
lines = f.readlines()
lines = [word.replace('\n','') for word in lines]
for i in lines:
if i in dList:
curD = i
else:
with open('output.txt', 'w') as g:
g.write(f'{i} {(i.split())[0]} {(i.split())[1]}')
make sure to create a file called output.txt in the same directory and everything will get writen into that file.
CodePudding user response:
Using pandas operations would be like the following:
data.csv
19/11/2022
00:00:00 2098
23:30:00 218
23:00:00 606
01:30:00 216
01:00:00 2672
00:30:00 2816
18/11/2022
00:00:00 1994
23:30:00 260
23:00:00 732
01:30:00 200
01:00:00 1378
00:30:00 2520
17/11/2022
00:00:00 1830
23:30:00 96
23:00:00 122
01:30:00 694
01:00:00 2950
00:30:00 3062
16/11/2022
00:00:00 2420
23:30:00 678
23:00:00 644
Implementation
import pandas as pd
df = pd.read_csv('data.csv', header=None)
df['amount'] = df[0].apply(lambda item:item.split(' ')[-1] if item.find(':')>0 else None)
df['time'] = df[0].apply(lambda item:item.split(' ')[0] if item.find(':')>0 else None)
df['date'] = df[0].apply(lambda item:item if item.find('/')>0 else None)
df['date'] = df['date'].fillna(method='ffill')
df = df.dropna(subset=['amount'], how='any')
df = df.drop(0, axis=1)
print(df)
output
amount time date
1 2098 00:00:00 19/11/2022
2 218 23:30:00 19/11/2022
3 606 23:00:00 19/11/2022
4 216 01:30:00 19/11/2022
5 2672 01:00:00 19/11/2022
6 2816 00:30:00 19/11/2022
8 1994 00:00:00 18/11/2022
9 260 23:30:00 18/11/2022
10 732 23:00:00 18/11/2022
11 200 01:30:00 18/11/2022
12 1378 01:00:00 18/11/2022
13 2520 00:30:00 18/11/2022
15 1830 00:00:00 17/11/2022
16 96 23:30:00 17/11/2022
17 122 23:00:00 17/11/2022
18 694 01:30:00 17/11/2022
19 2950 01:00:00 17/11/2022
20 3062 00:30:00 17/11/2022
22 2420 00:00:00 16/11/2022
23 678 23:30:00 16/11/2022
24 644 23:00:00 16/11/2022