formating file with hours and date in the same column-CodePudding

our electricity provider think it could be very fun to make difficult to read csv files they provide.

This is precise electric consumption, every 30 min but in the SAME column you have hours, and date, example :

19/11/2022  
00:00:00    2098
23:30:00    218
23:00:00    606
[...]
01:30:00    216
01:00:00    2672
00:30:00    2816
    
18/11/2022  
00:00:00    1994
23:30:00    260
23:00:00    732
[...]
01:30:00    200
01:00:00    1378
00:30:00    2520
    
17/11/2022  
00:00:00    1830
23:30:00    96
23:00:00    122
[...]
01:30:00    694
01:00:00    2950
00:30:00    3062
    
16/11/2022  
00:00:00    2420
23:30:00    678
23:00:00    644
[...]

How damn can I obtain this kind of lovely formated file :

2022-11-19 00:00:00 2098
2022-11-19 23:30:00 218
2022-11-19 23:00:00 606

etc.

CodePudding user response：

Try:

import pandas as pd

current_date = None
all_data = []
with open("your_file.txt", "r") as f_in:
    for row in map(str.strip, f_in):
        if row == "":
            continue
        if "/" in row:
            current_date = row
        else:
            all_data.append([current_date, *row.split()])

df = pd.DataFrame(all_data, columns=["Date", "Time", "Value"])
print(df)

Prints:

          Date      Time Value
0   19/11/2022  00:00:00  2098
1   19/11/2022  23:30:00   218
2   19/11/2022  23:00:00   606
3   19/11/2022  01:30:00   216
4   19/11/2022  01:00:00  2672
5   19/11/2022  00:30:00  2816
6   18/11/2022  00:00:00  1994
7   18/11/2022  23:30:00   260
8   18/11/2022  23:00:00   732
9   18/11/2022  01:30:00   200
10  18/11/2022  01:00:00  1378
11  18/11/2022  00:30:00  2520
12  17/11/2022  00:00:00  1830
13  17/11/2022  23:30:00    96
14  17/11/2022  23:00:00   122
15  17/11/2022  01:30:00   694
16  17/11/2022  01:00:00  2950
17  17/11/2022  00:30:00  3062
18  16/11/2022  00:00:00  2420
19  16/11/2022  23:30:00   678
20  16/11/2022  23:00:00   644

CodePudding user response：

Okay I have an idiotic brutforce solution for you, so dont take that as coding recommondation but just something that gets the job done:

import itertools
dList = [f"{f}/{s}/2022" for f, s in itertools.product(range(1, 32), range(1, 13))]

i assume you have a text file with that so im just gonna use that:

file = 'yourfilename.txt'
#make sure youre running the program in the same directory as the .txt file
with open(file, "r") as f:
    global lines
    lines = f.readlines()
lines = [word.replace('\n','') for word in lines]
for i in lines:
    if i in dList:
        curD = i
    else:
        with open('output.txt', 'w') as g:
            g.write(f'{i} {(i.split())[0]} {(i.split())[1]}')

make sure to create a file called output.txt in the same directory and everything will get writen into that file.

CodePudding user response：

Using pandas operations would be like the following:

data.csv

19/11/2022  
00:00:00    2098
23:30:00    218
23:00:00    606
01:30:00    216
01:00:00    2672
00:30:00    2816
18/11/2022  
00:00:00    1994
23:30:00    260
23:00:00    732
01:30:00    200
01:00:00    1378
00:30:00    2520
17/11/2022  
00:00:00    1830
23:30:00    96
23:00:00    122
01:30:00    694
01:00:00    2950
00:30:00    3062
16/11/2022  
00:00:00    2420
23:30:00    678
23:00:00    644

Implementation

import pandas as pd
df = pd.read_csv('data.csv', header=None)
df['amount'] = df[0].apply(lambda item:item.split(' ')[-1] if item.find(':')>0 else None)
df['time'] = df[0].apply(lambda item:item.split(' ')[0] if item.find(':')>0 else None)
df['date'] = df[0].apply(lambda item:item if item.find('/')>0 else None)
df['date'] = df['date'].fillna(method='ffill')
df = df.dropna(subset=['amount'], how='any')
df = df.drop(0, axis=1)
print(df)

output

   amount      time          date
1    2098  00:00:00  19/11/2022  
2     218  23:30:00  19/11/2022  
3     606  23:00:00  19/11/2022  
4     216  01:30:00  19/11/2022  
5    2672  01:00:00  19/11/2022  
6    2816  00:30:00  19/11/2022  
8    1994  00:00:00  18/11/2022  
9     260  23:30:00  18/11/2022  
10    732  23:00:00  18/11/2022  
11    200  01:30:00  18/11/2022  
12   1378  01:00:00  18/11/2022  
13   2520  00:30:00  18/11/2022  
15   1830  00:00:00  17/11/2022  
16     96  23:30:00  17/11/2022  
17    122  23:00:00  17/11/2022  
18    694  01:30:00  17/11/2022  
19   2950  01:00:00  17/11/2022  
20   3062  00:30:00  17/11/2022  
22   2420  00:00:00  16/11/2022  
23    678  23:30:00  16/11/2022  
24    644  23:00:00  16/11/2022