I have a dataframe built from a license log file. The log file only records consecutive hours. The header of the log file contains a start date, so every time the hour is 0 a new day should begin. How can I solve this in Python?
Here is an example of what I have. Left is the current structure, right is the expected output:
CodePudding user response:
I immediately thought of a loop solution; there might be more pythonic ways though.
import pandas as pd
from datetime import timedelta
df = pd.read_csv('date_example.csv', parse_dates=['Date'])
# start at the second row; the first row already holds the start date
for idx, row in df.iloc[1:].iterrows():
    if df.loc[idx, 'Hour'] == 0:
        # hour 0 starts a new day
        df.loc[idx, 'Date'] = df.loc[idx-1, 'Date'] + timedelta(days=1)
    else:
        df.loc[idx, 'Date'] = df.loc[idx-1, 'Date']
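One possible loop-free variant of the same idea is sketched below. It assumes the first row holds the start date and that the columns are named `Date` and `Hour` as in the loop; the helper name `fill_dates` and the sample frame are made up for illustration.

```python
import pandas as pd

def fill_dates(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose Date advances by one day at each Hour == 0."""
    out = df.copy()
    start = pd.to_datetime(out["Date"].iloc[0])
    # Mark every row where a new day begins (Hour == 0).
    new_day = out["Hour"].eq(0)
    # The first row already belongs to the start date, so it never adds a day.
    new_day.iloc[0] = False
    # The running count of day starts is the offset in days from the start date.
    out["Date"] = start + pd.to_timedelta(new_day.cumsum(), unit="D")
    return out

# Hypothetical sample data: only the first row carries a date.
example = pd.DataFrame({
    "Date": [pd.Timestamp("2021-10-28"), pd.NaT, pd.NaT, pd.NaT, pd.NaT],
    "Hour": [0, 1, 23, 0, 1],
})
result = fill_dates(example)
print(result)
```

Like the loop, this only advances the date when an hour-0 row is actually present in the log.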
CodePudding user response:
You didn't add the raw data, so I created a similar example. This solution assumes there are no days without data.
import pandas as pd
import datetime
import numpy as np
# example data
data = [[datetime.datetime(2021,10,28), 0, 5], [np.nan, 1, 6], [np.nan, 23, 7], [np.nan, 1, 8]]
df = pd.DataFrame(data, columns=['Date', 'Hour', 'License_Count'])
for i in range(1, len(df)):
    if df.iat[i, 1] > df.iat[i-1, 1]:
        # hour increased: still the same day
        df.loc[i, 'Date'] = df.iat[i-1, 0]
    else:
        # hour did not increase: the clock wrapped around, so a new day starts
        df.loc[i, 'Date'] = df.iat[i-1, 0] + datetime.timedelta(days=1)
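The wrap-around check can probably also be done without a loop. This is a rough sketch under the same assumption (no days without data); the function name `fill_dates_by_wraparound` is hypothetical, and the column names follow the example above.

```python
import datetime
import pandas as pd

def fill_dates_by_wraparound(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df where Date advances whenever Hour stops increasing."""
    out = df.copy()
    start = pd.to_datetime(out["Date"].iloc[0])
    # diff() is NaN for the first row, and NaN <= 0 is False,
    # so the first row never counts as the start of a new day.
    wrapped = out["Hour"].diff().le(0)
    # Cumulative number of wrap-arounds = days elapsed since the start date.
    out["Date"] = start + pd.to_timedelta(wrapped.cumsum(), unit="D")
    return out

# Same sample rows as in the example data above.
sample = pd.DataFrame(
    [[datetime.datetime(2021, 10, 28), 0, 5], [pd.NaT, 1, 6],
     [pd.NaT, 23, 7], [pd.NaT, 1, 8]],
    columns=["Date", "Hour", "License_Count"],
)
filled = fill_dates_by_wraparound(sample)
print(filled)
```

Unlike a test for `Hour == 0`, this still starts a new day when the hour-0 row itself is missing from the log.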
CodePudding user response:
I have done this by applying the below function.
import pandas as pd
from datetime import timedelta
df["Date"] = pd.to_datetime(df["Date"])
temp = df.copy()
def func(x):
    # the first row already holds the start date, so leave it unchanged
    if x.name == 0:
        return
    if x['Hours'] == 0:
        # hour 0 starts a new day
        temp.loc[x.name, 'Date'] = temp.loc[x.name - 1, 'Date'] + timedelta(days=1)
    else:
        temp.loc[x.name, 'Date'] = temp.loc[x.name - 1, 'Date']
df.apply(func, axis=1)
print(temp)
"temp" is your desired output.
CodePudding user response:
I used an Excel sheet, input.xlsx, that is similar to your input. The date automatically starts at hour 0, so I didn't use the column with the hours. The output is then stored in output.xlsx.
import pandas as pd
from datetime import timedelta
df = pd.read_excel("input.xlsx")
date = df['Date'][0]
for index, row in df.iterrows():
    # write the current timestamp, then move one hour forward
    df.loc[index, 'Date'] = date
    date += timedelta(hours=1)
df.to_excel("output.xlsx")
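If the log really has exactly one row per hour with no gaps, the running timestamp could also be built in one step. The snippet below is only a sketch under that assumption; the sample frame stands in for the Excel input and its values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for input.xlsx: 26 hourly rows, date only in the first row.
example = pd.DataFrame({
    "Date": [pd.Timestamp("2021-10-28")] + [pd.NaT] * 25,
    "License_Count": range(26),
})

start = pd.to_datetime(example["Date"].iloc[0])
# Row i gets start + i hours, so the date rolls over after 24 rows.
example["Date"] = start + pd.to_timedelta(np.arange(len(example)), unit="h")
print(example.iloc[[0, 23, 24, 25]])
```

Note that this ignores the hour column entirely, so a single missing hour in the log would shift every later timestamp.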