Fill in Date when only knowing start date and continuous hours? Pandas

Time:10-30

I have a dataframe built from a license log file. The log file records only continuous hours. The header of the log file contains a start date, so every time the hour returns to 0 a new day should begin. How can I solve this in Python?

Here is an example of what I have. Left is the current structure, right is the expected output:

[Image: current structure] [Image: expected output]

CodePudding user response:

I immediately thought of a loop solution; there might be more pythonic ways though.

import pandas as pd
from datetime import timedelta

df=pd.read_csv('date_example.csv', parse_dates=['Date'])

# Walk the rows in order: hour 0 starts a new day, any other hour keeps the previous date
for idx, row in df.iloc[1:].iterrows():
    if df.loc[idx,'Hour'] == 0:
        df.loc[idx,'Date'] = df.loc[idx-1,'Date'] + timedelta(days=1)
    else:
        df.loc[idx,'Date'] = df.loc[idx-1, 'Date']
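
As one possible more pythonic variant of the loop, the day offset can be computed with a cumulative sum instead of row-by-row assignment. This is only a sketch: the sample data and the `start_date` value are made up for illustration, and it assumes the log always contains a row for hour 0 of each day.

```python
import pandas as pd

# Hypothetical sample data; the 'Hour' column name follows the loop above.
df = pd.DataFrame({"Hour": [0, 1, 2, 0, 1, 0]})
start_date = pd.Timestamp("2021-10-28")  # assumed start date from the log header

# Every hour 0 marks the start of a new day ...
new_day = df["Hour"].eq(0)
# ... except on the first row, which belongs to the start date itself.
new_day.iloc[0] = False

# Running count of day changes = number of days elapsed since the start date.
day_offset = new_day.cumsum()
df["Date"] = start_date + pd.to_timedelta(day_offset, unit="D")

print(df)
```

The cumulative sum turns the "new day" flags into a day counter, so no row needs to look at the row before it.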

CodePudding user response:

You didn't add the raw data, so I created a similar example. This solution assumes there are no days without data.

import pandas as pd
import datetime
import numpy as np

# example data
data = [[datetime.datetime(2021,10,28), 0, 5], [np.nan, 1, 6], [np.nan, 23, 7], [np.nan, 1, 8]]
df = pd.DataFrame(data, columns=['Date', 'Hour', 'License_Count'])

for i in range(1, len(df)):
    if df.iat[i,1] > df.iat[i-1,1]:
        # hour increased: still the same day
        df.loc[i,'Date'] = df.iat[i-1,0]
    else:
        # hour wrapped around (or repeated): a new day has begun
        df.loc[i,'Date'] = df.iat[i-1,0] + datetime.timedelta(days=1)
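
The same rule (a new day begins whenever the hour does not increase) can probably also be written without a loop. The sketch below reuses the example values from above; `start_date` is an assumed variable holding the date from the log header, and the approach shares the assumption that no day is missing entirely.

```python
import pandas as pd

# Same example values as above, without the partially filled Date column.
df = pd.DataFrame({"Hour": [0, 1, 23, 1], "License_Count": [5, 6, 7, 8]})
start_date = pd.Timestamp("2021-10-28")  # assumed start date from the log header

# diff() is NaN on the first row, and NaN <= 0 is False, so the first row never rolls over.
rollover = df["Hour"].diff().le(0)

# Counting the rollovers so far gives the number of days since the start date.
df["Date"] = start_date + pd.to_timedelta(rollover.cumsum(), unit="D")

print(df)
```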

CodePudding user response:

I have done this by applying the below function.

import pandas as pd
from datetime import timedelta

df["Date"] = pd.to_datetime(df["Date"])
temp=df.copy()

def func(x):
    # The first row keeps the start date; there is no previous row to look at
    if x.name == 0:
        return
    prev_date = temp.loc[x.name - 1, 'Date']
    if x['Hours'] == 0:
        # hour 0 means a new day has started
        temp.loc[x.name, 'Date'] = prev_date + timedelta(days=1)
    else:
        temp.loc[x.name, 'Date'] = prev_date

df.apply(func, axis = 1)
print(temp)

"temp" is your desired output.

CodePudding user response:

I used an Excel sheet, input.xlsx, that is similar to your input. The date automatically starts at hour 0, so I didn't use the hour column; this only works if the log has exactly one row per hour with no gaps. The output is then stored in output.xlsx.

import pandas as pd
from datetime import timedelta

df = pd.read_excel("input.xlsx")
date = df['Date'][0]

# Stamp each row with the running timestamp, then advance it by one hour
for index, row in df.iterrows():
    df.loc[index, 'Date'] = date
    date += timedelta(hours=1)

df.to_excel("output.xlsx")
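
If the one-row-per-hour assumption holds, the running timestamp could also be built in one step rather than inside a loop. This is a rough sketch with made-up data; `start` stands in for the first value of the Date column and is an assumption, not something taken from the original file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sheet: three consecutive hourly rows.
df = pd.DataFrame({"License_Count": [5, 6, 7]})
start = pd.Timestamp("2021-10-28")  # assumed start date (hour 0 of the first day)

# Row n gets the start timestamp plus n hours.
df["Date"] = start + pd.to_timedelta(np.arange(len(df)), unit="h")

print(df)
```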