I have a Python script like this:
import csv
import pandas as pd
import numpy as np
import time
from pandas import Series,DataFrame
df = pd.read_csv('C:/Users/Desktop/case_study_1.csv',low_memory=False)
df.head()
#convert interaction_time to date time format
df.interaction_time = pd.to_datetime(df.interaction_time)
#remove null on merchant column
df_remove_null = df.dropna(subset=['merchant'])
#count added, confirmed txn
df_cnt = df_remove_null.groupby([pd.Grouper(key='interaction_time',freq='H'),df_remove_null.fullVisitorid,df_remove_null.action_type]).size().reset_index(name='count')
df_final_cnt = df_cnt.groupby(['interaction_time','action_type'])['fullVisitorid'].size().reset_index(name='count')
#export csv file
df_final_cnt.to_csv(r'C:\Users\Desktop\filename12.csv',index = False, columns = ["interaction_time","action_type","count"])
As you can see, the code outputs a CSV file, which I save to a local directory. I want to run the code automatically every 10 minutes and generate a new CSV file, so that every 10 minutes the new file overwrites the old one.
I don't know much about automation, so any help would be greatly appreciated.
I tried a for loop with range(100), but I got this error: IndentationError: expected an indented block
Thanks.
CodePudding user response:
Wrapping your code in a loop like this will run it every ten minutes, as long as the script keeps running:
while True:
    ... your code here ...
    time.sleep(600)
The IndentationError means the loop body is not indented: every line inside the loop has to be indented relative to the while (or for) line. A formatting or linting tool can help you find where the indentation is wrong.
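Since the question mentions a for loop with range(100), here is a minimal sketch of how the loop body has to be indented. The names run_repeatedly and demo_job are hypothetical, and demo_job only stands in for the CSV-generating code from the question.

```python
import time


def run_repeatedly(job, interval_seconds, max_runs, sleep=time.sleep):
    """Call job() max_runs times, pausing interval_seconds after each call."""
    for _ in range(max_runs):
        # Everything that belongs to the loop is indented one level deeper
        # than the `for` line; leaving it unindented raises
        # "IndentationError: expected an indented block".
        job()
        sleep(interval_seconds)


if __name__ == "__main__":
    def demo_job():
        # Hypothetical placeholder for the CSV-generating code.
        print("generating CSV at", time.strftime("%H:%M:%S"))

    # Short interval so the demo finishes quickly; use 600 for ten minutes.
    run_repeatedly(demo_job, interval_seconds=0.1, max_runs=3)
```

Passing sleep as a parameter is only a convenience for trying the loop out without waiting; in normal use the default time.sleep is enough.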
CodePudding user response:
You can put all of the work in a function and call that function every 10 minutes using a module like sched.
import sched, time
import pandas as pd
sd = sched.scheduler(time.time, time.sleep)
def your_func(sc):
    df = pd.read_csv('C:/Users/Desktop/case_study_1.csv',low_memory=False)
    df.head()
    #convert interaction_time to date time format
    df.interaction_time = pd.to_datetime(df.interaction_time)
    #remove null on merchant column
    df_remove_null = df.dropna(subset=['merchant'])
    #count added, confirmed txn
    df_cnt = df_remove_null.groupby([pd.Grouper(key='interaction_time',freq='H'),df_remove_null.fullVisitorid,df_remove_null.action_type]).size().reset_index(name='count')
    df_final_cnt = df_cnt.groupby(['interaction_time','action_type'])['fullVisitorid'].size().reset_index(name='count')
    #export csv file
    df_final_cnt.to_csv(r'C:\Users\Desktop\filename12.csv',index = False, columns = ["interaction_time","action_type","count"])
    #schedule the next run 600 seconds after this one finishes
    sc.enter(600, 1, your_func, (sc,))
#schedule the first run, then start the scheduler
sd.enter(600, 1, your_func, (sd,))
sd.run()
This leaves a 10-minute gap between the end of one run and the start of the next. So if your code takes 2 minutes to execute, it will run every 12 minutes.
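If the runs should start every 10 minutes regardless of how long each one takes, one option is to measure the interval from start to start and sleep only for the remainder. This is a rough sketch using plain time functions rather than sched; run_at_fixed_rate and its parameters are hypothetical names.

```python
import time


def run_at_fixed_rate(job, interval_seconds, runs,
                      clock=time.monotonic, sleep=time.sleep):
    """Start job() every interval_seconds, measured from start to start."""
    next_start = clock()
    for _ in range(runs):
        job()
        next_start += interval_seconds
        # Sleep only for what is left of the interval. If the job overran
        # the interval, the delay is zero and the next run starts at once.
        delay = max(0.0, next_start - clock())
        sleep(delay)
```

With a 600-second interval and a job that takes 120 seconds, each pause is 480 seconds, so the runs begin 10 minutes apart instead of 12.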
CodePudding user response:
I think the simplest solution is:
import time
while True:
    # your script
    time.sleep(600)  # 600 seconds = 10 minutes
This is an infinite loop; add a condition with break if you want it to stop.
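As one possible break condition, the loop can check for a "stop file" before each run and exit once that file exists. This is only a sketch: run_until_stopped and the stop-file idea are assumptions for illustration, not something from the question.

```python
import os
import time


def run_until_stopped(job, interval_seconds, stop_file, sleep=time.sleep):
    """Run job() repeatedly until a file named stop_file exists."""
    runs = 0
    while True:
        if os.path.exists(stop_file):
            break  # break condition: someone created the stop file
        job()
        runs += 1
        sleep(interval_seconds)
    return runs
```

To end the loop, create the stop file (any empty file at the chosen path); the loop exits the next time it wakes up.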
CodePudding user response:
If you are not limited to doing this in Python, a simple solution is to use Windows Task Scheduler to run the script every 10 minutes.
Please refer to the following topic: Run a task every x-minutes with Windows Task Scheduler