I'm writing statistics out to a CSV file that looks like this:
date,students
2022-11-16,22
2022-11-17,29
I want to read this CSV back in, pull the second-column (students) value from yesterday's row, and compare it to the second-column value from today's row, looking for a difference beyond some threshold, something like a 5% variance. The last part is straightforward, but I'm having a heck of a time pulling the right rows and recapturing the student count for comparison.
I can find the matching date well enough with pandas, but I lose the second column in the match, and it's just not clicking for me.
import pandas as pd
from datetime import date
from datetime import timedelta

today = date.today()
yesterday = date.today() - timedelta(1)
print("today is ", today, " and yesterday was ", yesterday)
df = pd.read_csv('test.csv')
# Column names match the CSV header shown above (date,students)
col1 = df['date']
col2 = df['students']
for row in col1:
    if row == str(yesterday):
        print(row)
Any ideas are greatly appreciated! I'm sure this is something goofy that I'm overlooking at 1am.
CodePudding user response:
You can try this:
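import pandas as pd
from datetime import date, timedelta

# Convert to strings so they compare equal to the strings read from the CSV
today = str(date.today())
yesterday = str(date.today() - timedelta(1))
print("today is ", today, " and yesterday was ", yesterday)
df = pd.read_csv('test.csv')
# .values[0] raises IndexError if no row matches today's date
today_value = df.loc[df['date'] == today, 'students'].values[0]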
This is really a question about how to extract a column value based on another column in pandas:
https://stackoverflow.com/a/36685531/10787867
Also, make sure you compare a string to a string (and not to a datetime.date object).
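Putting the lookup above together with the 5% check from the question might look like the following minimal sketch. The helper names `value_for` and `pct_change`, the `THRESHOLD_PCT` constant, and the inlined sample data are assumptions for illustration; only the `date` and `students` column names come from the sample CSV. The date is pinned to the sample so the result is reproducible.

```python
import io
from datetime import date, timedelta

import pandas as pd

THRESHOLD_PCT = 5.0  # assumed threshold, taken from the "5% variance" in the question


def value_for(df, day, column="students"):
    """Return the value in `column` for the row whose date equals `day`, or None."""
    # Compare string to string: read_csv leaves the date column as plain strings.
    matches = df.loc[df["date"] == str(day), column]
    return None if matches.empty else float(matches.iloc[0])


def pct_change(old, new):
    """Percent change going from `old` to `new`."""
    return (new - old) / old * 100


# Sample data inlined so the sketch runs without test.csv (hypothetical stand-in).
sample = io.StringIO("date,students\n2022-11-16,22\n2022-11-17,29\n")
df = pd.read_csv(sample)

today = date(2022, 11, 17)  # in real use: date.today()
yesterday = today - timedelta(days=1)

today_value = value_for(df, today)
yesterday_value = value_for(df, yesterday)

# Need both rows, and a non-zero baseline to divide by.
if today_value is not None and yesterday_value:
    change = pct_change(yesterday_value, today_value)
    exceeded = abs(change) > THRESHOLD_PCT
    print(f"change={change:.2f}% exceeded={exceeded}")
```

Returning None for a missing row avoids the IndexError that `.values[0]` raises when no date matches, for example on the first day of logging.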
CodePudding user response:
You may consider that pandas is somewhat "heavyweight" for something so trivial.
So, without pandas, how about:
from datetime import datetime, timedelta

now = datetime.now()
# str(now) looks like 'YYYY-MM-DD HH:MM:SS.ffffff'; keep only the date part
today, *_ = str(now).split()
yesterday, *_ = str(now - timedelta(days=1)).split()
tv = None
yv = None
with open('test.csv') as data:
    for line in data.readlines()[1:]:  # skip the header row
        d, s = line.split(',')
        if d == today:
            tv = float(s)
        elif d == yesterday:
            yv = float(s)
        # tv may legitimately be 0; yv must be non-zero to divide by it
        if tv is not None and yv:
            variance = (tv-yv)/yv*100
            print(f'Variance={variance:.2f}%')
            break
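The same idea can also be written with the standard library's csv module, which reads the header row and strips the line endings for you. This is only a sketch, assuming the date,students layout from the question; the function name `daily_change` and the inlined sample data are hypothetical.

```python
import csv
import io
from datetime import date, timedelta


def daily_change(lines, today):
    """Return the percent change from yesterday's count to today's, or None."""
    yesterday = today - timedelta(days=1)
    # Map ISO date string -> student count; DictReader takes keys from the header row.
    counts = {row["date"]: float(row["students"]) for row in csv.DictReader(lines)}
    tv = counts.get(today.isoformat())
    yv = counts.get(yesterday.isoformat())
    # Need both rows, and a non-zero baseline to divide by.
    if tv is None or not yv:
        return None
    return (tv - yv) / yv * 100


# Inlined sample so the sketch runs as-is; in real use pass open('test.csv', newline='').
sample = io.StringIO("date,students\n2022-11-16,22\n2022-11-17,29\n")
change = daily_change(sample, date(2022, 11, 17))
if change is not None:
    print(f"Variance={change:.2f}%")
```

Building a dictionary first means the rows can appear in any order, at the cost of reading the whole file instead of stopping early.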