I have a pandas.DataFrame with columns 'start', 'end', and 'vals_to_sum'. I want to sum all values in the last column for each date in a list of days in datetime.date format: date_list = [start_date + datetime.timedelta(days=i) for i in range(366)]
where start_date is a datetime.date. My problem is that when I try to compare against my start and end dates, Python seems to convert them to str and I get a TypeError.
My code currently is:
# Initialise empty array to fill with summed values for each day
output = numpy.zeros(len(date_list))
for idx, date in enumerate(date_list):
    # Concatenate all values within date range start < x < end
    print(type(start), 'start')  # <class 'datetime.date'> start
    print(type(end), 'end')  # <class 'datetime.date'> end
    print(type(date), 'date')  # <class 'datetime.date'> date
    to_sum = [value for i, value in enumerate(df['vals_to_sum'])
              if df['start'] <= date & df['end'] >= date]
    output[idx] = numpy.sum(numpy.array(to_sum).astype(numpy.float))
However, I get the following error: TypeError: unsupported operand type(s) for &: 'str' and 'datetime.date'
CodePudding user response:
Pandas stores strings in columns of dtype 'object', where each cell holds a reference to a Python string (check out the docs at https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html). If you'd like to transform such a column and assign the result back to the DataFrame, you could do something like:
df['column_new'] = df['column'].str.split(',')
Since I'm not sure how you have your lists/tables structured, another issue could be that the dates aren't being compared properly. On any datetime object, you can call its strftime method to get a string representation. Here's a link that explains how to do that: https://www.programiz.com/python-programming/datetime/strftime
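You're also using a bitwise and. If you're unsure what this means: in binary, 1010 & 0101 gives 0000. The details are fairly irrelevant to your issue, but you do need to know that '&' binds more tightly than '<=' and '>=', so Python evaluates date & df['end'] first, and that is where the TypeError comes from. If you are comparing single values, replace '&' with the boolean 'and'. If you are comparing whole pandas columns, keep '&' but wrap each comparison in parentheses, because 'and' does not work on a Series.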
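To make the difference concrete, here is a minimal sketch. The DataFrame contents and the value of date are made up for illustration, since the question doesn't show the real data, and the columns are assumed to already hold datetime.date objects.

```python
import datetime
import pandas as pd

# Hypothetical sample data; the real DataFrame is not shown in the question.
df = pd.DataFrame({
    "start": [datetime.date(2021, 1, 1), datetime.date(2021, 1, 5)],
    "end": [datetime.date(2021, 1, 10), datetime.date(2021, 1, 6)],
    "vals_to_sum": [1.0, 2.0],
})
date = datetime.date(2021, 1, 7)

# Single values (one row at a time): the boolean `and` is the right tool.
row_matches = [s <= date and e >= date
               for s, e in zip(df["start"], df["end"])]

# Whole columns: `&` is the right tool, but each comparison needs its own
# parentheses because `&` binds more tightly than `<=` and `>=`.
mask = (df["start"] <= date) & (df["end"] >= date)

# Sum only the rows whose [start, end] range contains `date`.
total = df.loc[mask, "vals_to_sum"].sum()
```

Both approaches select the same rows here; the column version avoids the Python-level loop over rows.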
I'm not sure exactly how everything you have is structured, but hopefully this is enough help to get your code working!
CodePudding user response:
I don't fully understand the code you posted, and in particular I don't know what df['start'] and df['end'] contain, but I hope the following comments help. In this line, change '&' to 'and'. I also see that the ':' and the indentation are missing, but I suppose that is a copy-and-paste issue on the platform:
if df['start'] <= date & df['end'] >= date:
When the operands are single values rather than whole columns, this line could even be written as a chained comparison:
if df['start'] <= date <= df['end']:
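Regarding the error, use strptime to convert each string to a datetime, giving it the format your strings use. It takes a single string, so apply it to one row at a time (i is the row position), and call .date() so the result can be compared with a datetime.date:
start = datetime.datetime.strptime(df['start'].iloc[i], '%m/%d/%Y').date()
end = datetime.datetime.strptime(df['end'].iloc[i], '%m/%d/%Y').date()
if start <= date <= end: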
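As a rough sketch of how the conversion and the per-day sum could fit together, here is a version that converts both columns once up front instead of row by row. The sample rows, the '%m/%d/%Y' format, and the 5-day range are all assumptions for illustration; pandas.to_datetime is swapped in for strptime because it works on a whole column.

```python
import datetime
import numpy as np
import pandas as pd

# Hypothetical data: dates stored as strings in the assumed '%m/%d/%Y' format.
df = pd.DataFrame({
    "start": ["01/01/2021", "01/03/2021"],
    "end": ["01/02/2021", "01/05/2021"],
    "vals_to_sum": ["1.5", "2.0"],
})

# Convert the string columns to datetime.date objects once, up front.
for col in ("start", "end"):
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y").dt.date
df["vals_to_sum"] = df["vals_to_sum"].astype(float)

start_date = datetime.date(2021, 1, 1)
# Shortened to 5 days here; the question uses range(366).
date_list = [start_date + datetime.timedelta(days=i) for i in range(5)]

output = np.zeros(len(date_list))
for idx, date in enumerate(date_list):
    # Rows whose [start, end] range contains this date (inclusive).
    in_range = (df["start"] <= date) & (df["end"] >= date)
    output[idx] = df.loc[in_range, "vals_to_sum"].sum()
```

If the strings in your data use a different layout, the format argument needs to change to match.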
English is not my first language; I hope this is understandable.