My code is returning the following data in CSV
Quantity Date of purchase
1 17 May 2022 at 5:40:20PM BST
1 2 Apr 2022 at 7:41:29PM BST
1 2 Apr 2022 at 6:42:05PM BST
1 29 Mar 2022 at 12:34:56PM BST
1 29 Mar 2022 at 10:52:54AM BST
1 29 Mar 2022 at 12:04:52AM BST
1 28 Mar 2022 at 4:49:34PM BST
1 28 Mar 2022 at 11:13:37AM BST
1 27 Mar 2022 at 8:53:05PM BST
1 27 Mar 2022 at 5:10:21PM BST
1 27 Mar 2022 at 10:34:27AM BST
1 25 Mar 2022 at 11:32:25AM GMT
1 24 Mar 2022 at 6:23:19PM GMT
1 23 Mar 2022 at 3:22:03AM GMT
1 23 Mar 2022 at 1:09:49AM GMT
I am trying to get the dates only and adding the quantity data with the same date but below is the code for that
data = read_csv("products_sold_history_data.csv")
data['Date of purchase'] = pandas.to_datetime(data['Date of purchase'] , format='%d-%m-%Y').dt.date
but its giving me error can anyone please help how can I take the dates only from Date of purchase column and then add the quantity values in the same date.
CodePudding user response:
Date format in your data is not the format that you specified: format='%d-%m-%Y'
.
You could specify it explicitly, or let pandas infer the format for you by not providing the format:
pandas.to_datetime(data['Date of purchase']).dt.date
If you want to specify the format explicitly, you should provide the format that matches your data:
pandas.to_datetime(data['Date of purchase'], format='%d %b %Y at %H:%M:%S%p %Z')
CodePudding user response:
here is one way to do it, where a date is created as a on-fly field and not making part of the DF.
Also, IIUC you're not concerned with the time part and only date is what you need to use for summing it up
extract the date part using regex, create a temp field dte using pandas.assign, and then a groupby to sum up the quantity
df.assign(dte = pd.to_datetime(
df['purchase'].str.extract(r'(.*)(at)')[0].str.strip())
).groupby('dte')['qty'].sum().reset_index()
dte qty
0 2022-02-06 3
1 2022-02-07 3
2 2022-02-08 2
3 2022-02-09 2
4 2022-02-10 2
5 2022-02-11 3
6 2022-02-14 1
7 2022-02-15 1
8 2022-02-19 1