I have a JSON file that I'm trying to turn into a chart in Python. [JSON file preview][1]
I converted the dates to datetime so that I could either take the first value for each date (May 5, May 6, May 7, etc.) or average all the values that share a date (e.g. every May 5 reading collapses into one May 5 value). Can anyone tell me the best way to do this? I tried following someone else's code on here, but it gave me this error: AttributeError: Can only use .str accessor with string values!
My end goal is a table with just the time and the dataset, labelled as Floor Price (ETH), so I can plot it as a line or bar graph. Any help would be much appreciated. Thanks!
df["data.date"] = pd.to_datetime(df["data.date"])
df["Time"] = df["data.date"].apply(lambda x : x.date())
print(df)
df['Day'] = df['Time'].str.extract(r'^(\d{4}-\d{2}-\d{2})')
df.groupby('Day').agg({'Time': 'first', 'data.dataset': 'last'}).reset_index(drop=True)
JSON DATA:
"data": {
"date": [
"2022-05-05 04:19:35",
"2022-05-05 04:39:23",
"2022-05-05 05:09:02",
"2022-05-05 05:39:13",
"2022-05-05 06:11:15",
"2022-05-05 06:39:09",
"2022-05-05 07:08:58",
"2022-05-05 07:38:46",
"2022-05-05 08:09:23",
"2022-05-05 08:38:38",
"2022-05-05 09:08:32",
"2022-05-05 09:38:36",
"2022-05-05 10:05:29",
"2022-05-05 10:38:41",
"2022-05-05 11:08:44",
"2022-05-05 11:38:53",
"2022-05-05 12:08:33", (it keeps going to June 16)
"datasets": [
3.05,
3.05,
3.0,
2.7,
2.83,
2.649,
2.62,
2.62,
2.7,
2.8,
2.75,
2.69,
2.5,
2.52,
2.549,
2.549,
2.38, etc etc
CodePudding user response:
Given a DataFrame that looks like:
date values
0 2022-05-05 04:19:35 3.050
1 2022-05-05 04:39:23 3.050
2 2022-05-05 05:09:02 3.000
3 2022-05-05 05:39:13 2.700
4 2022-05-05 06:11:15 2.830
5 2022-05-05 06:39:09 2.649
6 2022-05-05 07:08:58 2.620
7 2022-05-05 07:38:46 2.620
8 2022-05-05 08:09:23 2.700
9 2022-05-05 08:38:38 2.800
10 2022-05-05 09:08:32 2.750
11 2022-05-05 09:38:36 2.690
12 2022-05-05 10:05:29 2.500
13 2022-05-05 10:38:41 2.520
14 2022-05-05 11:08:44 2.549
15 2022-05-05 11:38:53 2.549
16 2022-05-05 12:08:33 2.380
If your issue is getting your data into this format, you're going to need to provide more information and a Minimal, Reproducible Example.
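That said, in case it helps, here is a rough sketch of building that frame. It assumes the JSON has a top-level "data" object holding two equal-length lists, "date" and "datasets", as the preview in the question suggests. The inline `raw` string is a shortened stand-in for the real file, and the column names `date` and `values` are simply my choice.

```python
import json
import pandas as pd

# Shortened stand-in for the real file contents (structure assumed from the preview).
raw = """
{"data": {"date": ["2022-05-05 04:19:35", "2022-05-05 04:39:23", "2022-05-05 05:09:02"],
          "datasets": [3.05, 3.05, 3.0]}}
"""

# For a file on disk, json.load() on the opened file would replace json.loads().
payload = json.loads(raw)

# One row per timestamp, with the dates parsed to datetime64.
df = pd.DataFrame({
    "date": pd.to_datetime(payload["data"]["date"]),
    "values": payload["data"]["datasets"],
})
print(df)
```

If the two lists differ in length, the DataFrame constructor raises a ValueError, which is a cheap sanity check on the file.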
Formatting:
df.date = pd.to_datetime(df.date)
df.set_index('date', inplace=True)
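As for the AttributeError in the question: `.str` only works on columns of strings, and after `pd.to_datetime` followed by `.date()` the column holds date objects, so `.str.extract` has nothing to work on. Datetime columns expose a `.dt` accessor instead. A small sketch of the groupby route, using a few rows copied from the sample above (the names `demo`, `Day` and `per_day` are only for illustration):

```python
import pandas as pd

# A few rows copied from the sample above.
demo = pd.DataFrame({
    "date": ["2022-05-05 04:19:35", "2022-05-05 04:39:23", "2022-05-05 05:09:02"],
    "values": [3.05, 3.05, 3.0],
})
demo["date"] = pd.to_datetime(demo["date"])

# .dt is the datetime counterpart of .str; normalize() resets the time to midnight.
demo["Day"] = demo["date"].dt.normalize()

# First and mean value per calendar day.
per_day = demo.groupby("Day")["values"].agg(["first", "mean"])
print(per_day)
```

This needs the dates as a regular column, so it would go before the `set_index` call.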
Resampling: here I use hourly bins because the sample is so small; for daily bins, use .resample('D') instead:
# Average for the Hour:
out = df.resample('h').mean()
print(out)
# First for the Hour:
out = df.resample('h').first()
print(out)
Output:
- Average:
values
date
2022-05-05 04:00:00 3.0500
2022-05-05 05:00:00 2.8500
2022-05-05 06:00:00 2.7395
2022-05-05 07:00:00 2.6200
2022-05-05 08:00:00 2.7500
2022-05-05 09:00:00 2.7200
2022-05-05 10:00:00 2.5100
2022-05-05 11:00:00 2.5490
2022-05-05 12:00:00 2.3800
- First:
values
date
2022-05-05 04:00:00 3.050
2022-05-05 05:00:00 3.000
2022-05-05 06:00:00 2.830
2022-05-05 07:00:00 2.620
2022-05-05 08:00:00 2.700
2022-05-05 09:00:00 2.750
2022-05-05 10:00:00 2.500
2022-05-05 11:00:00 2.549
2022-05-05 12:00:00 2.380