How to take only the first entry per date from a dataset OR merge data from the same dates

I have a JSON file that I'm trying to turn into a chart in Python (a preview of the data is shown below).

I converted the dates to datetime so I could either take the first entry for each date (May 5, May 6, May 7, etc.) OR average the data by combining the entries that share a date (e.g. all the May 5 values together). Can anyone tell me what the best method of doing this is? :( I've tried following someone else's code on here, but it gave me this error: AttributeError: Can only use .str accessor with string values!

My end goal is a table with just the time and the dataset, labelled Floor Price (ETH), so I can plot it as a line/bar graph. Any help would be much appreciated!! Thanks


df["data.date"] = pd.to_datetime(df["data.date"])

df["Time"] = df["data.date"].apply(lambda x : x.date())

print(df)

df['Day'] = df['Time'].str.extract(r'^(\d{4}-\d{2}-\d{2})')

df.groupby('Day').agg({'Time': 'first', 'data.dataset': 'last'}).reset_index(drop=True)


JSON DATA:

"data": {
        "date": [
            "2022-05-05 04:19:35",
            "2022-05-05 04:39:23",
            "2022-05-05 05:09:02",
            "2022-05-05 05:39:13",
            "2022-05-05 06:11:15",
            "2022-05-05 06:39:09",
            "2022-05-05 07:08:58",
            "2022-05-05 07:38:46",
            "2022-05-05 08:09:23",
            "2022-05-05 08:38:38",
            "2022-05-05 09:08:32",
            "2022-05-05 09:38:36",
            "2022-05-05 10:05:29",
            "2022-05-05 10:38:41",
            "2022-05-05 11:08:44",
            "2022-05-05 11:38:53",
            "2022-05-05 12:08:33", (it keeps going to June 16)
"datasets": [
            3.05,
            3.05,
            3.0,
            2.7,
            2.83,
            2.649,
            2.62,
            2.62,
            2.7,
            2.8,
            2.75,
            2.69,
            2.5,
            2.52,
            2.549,
            2.549,
            2.38, etc etc

CodePudding user response：

Given a DataFrame that looks like:

                   date    values
0   2022-05-05 04:19:35     3.050
1   2022-05-05 04:39:23     3.050
2   2022-05-05 05:09:02     3.000
3   2022-05-05 05:39:13     2.700
4   2022-05-05 06:11:15     2.830
5   2022-05-05 06:39:09     2.649
6   2022-05-05 07:08:58     2.620
7   2022-05-05 07:38:46     2.620
8   2022-05-05 08:09:23     2.700
9   2022-05-05 08:38:38     2.800
10  2022-05-05 09:08:32     2.750
11  2022-05-05 09:38:36     2.690
12  2022-05-05 10:05:29     2.500
13  2022-05-05 10:38:41     2.520
14  2022-05-05 11:08:44     2.549
15  2022-05-05 11:38:53     2.549
16  2022-05-05 12:08:33     2.380

If your issue is getting your data into this format, you're going to need to provide more information and a Minimal, Reproducible Example.
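
That said, if the file really follows the layout shown in the question, a rough sketch of one way to get to this shape is below. It assumes a top-level "data" object whose "date" and "datasets" lists have the same length. The inline `raw` string is only a stand-in for the real file, and the column name `values` is chosen just to match the frame above.

```python
import json

import pandas as pd

# Stand-in for the real file (assumed layout, first three rows from the question).
# With a file on disk, something like this should work instead:
#     with open(path) as f:          # `path` is a hypothetical file name
#         payload = json.load(f)["data"]
raw = """
{"data": {"date": ["2022-05-05 04:19:35",
                   "2022-05-05 04:39:23",
                   "2022-05-05 05:09:02"],
          "datasets": [3.05, 3.05, 3.0]}}
"""
payload = json.loads(raw)["data"]

# Both lists must have the same length, otherwise pandas raises a ValueError.
df = pd.DataFrame({"date": payload["date"], "values": payload["datasets"]})
print(df)
```

At this point `date` still holds plain strings; the formatting step below converts it.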

Formatting:

import pandas as pd

df['date'] = pd.to_datetime(df['date'])  # parse the strings into datetimes
df = df.set_index('date')                # resample() below works on this DatetimeIndex

Resampling: I use hourly bins here because the sample is so small; for one row per day, use .resample('D') instead:

# Average for the Hour:
out = df.resample('h').mean()
print(out)

# First for the Hour:
out = df.resample('h').first()
print(out)

Output:

Average:

                     values
date
2022-05-05 04:00:00  3.0500
2022-05-05 05:00:00  2.8500
2022-05-05 06:00:00  2.7395
2022-05-05 07:00:00  2.6200
2022-05-05 08:00:00  2.7500
2022-05-05 09:00:00  2.7200
2022-05-05 10:00:00  2.5100
2022-05-05 11:00:00  2.5490
2022-05-05 12:00:00  2.3800

First:

                     values
date
2022-05-05 04:00:00   3.050
2022-05-05 05:00:00   3.000
2022-05-05 06:00:00   2.830
2022-05-05 07:00:00   2.620
2022-05-05 08:00:00   2.700
2022-05-05 09:00:00   2.750
2022-05-05 10:00:00   2.500
2022-05-05 11:00:00   2.549
2022-05-05 12:00:00   2.380
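
For the daily table the question asks for, the same idea might look like the sketch below. The first three rows come from the question; the two May 6 rows are made up purely so that there is a second day to show. The column is renamed to "Floor Price (ETH)" as the question describes. As for the error in the question: `.str` only works on columns that hold strings, so once the column has been converted to datetime the `.str.extract` step fails, and with `resample` it is not needed at all.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [
        "2022-05-05 04:19:35",  # from the question
        "2022-05-05 04:39:23",  # from the question
        "2022-05-05 05:09:02",  # from the question
        "2022-05-06 04:00:00",  # made-up row, for illustration only
        "2022-05-06 12:00:00",  # made-up row, for illustration only
    ],
    "values": [3.05, 3.05, 3.0, 2.4, 2.6],
})
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date").rename(columns={"values": "Floor Price (ETH)"})

# One row per calendar day: the first reading of the day, or the day's average.
daily_first = df.resample("D").first()
daily_mean = df.resample("D").mean()
print(daily_first)
print(daily_mean)

# Plotting requires matplotlib to be installed:
# daily_first.plot(kind="line")
```

Days with no readings at all show up as NaN rows after resampling; calling `.dropna()` on the result removes them if that is not wanted.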