Convert dataframe hourly values in columns to a series [Python]

I am trying to convert a dataframe in which hourly data appears in columns (one column per hour) to a dataframe that only contains two columns, [datetime, value].

For example:

Datetime	value
2020-01-01 01:00:00	0
2020-01-01 02:00:00	0
...	...
2020-01-01 09:00:00	106
2020-01-01 10:00:00	2852

Is there a solution that does not use a 'for' loop?

Thank you.

Answer:

Use DataFrame.melt, then convert the Date values to datetimes and add the hours with to_timedelta after stripping the H prefix from the hour column names:

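import pandas as pd

df = df.melt('Date')  # long format: columns Date, variable (H1, H2, ...), value

# 'H1' -> 1 -> Timedelta of 1 hour, added to midnight of each date
td = pd.to_timedelta(df.pop('variable').str.strip('H').astype(int), unit='h')
df['Date'] = pd.to_datetime(df['Date']) + td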
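A self-contained sketch of the melt approach above, assuming the wide frame has a 'Date' column plus hour columns named H1 to H24 (the sample data below is made up for illustration). It also renames the output to match the [Datetime, value] layout from the question and sorts it chronologically; with this approach, H24 naturally becomes 00:00 of the following day.

```python
import pandas as pd

# Hypothetical wide input: one row per day, one column per hour (H1..H24)
wide = pd.DataFrame(
    {"Date": ["2020-01-01", "2020-01-02"],
     **{f"H{h}": [h, 100 + h] for h in range(1, 25)}}
)

# Wide -> long: one row per (Date, hour column)
long_df = wide.melt("Date", var_name="hour", value_name="value")

# "H9" -> 9 -> Timedelta of 9 hours, added to midnight of each date
offset = pd.to_timedelta(long_df["hour"].str.lstrip("H").astype(int), unit="h")
long_df["Datetime"] = pd.to_datetime(long_df["Date"]) + offset

# Keep the two requested columns in chronological order
result = (long_df[["Datetime", "value"]]
          .sort_values("Datetime")
          .reset_index(drop=True))
print(result.head())
```

The sort matters because melt emits all H1 rows first, then all H2 rows, and so on, rather than day by day.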

Answer:

You can do it by applying several functions to the DataFrame:

import pandas as pd

# Example DataFrame
df = pd.DataFrame({'date': ['1/1/2020', '1/2/2020', '1/3/2020'],
                   'h1': [0, 222, 333],
                   'h2': [44, 0, 0],
                   "h3": [1, 2, 3]})

# To keep the example short, only hours 1..3 (columns h1..h3) are used;
# set this to 25 for columns h1..h24
HOURS_COUNT = 4

# Per row: the list of hour numbers and a {hour: value} lookup
df["hours"] = df.apply(lambda row: [h for h in range(1, HOURS_COUNT)], axis=1)
df["hour_values"] = df.apply(lambda row: {h: row[f"h{h}"] for h in range(1, HOURS_COUNT)}, axis=1)

# One row per (date, hour)
df = df.explode("hours")

df["value"] = df.apply(lambda row: row["hour_values"][row["hours"]], axis=1)
# Add the hour as a Timedelta: strptime's %H only accepts 0-23, so h24 would fail there
df["date_full"] = df.apply(lambda row: pd.to_datetime(row["date"], format="%m/%d/%Y")
                           + pd.Timedelta(hours=row["hours"]), axis=1)

df = df[["date_full", "value"]]
# Optional: drop zero values (the question's expected output keeps them)
df = df.loc[df["value"] > 0]

So the initial DataFrame is:

       date   h1  h2  h3
0  1/1/2020    0  44   1
1  1/2/2020  222   0   2
2  1/3/2020  333   0   3

And the resulting DataFrame is:

            date_full  value
0 2020-01-01 02:00:00     44
0 2020-01-01 03:00:00      1
1 2020-01-02 01:00:00    222
1 2020-01-02 03:00:00      2
2 2020-01-03 01:00:00    333
2 2020-01-03 03:00:00      3
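The apply calls in this answer still loop over rows in Python. A loop-free alternative on the same example data is sketched below using DataFrame.stack instead of apply/explode; this is a swapped-in technique, not the answerer's code. It assumes hour columns named h1, h2, ... as in the example and keeps the answer's optional value > 0 filter.

```python
import pandas as pd

df = pd.DataFrame({'date': ['1/1/2020', '1/2/2020', '1/3/2020'],
                   'h1': [0, 222, 333],
                   'h2': [44, 0, 0],
                   'h3': [1, 2, 3]})

# Move the hour columns into the row index: one row per (date, hour column)
long_df = df.set_index("date").stack().reset_index()
long_df.columns = ["date", "hour", "value"]

# "h2" -> 2 hours after midnight of that date
long_df["date_full"] = (
    pd.to_datetime(long_df["date"], format="%m/%d/%Y")
    + pd.to_timedelta(long_df["hour"].str.lstrip("h").astype(int), unit="h")
)

# Same optional filter as the answer: drop zero values
result = long_df.loc[long_df["value"] > 0, ["date_full", "value"]].reset_index(drop=True)
print(result)
```

Because stack walks each row's hour columns in order, the rows come out in the same order as the result shown above, so no extra sort is needed; with the full h1..h24 set, h24 maps to 00:00 of the following day.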