I have a dataframe containing various data, including a column from Linux Timestamp. For further analysis, I need to extract the minutes of each period (hour minute number, day minute number, week minute number, month minute number, year minute number) from the Linux Timestamp column.
I have:
TimeStamp var1 var2
1659494100 5.22 6.34
1659494160 4.33 7.33
1659494220 5.46 7.21
1659494280 4.33 4.51
1659494340 6.45 5.67
...
I need to have:
TimeStamp var1 var2 minute_of_hour minute_of_day minute_of_week minute_of_month minute_of_year
1659494100 5.22 6.34 35 155 3035 3035 308315
1659494160 4.33 7.33 36 156 3036 3036 308316
1659494220 5.46 7.21 37 157 3037 3037 308317
1659494280 4.33 4.51 38 158 3038 3038 308318
1659494340 6.45 5.67 39 159 3039 3039 308319
I have a large table and using loops is not an option. Do you have any ideas?
CodePudding user response:
import pandas as pd
# Your dataframe here:
df = pd.DataFrame({
"Timestamp": [1659494100, 1659494160, 1659494220, 1659494280, 1659494340],
"var1": [5.22, 4.33, 5.46, 4.33, 6.45],
"var2": [6.34, 7.33, 7.21, 4.51, 5.67]
})
timestamps = pd.to_datetime(df["Timestamp"], unit="s")
freqs = {
"hour": "H",
"day": "D",
"week": "W",
"month": "M",
"year": "Y"
}
for name, freq in freqs.items():
df[f"minute_of_{name}"] = (
timestamps - timestamps.dt.to_period(freq).dt.start_time
) // pd.Timedelta("1Min")
Output:
Timestamp var1 var2 minute_of_hour minute_of_day minute_of_week \
0 1659494100 5.22 6.34 35 155 3035
1 1659494160 4.33 7.33 36 156 3036
2 1659494220 5.46 7.21 37 157 3037
3 1659494280 4.33 4.51 38 158 3038
4 1659494340 6.45 5.67 39 159 3039
minute_of_month minute_of_year
0 3035 308315
1 3036 308316
2 3037 308317
3 3038 308318
4 3039 308319
Note that some columns can be calculated more directly, but this method makes the code consistent for all frequencies.