I am trying to classify the following data into morning, noon, and evening:
Date-Time Frequency Between Streak
2021-01-01 00:00:00 49.9989 False 1
2021-01-01 00:00:01 49.9981 False 2
2021-01-01 00:00:02 49.9970 False 3
2021-01-01 00:00:03 49.9942 False 4
2021-01-01 00:00:04 49.9928 False 5
I've found similar questions, but I can't get any of the answers to work with my data. I get errors like this: AttributeError: 'Series' object has no attribute 'date'
I am trying the following:
df['new'] = pd.cut(df.index,
bins=[0,6,12,18,23],
labels=['night','morning','afternoon','evening'],
include_lowest=True)
But I get ValueError: bins must be of datetime64 dtype
Desired Output:
Date-Time Frequency Between Streak Class
2021-01-01 00:00:00 49.9989 False 1 morning
2021-01-01 00:00:01 49.9981 False 2 morning
2021-01-01 00:00:02 49.9970 False 3 morning
2021-01-01 14:00:03 49.9942 False 4 afternoon
2021-01-01 19:00:04 49.9928 False 5 night
CodePudding user response:
I was able to take the data you had in your expected output and get the same results as you using the following code:
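import numpy as np
import pandas as pd

# Parse the timestamps, then extract the hour of day (0-23).
df['Date-Time'] = pd.to_datetime(df['Date-Time'])
df['Hour'] = df['Date-Time'].dt.hour

# Conditions are checked in order; the first one that matches wins.
condition_list = [df['Hour'] > 18, df['Hour'].between(12, 18), df['Hour'] < 12]
choice_list = ['Night', 'Afternoon', 'Morning']
# A string default keeps the same dtype as the string labels.
df['Class'] = np.select(condition_list, choice_list, default='Unknown')
df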
You can change condition_list and choice_list to whatever you like, so it is flexible in how you label the classes based on the time of day.
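The pd.cut attempt from the question can probably be made to work too. The ValueError suggests the integer bins were being compared against datetime values, so one option is to bin the hour of day instead. Below is a minimal sketch, assuming Date-Time is a regular column. The sample frame is illustrative only: the 06:30:00 row is made up to exercise the morning bin, and the upper edge of 24 with right=False is my choice so that hour 23 falls in the last bin.

```python
import pandas as pd

# Illustrative sample shaped like the question's data
# (the 06:30:00 timestamp is hypothetical, added to hit the 'morning' bin).
df = pd.DataFrame({
    'Date-Time': ['2021-01-01 00:00:00', '2021-01-01 06:30:00',
                  '2021-01-01 14:00:03', '2021-01-01 19:00:04'],
    'Frequency': [49.9989, 49.9981, 49.9942, 49.9928],
})
df['Date-Time'] = pd.to_datetime(df['Date-Time'])

# Bin the integer hour (0-23) rather than the timestamps themselves.
# right=False makes each bin half-open, [left, right):
# hours 0-5, 6-11, 12-17 and 18-23.
df['Class'] = pd.cut(df['Date-Time'].dt.hour,
                     bins=[0, 6, 12, 18, 24],
                     labels=['night', 'morning', 'afternoon', 'evening'],
                     right=False)
print(df)
```

The bin edges and labels are the ones from the question, so adjust them if midnight should count as morning, as in the desired output. If Date-Time is the index rather than a column, df.index.hour should give the same hours, provided the index is a DatetimeIndex.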