I have a dataframe with a string column of hours:
-------
|DepTime|
-------
| 13:43|
| 11:25|
| 20:09|
| 09:03|
| 14:23|
| 20:24|
| 17:53|
| 06:22|
| 19:44|
| 14:53|
-------
I want to transform that column considering this intervals:
From 06:00 to 11:59 -> Morning
From 12:00 to 17:00 -> Afternoon
From 17:01 to 20:00 -> Evening
From 20:01 to 05:59 -> Night
Expected output:
------------
|DepTime |
------------
| Afternoon|
| Morning|
| Night|
| Morning|
| Afternoon|
| Night|
| Evening|
| Morning|
| Evening|
| Afternoon|
------------
I have done similar string transformations using functions like rlike
and lit
:
df = df.withColumn("DayOfWeek",
when(col("DayOfWeek").rlike("1"),lit("Monday"))
.when(col("DayOfWeek").rlike("2"),lit("Tuesday"))
.when(col("DayOfWeek").rlike("3"),lit("Wednesday"))
.when(col("DayOfWeek").rlike("4"),lit ("Thursday"))
.when(col("DayOfWeek").rlike("5"),lit("Friday"))
.when(col("DayOfWeek").rlike("6"),lit("Saturday"))
.when(col("DayOfWeek").rlike("7"),lit("Sunday"))
)
For this case I am thinking about using if
(maybe using <
and >
operators) and otherwise
but I don't know how to form the groups (ranges) since hours have a special order.
Any help is appreciated. Thanks in advance.
CodePudding user response:
Try this:
data
.withColumn("Time", date_format(col("DepTime"), "HH:mm"))
.withColumn("PeriodOfTime",
when(col("Time") > "06:00" && col("Time") <= "12:00", "Morning")
.when(col("Time") > "12:00" && col("Time") <= "17:00", "Afternoon")
.when(col("Time") > "17:00" && col("Time") <= "20:00", "Evening")
.otherwise("Night"))
.drop("Time")
Output (tested):
------- ------------
|DepTime|PeriodOfTime|
------- ------------
| 13:43| Afternoon|
| 11:25| Morning|
| 20:09| Night|
| 09:03| Morning|
| 14:23| Afternoon|
| 20:24| Night|
| 17:53| Evening|
| 06:22| Morning|
| 19:44| Evening|
| 14:53| Afternoon|
------- ------------