I have dataframe like below -
tagname tagvalue filename starttime 2022-08-04 10:00:00 abc endtime 2022-08-04 10:20:00 abc startime 2022-08-04 10:10:00 xyz endtime 2022-08-04 10:50:00 xyz
i want this in below format
filename starttime endtime abc 2022-08-04 10:00:00 2022-08-04 10:20:00 xyz 2022-08-04 10:10:00 2022-08-04 10:50:00
please help? i tried many ways it did not work
CodePudding user response:
First, we can group by filename
and collect_set on tagvalue
, renaming it as data
.
.groupBy("filename").agg(collect_set(col("tagvalue")).as("data"))
Then, we create two columns, starttime
and endttime
as below:
.withColumn("starttime", col("data").getItem(0))
.withColumn("endtime", col("data").getItem(1))
We finally drop data
, our final result:
-------- ------------------- -------------------
|filename|starttime |endtime |
-------- ------------------- -------------------
|xyz |2022-08-04 10:10:00|2022-08-04 10:50:00|
|abc |2022-08-04 10:00:00|2022-08-04 10:20:00|
-------- ------------------- -------------------
which is what you need, good luck (you can also order by filename asc, but I did not do that)!