Home > OS >  I am trying to design spark dataframe from column values to column name from existing spark datafram
I am trying to design spark dataframe from column values to column name from existing spark datafram

Time:08-05

I have dataframe like below -

tagname tagvalue filename starttime 2022-08-04 10:00:00 abc endtime 2022-08-04 10:20:00 abc startime 2022-08-04 10:10:00 xyz endtime 2022-08-04 10:50:00 xyz

i want this in below format

filename starttime endtime abc 2022-08-04 10:00:00 2022-08-04 10:20:00 xyz 2022-08-04 10:10:00 2022-08-04 10:50:00

please help? i tried many ways it did not work

CodePudding user response:

First, we can group by filename and collect_set on tagvalue, renaming it as data.

.groupBy("filename").agg(collect_set(col("tagvalue")).as("data"))

Then, we create two columns, starttime and endttime as below:

  .withColumn("starttime", col("data").getItem(0))
  .withColumn("endtime", col("data").getItem(1))

We finally drop data, our final result:

 -------- ------------------- ------------------- 
|filename|starttime          |endtime            |
 -------- ------------------- ------------------- 
|xyz     |2022-08-04 10:10:00|2022-08-04 10:50:00|
|abc     |2022-08-04 10:00:00|2022-08-04 10:20:00|
 -------- ------------------- ------------------- 

which is what you need, good luck (you can also order by filename asc, but I did not do that)!

  • Related