Fetching HDFS Path In a List Dynamically Using Scala

Time:09-27

I have a scenario where I need to collect a list of file paths for a given number of hours, counting back from the current hour in the current day's folder.

For example, I have the following paths:

/user/hdfs/test/partition_date=2022-09-21/hour=19
/user/hdfs/test/partition_date=2022-09-21/hour=20
/user/hdfs/test/partition_date=2022-09-21/hour=21
/user/hdfs/test/partition_date=2022-09-21/hour=22
/user/hdfs/test/partition_date=2022-09-21/hour=23
/user/hdfs/test/partition_date=2022-09-22/hour=00
/user/hdfs/test/partition_date=2022-09-22/hour=01
/user/hdfs/test/partition_date=2022-09-22/hour=02

I will hardcode the path up to '/user/hdfs/test/'. My code will append partition_date using a date function, taking the current date.

Let's say the current timestamp is 2022-09-21 19:00. Then the date will be 2022-09-21 and the hour will be 19.
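The mapping from a timestamp to a folder name can be sketched as below, assuming the date folder uses the ISO yyyy-MM-dd form shown in the listing (which is what `LocalDate.toString` produces). The fixed timestamp stands in for `LocalDateTime.now`; `basePath` and `hourPath` are illustrative names.

```scala
import java.time.LocalDateTime

// Fixed timestamp from the example above; in the job this would be LocalDateTime.now
val ts = LocalDateTime.of(2022, 9, 21, 19, 0)

// LocalDate.toString yields ISO-8601 yyyy-MM-dd, e.g. "2022-09-21"
val partitionDate = ts.toLocalDate.toString
// getHour returns the hour of day in the range 0..23
val hour = ts.getHour

val basePath = "/user/hdfs/test/"
val hourPath = s"${basePath}partition_date=$partitionDate/hour=$hour"
println(hourPath) // /user/hdfs/test/partition_date=2022-09-21/hour=19
```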

I will pass the number of hours from the spark-submit command. If this value is 2, the code should fetch the paths for the last 2 hours, counting the current hour. For example, if the current time is 2022-09-21 19:00, I need to fetch these paths:

/user/hdfs/test/partition_date=2022-09-21/hour=19
/user/hdfs/test/partition_date=2022-09-21/hour=18
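Application arguments placed after the jar in a spark-submit command are passed to the main class's `main(args)`. A minimal sketch of reading the hours value from there, assuming it is the first application argument; `parseHours`, `MyJob`, `my-job.jar` and the default of 1 are hypothetical choices, not part of the original.

```scala
import scala.util.Try

// Hypothetical helper: read the number of hours from the first application argument.
// With `spark-submit --class MyJob my-job.jar 2`, main receives args = Array("2").
// Falls back to `default` when the argument is missing, not an integer, or not positive.
def parseHours(args: Array[String], default: Int = 1): Int =
  args.headOption
    .flatMap(a => Try(a.trim.toInt).toOption)
    .filter(_ > 0)
    .getOrElse(default)

println(parseHours(Array("2")))          // 2
println(parseHours(Array.empty[String])) // 1
```

`Try(...toInt)` is used instead of `toIntOption` so the sketch also compiles on Scala 2.12, which many Spark builds still use.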

Similarly, if the number of hours is 3, I need to fetch:

/user/hdfs/test/partition_date=2022-09-21/hour=19
/user/hdfs/test/partition_date=2022-09-21/hour=18
/user/hdfs/test/partition_date=2022-09-21/hour=17

At hour 00, 01, etc. (at or after 12 am), the current date changes to the next day's date, so if I pass 2 as the number of hours, it should fetch the previous hour from the previous date.

So the paths will be:

/user/hdfs/test/partition_date=2022-09-22/hour=00
/user/hdfs/test/partition_date=2022-09-21/hour=23
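`java.time` already handles this day boundary: subtracting one hour from midnight lands on 23:00 of the previous date, so no separate flag or hardcoded `23` is needed. A small illustration using the midnight timestamp from the example:

```scala
import java.time.LocalDateTime

// Midnight on 2022-09-22, as in the example above
val midnight = LocalDateTime.of(2022, 9, 22, 0, 0)
// minusHours rolls back across the date boundary automatically
val previous = midnight.minusHours(1)

println(previous)             // 2022-09-21T23:00
println(previous.toLocalDate) // 2022-09-21
println(previous.getHour)     // 23
```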

If I pass 3 as the number of hours at 1 am, it should take:

/user/hdfs/test/partition_date=2022-09-22/hour=01
/user/hdfs/test/partition_date=2022-09-22/hour=00
/user/hdfs/test/partition_date=2022-09-21/hour=23
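All of the examples above can be generated by one function that steps back one hour at a time. This is a sketch, not the asker's code: `pathsFor` is a hypothetical name, the timestamp is passed in (rather than read from the clock) so the result can be checked against the listings, and the hour is zero-padded on the assumption that the folders are named `hour=00`, `hour=01` as shown.

```scala
import java.time.LocalDateTime

// Hypothetical helper: the current hour plus the (hours - 1) hours before it, newest first
def pathsFor(now: LocalDateTime, hours: Int, base: String = "/user/hdfs/test"): List[String] =
  (0 until hours).toList.map { h =>
    val ts = now.minusHours(h) // crosses midnight into the previous date when needed
    // %02d zero-pads the hour so 0 becomes "00", matching the folder names
    f"$base/partition_date=${ts.toLocalDate}/hour=${ts.getHour}%02d"
  }

// 3 hours at 1 am on 2022-09-22 reproduces the last listing above
pathsFor(LocalDateTime.of(2022, 9, 22, 1, 0), 3).foreach(println)
```

In the job itself, the call would be `pathsFor(LocalDateTime.now, hours)`.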

I am trying the code below locally, but I have to hardcode the number of hours and build each path accordingly.

import java.util.Calendar
import org.apache.hadoop.conf.Configuration

var currentHour = 0
var prevHour = 0
var flag = 0
var currpath = ""
var prevpath = ""
val CurrentDate = java.time.LocalDate.now
val PreviousDate = java.time.LocalDate.now.minusDays(1)

val now = Calendar.getInstance()
if (now.get(Calendar.HOUR_OF_DAY) < 1) {
  // Hour 0: the previous hour (23) belongs to the previous date
  currentHour = now.get(Calendar.HOUR_OF_DAY)
  prevHour = 23
  flag = 0
}
else {
  // Any other hour: the previous hour is on the same date
  currentHour = now.get(Calendar.HOUR_OF_DAY)
  prevHour = now.get(Calendar.HOUR_OF_DAY) - 1
  flag = 1
}
val hdfsConf = new Configuration()

val path = "/user/hdfs/test/"

if (flag == 0) {
  currpath = path + "partition_date=" + CurrentDate + "/" + "hour=" + currentHour + "/"
  prevpath = path + "partition_date=" + PreviousDate + "/" + "hour=" + prevHour + "/"
}
else {
  currpath = path + "partition_date=" + CurrentDate + "/" + "hour=" + currentHour + "/"
  prevpath = path + "partition_date=" + CurrentDate + "/" + "hour=" + prevHour + "/"
}

Can someone please help me?

How can I make this generic, so that I can pass the number of hours dynamically and get the corresponding paths back as a list?

Answer:

Try this:

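// Current local date-time on the driver (JVM default time zone)
val currentTs = java.time.LocalDateTime.now
// Number of hours to go back, including the current one; in the job, read it from the spark-submit arguments
val hours = 3
val paths = (0 until hours)
  .map(h => currentTs.minusHours(h)) // minusHours crosses midnight into the previous date
  .map(ts => s"/user/hdfs/test/partition_date=${ts.toLocalDate}/hour=${ts.getHour}")
paths.foreach(println)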

Output:

/user/hdfs/test/partition_date=2022-09-21/hour=14
/user/hdfs/test/partition_date=2022-09-21/hour=13
/user/hdfs/test/partition_date=2022-09-21/hour=12
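Two details of the answer may matter in practice: `ts.getHour` produces unpadded values such as `hour=0`, while the question's folders are named `hour=00`, and `LocalDateTime.now` uses the JVM's default time zone. A sketch addressing both, assuming the folders are written in UTC (`partitionZone` is a hypothetical setting; adjust it to the zone the writer uses) and using a fixed `Clock` only so the output is reproducible:

```scala
import java.time.{Clock, Instant, LocalDateTime, ZoneId}

// Zone the partition folders are written in; "UTC" is an assumption
val partitionZone = ZoneId.of("UTC")

// A fixed clock for a reproducible example; in the job use Clock.system(partitionZone)
val clock = Clock.fixed(Instant.parse("2022-09-22T00:30:00Z"), partitionZone)

val hours = 2
val currentTs = LocalDateTime.now(clock)
val paths = (0 until hours)
  .map(h => currentTs.minusHours(h))
  // %02d zero-pads the hour, e.g. 0 -> "00"
  .map(ts => f"/user/hdfs/test/partition_date=${ts.toLocalDate}/hour=${ts.getHour}%02d")
paths.foreach(println)
```

With the clock fixed at 00:30 UTC on 2022-09-22, this prints the `hour=00` path for 2022-09-22 followed by the `hour=23` path for 2022-09-21, matching the midnight example in the question.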