Create a List of Lists by grouping in Scala

Time:08-14

Hi, I have a List of folder names. Some of the names end with a numeric suffix from _1 to _10 and some do not. I want to group the names that share the same base name into one list each, for further processing. My initial List looks like this:

scala> val emp: List[String] = List("customer_bal_history_1_36","customer_bal_history_1_36_1","customer_bal_history_1_36_2","customer_bal_history_1_36_3","customer_credit_history_37_72_1","customer_credit_history_37_72_2","customer_credit_history_37_72_3","employee_1", "employee_10", "address","pincode","domain_1","domain_2","vehicle_1","vehicle_2","vendor_account_1","vendor_account_2")
emp: List[String] = List(customer_bal_history_1_36, customer_bal_history_1_36_1, customer_bal_history_1_36_2, customer_bal_history_1_36_3, customer_credit_history_37_72_1, customer_credit_history_37_72_2, customer_credit_history_37_72_3, employee_1, employee_10, address, pincode, domain_1, domain_2, vehicle_1, vehicle_2, vendor_account_1, vendor_account_2)

So I tried this code to group them together:

scala> emp.groupBy(_.takeWhile(_ != '_')).values.toList
res0: List[List[String]] = List(List(vehicle_1, vehicle_2), List(employee_1, employee_10), List(domain_1, domain_2), List(customer_bal_history_1_36, customer_bal_history_1_36_1, customer_bal_history_1_36_2, customer_bal_history_1_36_3, customer_credit_history_37_72_1, customer_credit_history_37_72_2, customer_credit_history_37_72_3), List(address), List(vendor_account_1, vendor_account_2), List(pincode))

The problem with this code is that it puts two different folder names, customer_bal_history_1_36 and customer_credit_history_37_72, into one group, because both have the same part before the first underscore (customer):

List(customer_bal_history_1_36, customer_bal_history_1_36_1, customer_bal_history_1_36_2, customer_bal_history_1_36_3, customer_credit_history_37_72_1, customer_credit_history_37_72_2, customer_credit_history_37_72_3)

I want them grouped separately, like this:

List(customer_bal_history_1_36, customer_bal_history_1_36_1, customer_bal_history_1_36_2, customer_bal_history_1_36_3),List(customer_credit_history_37_72_1, customer_credit_history_37_72_2, customer_credit_history_37_72_3)

so that the resulting List of Lists looks like this:

List(List(vehicle_1, vehicle_2), List(employee_1, employee_10), List(domain_1, domain_2), List(customer_bal_history_1_36, customer_bal_history_1_36_1, customer_bal_history_1_36_2, customer_bal_history_1_36_3),List(customer_credit_history_37_72_1, customer_credit_history_37_72_2, customer_credit_history_37_72_3), List(address), List(vendor_account_1, vendor_account_2), List(pincode))

Is there a regular expression I can match against the names to group them like this? Any help with solving this is appreciated.

Answer:

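One option is to split each name on _, take the leading parts up to the first one that starts with a digit, and join them back with an underscore. That prefix (for example customer_bal_history or customer_credit_history) then serves as the grouping key: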

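emp.groupBy(
  _.split("_")
    // keep the leading segments that do not start with a digit;
    // headOption avoids an exception on empty segments (e.g. from "a__b")
    .takeWhile(part => !part.headOption.exists(_.isDigit))
    .mkString("_")
).values.toList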
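To check which key each name receives, the grouping lambda can be pulled out into a named function. This is a minimal sketch; `baseName` and `samples` are hypothetical names introduced only for illustration:

```scala
// Hypothetical helper: the split-based grouping key as a named function.
def baseName(name: String): String =
  name
    .split("_")
    // keep leading segments up to the first one that starts with a digit
    .takeWhile(part => !part.headOption.exists(_.isDigit))
    .mkString("_")

val samples = List(
  "customer_bal_history_1_36_2",
  "customer_credit_history_37_72_1",
  "employee_10",
  "address"
)
samples.foreach(name => println(s"$name -> ${baseName(name)}"))
// customer_bal_history_1_36_2 -> customer_bal_history
// customer_credit_history_37_72_1 -> customer_credit_history
// employee_10 -> employee
// address -> address
```

One consequence of this key worth knowing: a name that starts with a digit, such as a hypothetical 2021_backup, gets the empty string as its key, so all such names would end up in a single group.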



If you want to use a regex, you could group by the longest prefix that contains no digits and ends with an underscore:

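val pattern = """^\D*_""".r
emp.groupBy(s =>
  pattern
    .findFirstMatchIn(s)
    .map(_.group(0)) // e.g. "customer_bal_history_" for customer_bal_history_1_36_2
    .getOrElse(s)    // no digit-free prefix ending in "_" (e.g. address): use the whole name
).values.toList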
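The regex key is not identical to the split-based key (it keeps the trailing underscore), but for the list in the question the two should produce the same groups. A minimal sketch that checks this, using the question's data; `splitKey` and `regexKey` are hypothetical helper names, and `findFirstIn` is used here as a shorter equivalent of `findFirstMatchIn(...).map(_.group(0))`:

```scala
val emp: List[String] = List(
  "customer_bal_history_1_36", "customer_bal_history_1_36_1",
  "customer_bal_history_1_36_2", "customer_bal_history_1_36_3",
  "customer_credit_history_37_72_1", "customer_credit_history_37_72_2",
  "customer_credit_history_37_72_3", "employee_1", "employee_10",
  "address", "pincode", "domain_1", "domain_2", "vehicle_1", "vehicle_2",
  "vendor_account_1", "vendor_account_2"
)

// Split-based key: leading segments that do not start with a digit.
def splitKey(s: String): String =
  s.split("_").takeWhile(p => !p.headOption.exists(_.isDigit)).mkString("_")

// Regex-based key: longest digit-free prefix ending in "_", else the whole name.
val prefix = """^\D*_""".r
def regexKey(s: String): String = prefix.findFirstIn(s).getOrElse(s)

// groupBy keeps input order inside each group, so comparing the sets of
// groups checks that both keys partition the list in the same way.
val bySplit = emp.groupBy(splitKey).values.toSet
val byRegex = emp.groupBy(regexKey).values.toSet
println(bySplit == byRegex) // true
```

The keys can disagree on other inputs: address has no underscore, so the regex falls back to the whole name, whereas a hypothetical address_1 would get the key address_ and land in a separate group. The split-based key maps both to address.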

Output for both examples (formatted for readability)

res0: List[List[String]] = 
List(
    List(customer_bal_history_1_36, customer_bal_history_1_36_1, customer_bal_history_1_36_2, customer_bal_history_1_36_3), 
    List(vendor_account_1, vendor_account_2), 
    List(domain_1, domain_2), 
    ....
)
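The order of the groups in this output comes from `groupBy`, which returns an immutable `Map` whose iteration order is not guaranteed. If the order matters, one option (a sketch, not the only approach) is to emit the groups in the order their keys first appear in the input; `baseName`, `grouped` and `ordered` are hypothetical names:

```scala
val emp: List[String] = List(
  "customer_bal_history_1_36", "customer_bal_history_1_36_1",
  "customer_bal_history_1_36_2", "customer_bal_history_1_36_3",
  "customer_credit_history_37_72_1", "customer_credit_history_37_72_2",
  "customer_credit_history_37_72_3", "employee_1", "employee_10",
  "address", "pincode", "domain_1", "domain_2", "vehicle_1", "vehicle_2",
  "vendor_account_1", "vendor_account_2"
)

// Hypothetical helper: the same split-based key as in the answer.
def baseName(s: String): String =
  s.split("_").takeWhile(p => !p.headOption.exists(_.isDigit)).mkString("_")

val grouped: Map[String, List[String]] = emp.groupBy(baseName)

// Walk the distinct keys in first-appearance order and look each group up.
val ordered: List[List[String]] = emp.map(baseName).distinct.map(key => grouped(key))

// Prints the customer_bal_history group first, then customer_credit_history,
// employee, address, pincode, domain, vehicle and vendor_account.
ordered.foreach(println)
```

Sorting by key instead, with `grouped.toList.sortBy(_._1).map(_._2)`, would give alphabetical order; the first-appearance approach keeps the folders in the order they were listed.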

If you only want names like customer_bal_history_1_36 grouped, you could also use a foldLeft with a Map and a pattern that requires at least three underscores before the first digit. Names that do not match (such as employee_1 or vendor_account_1) are left out of the result:

val pattern = """^[^\d_]*_[^\d_]*_\D*_""".r
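emp.foldLeft(Map[String, List[String]]()) { case (acc, curr) =>
  pattern.findFirstMatchIn(curr) match {
    // prepend the name to the list already stored under its prefix
    case Some(m) => acc + (m.group(0) -> (curr :: acc.getOrElse(m.group(0), Nil)))
    // names without a matching prefix are skipped
    case _ => acc
  }
}.values.toList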
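Because the fold prepends each name, every group comes out in reverse input order (customer_bal_history_1_36_3 first). If input order matters and Scala 2.13 or later is available, a sketch using `flatMap` and `groupMap` does the same filtering without the manual Map bookkeeping; `multiPart` is a hypothetical name:

```scala
val emp: List[String] = List(
  "customer_bal_history_1_36", "customer_bal_history_1_36_1",
  "customer_bal_history_1_36_2", "customer_bal_history_1_36_3",
  "customer_credit_history_37_72_1", "customer_credit_history_37_72_2",
  "customer_credit_history_37_72_3", "employee_1", "employee_10",
  "address", "pincode", "domain_1", "domain_2", "vehicle_1", "vehicle_2",
  "vendor_account_1", "vendor_account_2"
)

// Same pattern as in the foldLeft version.
val pattern = """^[^\d_]*_[^\d_]*_\D*_""".r

// Pair each matching name with its prefix (non-matching names yield None and
// are dropped by flatMap), then group by prefix. groupMap needs Scala 2.13+.
val multiPart: Map[String, List[String]] =
  emp
    .flatMap(name => pattern.findFirstIn(name).map(prefix => prefix -> name))
    .groupMap(_._1)(_._2)

println(multiPart.values.toList)
```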

