Hi, I have a List that contains folder names. These folder name strings may or may not end with a suffix from _1 to _10. I want to group similarly named strings into lists for further processing. My initial List looks like this:
scala> val emp: List[String] = List("customer_bal_history_1_36","customer_bal_history_1_36_1","customer_bal_history_1_36_2","customer_bal_history_1_36_3","customer_credit_history_37_72_1","customer_credit_history_37_72_2","customer_credit_history_37_72_3","employee_1", "employee_10", "address","pincode","domain_1","domain_2","vehicle_1","vehicle_2","vendor_account_1","vendor_account_2")
emp: List[String] = List(customer_bal_history_1_36, customer_bal_history_1_36_1, customer_bal_history_1_36_2, customer_bal_history_1_36_3, customer_credit_history_37_72_1, customer_credit_history_37_72_2, customer_credit_history_37_72_3, employee_1, employee_10, address, pincode, domain_1, domain_2, vehicle_1, vehicle_2, vendor_account_1, vendor_account_2)
So I tried this code to group them together:
scala> emp.groupBy(_.takeWhile(_ != '_')).values.toList
res0: List[List[String]] = List(List(vehicle_1, vehicle_2), List(employee_1, employee_10), List(domain_1, domain_2), List(customer_bal_history_1_36, customer_bal_history_1_36_1, customer_bal_history_1_36_2, customer_bal_history_1_36_3, customer_credit_history_37_72_1, customer_credit_history_37_72_2, customer_credit_history_37_72_3), List(address), List(vendor_account_1, vendor_account_2), List(pincode))
The problem with the above code is that it puts the two folder names customer_bal_history_1_36 and customer_credit_history_37_72 into the same group, like this:
List(customer_bal_history_1_36, customer_bal_history_1_36_1, customer_bal_history_1_36_2, customer_bal_history_1_36_3, customer_credit_history_37_72_1, customer_credit_history_37_72_2, customer_credit_history_37_72_3)
I want them to be grouped like this:
List(customer_bal_history_1_36, customer_bal_history_1_36_1, customer_bal_history_1_36_2, customer_bal_history_1_36_3),List(customer_credit_history_37_72_1, customer_credit_history_37_72_2, customer_credit_history_37_72_3)
and the resulting List of Lists should look like this:
List(List(vehicle_1, vehicle_2), List(employee_1, employee_10), List(domain_1, domain_2), List(customer_bal_history_1_36, customer_bal_history_1_36_1, customer_bal_history_1_36_2, customer_bal_history_1_36_3),List(customer_credit_history_37_72_1, customer_credit_history_37_72_2, customer_credit_history_37_72_3), List(address), List(vendor_account_1, vendor_account_2), List(pincode))
Is there a regular expression that can match these names and group them together? I need help solving this.
CodePudding user response:
One option is to split on _, take the leading parts that do not start with a digit, and join them back together with an underscore:
emp.groupBy(
  _.split("_")
    .takeWhile(s => !Character.isDigit(s.charAt(0)))
    .mkString("_")
).values.toList
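As a rough, self-contained sketch of the split approach: the object and method names below (GroupByPrefix, prefixKey, groupFolders) are made up for illustration. It also assumes you would rather tolerate empty segments (for example a name containing a double underscore) than hit an exception from charAt(0), which is why it uses headOption.

```scala
object GroupByPrefix {
  // Hypothetical helper: the grouping key is made of the leading
  // "_"-separated parts that do not start with a digit.
  // headOption avoids an exception on empty parts such as in "a__1".
  def prefixKey(name: String): String =
    name
      .split("_")
      .takeWhile(part => !part.headOption.exists(_.isDigit))
      .mkString("_")

  // Group the names by that key and drop the keys.
  def groupFolders(names: List[String]): List[List[String]] =
    names.groupBy(prefixKey).values.toList
}
```

Calling GroupByPrefix.groupFolders(emp) should give one list per prefix; the order of the outer list is whatever the underlying Map yields, so do not rely on it.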
If you want to use a regex, you can group by the leading run of non-digit characters, up to and including the last underscore before the first digit:
val pattern = """^\D*_""".r
emp.groupBy(s =>
  pattern
    .findFirstMatchIn(s)
    .map(_.group(0))
    .getOrElse(s)
).values.toList
Output for both examples (formatted for readability):
res0: List[List[String]] =
  List(
    List(customer_bal_history_1_36, customer_bal_history_1_36_1, customer_bal_history_1_36_2, customer_bal_history_1_36_3),
    List(vendor_account_1, vendor_account_2),
    List(domain_1, domain_2),
    ....
  )
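One thing to watch with the regex variant: the key for employee_1 is employee_ (with the trailing underscore), while a name without any underscore, such as address, falls back to the whole string. That is fine for the list in the question, but a hypothetical address_1 would not land in the same group as address. A possible adjustment, assuming those two should share a group (the question does not say), is to capture the prefix without the trailing underscore. The names GroupByRegex, prefixKey and groupFolders are illustrative only.

```scala
object GroupByRegex {
  // Group 1 is the longest run of non-digits from the start of the
  // string that is immediately followed by an underscore.
  private val Prefix = """^(\D*)_""".r

  // Use the captured prefix as the key; names with no underscore
  // are their own key.
  def prefixKey(name: String): String =
    Prefix.findFirstMatchIn(name).map(_.group(1)).getOrElse(name)

  def groupFolders(names: List[String]): List[List[String]] =
    names.groupBy(prefixKey).values.toList
}
```

Note that, like the original pattern, this also cuts at the last underscore of a name that contains no digits at all, so whether it fits depends on what your real folder names look like.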
If you only want the values like customer_bal_history_1_36 grouped, you can also use foldLeft with a Map and a pattern that requires at least three underscores in the leading non-digit part of the name. Names that do not match the pattern are dropped from the result:
val pattern = """^[^\d_]*_[^\d_]*_\D*_""".r
emp.foldLeft(Map[String, List[String]]()) { case (acc, curr) =>
  pattern.findFirstMatchIn(curr) match {
    // prepend the current name to the list stored under its prefix
    case Some(m) => acc + (m.group(0) -> (curr :: acc.getOrElse(m.group(0), Nil)))
    case _ => acc
  }
}.values.toList
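Because the foldLeft version prepends with ::, each group comes out in reverse input order. If the order within a group matters, one alternative sketch is to keep only the matching names, pair each with its key, and let groupBy do the collecting. It uses the same pattern as above; GroupMultiPart and groupFolders are hypothetical names.

```scala
object GroupMultiPart {
  // Same pattern as above: at least three underscores in the
  // leading non-digit part of the name.
  private val Pattern = """^[^\d_]*_[^\d_]*_\D*_""".r

  def groupFolders(names: List[String]): List[List[String]] =
    names
      // keep only matching names, paired with their matched prefix
      .flatMap(n => Pattern.findFirstIn(n).map(key => key -> n))
      // groupBy keeps the elements of each group in input order
      .groupBy(_._1)
      .values
      .map(_.map(_._2))
      .toList
}
```

With the emp list from the question, this should return only the two customer_ groups, each in the order the names appear in the input.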