Kotlin/Java/JVM - Parsing Russian dates like "28 фев. 2019"

Time:11-17

I don't speak Russian, so I'm having trouble checking whether the months are spelled correctly. To be honest, I'm not even sure that my input is Russian (that is the language Google Translate detects).

I have some Kotlin code that makes a best-effort attempt to parse dates written in various formats and languages. I'm struggling with Russian dates, however. Here's the relevant part of my code:

sequenceOf(
  "ru-RU", // Russian
  "sr", // Serbian
).forEach {
  val format = DateTimeFormatter.ofPattern("d MMM. yyyy")
    .withLocale(Locale.forLanguageTag(it))
  try {
    return listOf(LocalDate.parse(dateString, format))
  } catch (e: Exception) {
    //Ignore and move on
  }
}

This code correctly parses "27 апр. 2018" and "24 мая. 2013", but fails on "28 фев. 2019".

What's special about "28 фев. 2019" and/or how can I parse this value correctly?

If you provide answers in Java, I can translate it to Kotlin fairly easily.


EDIT: Here's an SSCCE in Kotlin:

import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.*

println("System.getProperty - " + System.getProperty("java.version"))
println("Runtime.version - " + Runtime.version())

val dateString = "28 фев. 2019"

sequenceOf(
    "ru-RU", // Russian
    "sr", // Serbian
).forEach {
    val format = DateTimeFormatter.ofPattern("d MMM. yyyy")
        .withLocale(Locale.forLanguageTag(it))
    try {
        println("Parse successful - " + LocalDate.parse(dateString, format))
    } catch (e: Exception) {
        println("Parse failed - " + e)
    }
}

Output on my system:

System.getProperty - 17.0.4.1
Runtime.version - 17.0.4.1+7-b469.62
Parse failed - java.time.format.DateTimeParseException: Text '28 фев. 2019' could not be parsed at index 3
Parse failed - java.time.format.DateTimeParseException: Text '28 фев. 2019' could not be parsed at index 3

CodePudding user response:

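Your input seems to use the wrong abbreviation. The abbreviation Java expects for February here is февр. (the trailing dot is part of the abbreviation). Check this page and this page for more information.

A workaround is to replace the abbreviation in the input with the expected one before you parse it. Because the dot already belongs to the abbreviation, the pattern below has no literal dot after MMM.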

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class Main {
    public static void main(String[] args) {
        String input = "28 фев. 2019";
        // Swap the unsupported abbreviation for the one the formatter knows.
        input = input.replace("фев.", "февр.");

        DateTimeFormatter dtf = DateTimeFormatter.ofPattern("d MMM uuuu",
                Locale.forLanguageTag("ru-RU"));

        System.out.println(LocalDate.parse(input, dtf));
        System.out.println(LocalDate.of(2019, 2, 28).format(dtf));
    }
}

Output:

2019-02-28
28 февр. 2019
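
The abbreviations a formatter accepts come from the locale data bundled with the JDK and can differ between JDK versions and locale data providers, so it may help to print what your own runtime produces before deciding what to replace. The following is a minimal sketch; the class name ShortMonthNames and the helper shortMonths are made up for illustration, and the two language tags are the ones from the question.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ShortMonthNames {
    // Hypothetical helper: formats the 1st of every month of 2019 with "MMM"
    // in the given locale, i.e. the same text a "MMM" parser would expect.
    static List<String> shortMonths(String languageTag) {
        DateTimeFormatter dtf =
                DateTimeFormatter.ofPattern("MMM", Locale.forLanguageTag(languageTag));
        List<String> names = new ArrayList<>();
        for (int month = 1; month <= 12; month++) {
            names.add(LocalDate.of(2019, month, 1).format(dtf));
        }
        return names;
    }

    public static void main(String[] args) {
        // Output depends on the JDK's locale data, so none is shown here.
        for (String tag : new String[] {"ru-RU", "sr"}) {
            System.out.println(tag + ": " + shortMonths(tag));
        }
    }
}
```

Comparing the printed lists against the failing input shows whether a given token matches either locale, and whether the dot is part of the abbreviation or has to be a literal in the pattern.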

CodePudding user response:

Since you are parsing user input, I believe the only option is to normalize that input prior to parsing it; appealing to standards is not an option here.

In Russian, dates use the genitive form of the month name (pattern letter M, as opposed to the standalone form L, in Java's DateTimeFormatter). Short forms are normally produced by the rules below (these are rules of the language; please do not confuse them with programming standards, conventions, habits, tricks, etc.):

  • a trailing . (dot) marks the word as shortened
  • the short form should not end in a vowel, й, ь, or a double consonant
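
To see the genitive versus standalone distinction on your own runtime, a small sketch like the one below can help. It formats a few months with the full-name patterns MMMM (used inside a date) and LLLL (month name on its own). The class name GenitiveVsStandalone and its format helper are hypothetical, and the exact strings printed depend on the JDK's locale data.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class GenitiveVsStandalone {
    private static final Locale RUSSIAN = Locale.forLanguageTag("ru-RU");

    // Hypothetical helper: formats the 1st of the given month of 2019
    // with the supplied pattern in the Russian locale.
    static String format(String pattern, int month) {
        return LocalDate.of(2019, month, 1)
                .format(DateTimeFormatter.ofPattern(pattern, RUSSIAN));
    }

    public static void main(String[] args) {
        // Left column: month as written inside a date (M letters).
        // Right column: month as a standalone word (L letters).
        for (int month : new int[] {2, 3, 5}) {
            System.out.println(format("d MMMM", month) + " | " + format("LLLL", month));
        }
    }
}
```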

Based on that, and allowing for user mistakes, typos, common sense, and programming habits, you may encounter the following short genitive forms of month names in the wild:

  • January: янв, янв.
  • February: фев, февр, фев., февр.
  • March: мар, марта, мар., март.
  • April: апр, апр.
  • May: мая, мая.
  • June: июн, июня, июн.
  • July: июл, июля, июл.
  • August: авг, авг.
  • September: сен, сент, сен., сент.
  • October: окт, окт.
  • November: ноя, нояб, ноя., нояб.
  • December: дек, дек.
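
One way to normalize such input is to skip the JDK's locale data for the month entirely and look the month up yourself. The sketch below assumes the input always has the shape "day month year" and relies on the observation that every variant listed above starts with a three-letter prefix that is unique to its month. The class name RussianDateParser, the MONTH_BY_PREFIX table, and the regular expression are illustrative choices, not an established API.

```java
import java.time.LocalDate;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RussianDateParser {
    // Assumed lookup table: first three letters of the month token -> month number.
    private static final Map<String, Integer> MONTH_BY_PREFIX = Map.ofEntries(
            Map.entry("янв", 1), Map.entry("фев", 2), Map.entry("мар", 3),
            Map.entry("апр", 4), Map.entry("мая", 5), Map.entry("июн", 6),
            Map.entry("июл", 7), Map.entry("авг", 8), Map.entry("сен", 9),
            Map.entry("окт", 10), Map.entry("ноя", 11), Map.entry("дек", 12));

    // Day, whitespace, month letters with an optional trailing dot, whitespace, 4-digit year.
    private static final Pattern DATE =
            Pattern.compile("(\\d{1,2})\\s+(\\p{L}+)\\.?\\s+(\\d{4})");

    public static LocalDate parse(String input) {
        Matcher m = DATE.matcher(input.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized date: " + input);
        }
        String token = m.group(2).toLowerCase(Locale.ROOT);
        // Only the first three letters decide the month, so "фев" and "февр" both map to 2.
        Integer month = token.length() < 3 ? null : MONTH_BY_PREFIX.get(token.substring(0, 3));
        if (month == null) {
            throw new IllegalArgumentException("Unknown month in: " + input);
        }
        // LocalDate.of rejects impossible days such as 31 for February.
        return LocalDate.of(Integer.parseInt(m.group(3)), month, Integer.parseInt(m.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(parse("28 фев. 2019")); // 2019-02-28
        System.out.println(parse("27 апр. 2018")); // 2018-04-27
        System.out.println(parse("24 мая. 2013")); // 2013-05-24
    }
}
```

Prefix matching is deliberately lenient: it also accepts full genitive names such as марта, but it will equally accept any other word that happens to start with a known prefix. If that is too permissive, the table could instead list every accepted spelling explicitly.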