Need to map monthname to monthnumber in Azure databricks using scala. I have column name PERIOD which have data like months name(like Jan,Feb,Mar,....,Nov,Dec), I want to replace this monthname with monthnumber (like 01,02,03,...,11,12). Result should be like Jan,Feb,Mar,..,Nov,Dec replaced with 01,02,03,...,11,12.
"Jan" -> 01,
"Feb" -> 02,
"Mar" -> 03,
"Apr" -> 04,
"May" -> 05,
"Jun" -> 06,
"Jul" -> 07,
"Aug" -> 08,
"Sep" -> 09,
"Oct" -> 10,
"Nov" -> 11,
"Dec" -> 12
I'm new to scala and azure databricks. I tried mapping approach but not getting desired solution. enter image description here
CodePudding user response:
I have a dataframe with column called month_in_words
as in the following image:
- Now to convert this month in words to month number, first I have created a map to where key is the month name and value is month number as string (Since we need month number as 01, but not 1).
val monthNumber = Map(
"Jan" -> "01",
"Feb" -> "02",
"Mar" -> "03",
"Apr" -> "04",
"May" -> "05",
"Jun" -> "06",
"Jul" -> "07",
"Aug" -> "08",
"Sep" -> "09",
"Oct" -> "10",
"Nov" -> "11",
"Dec" -> "12"
)
- Now you can use the following code to convert month name to month number:
var final_df = df.rdd.map(f => {
val number = monthNumber(f.getString(0))
(f.getString(0),number)
}).toDF("month_in_words","month_in_number")
//display(df)
- To directly convert, you can use the following code:
var final_df = df.rdd.map(f => monthNumber(f.getString(0))).toDF("month_in_number")
//display(final_df)