Home > Net >  What is the best way of converting roman numerals to arabic ones in bigquery?
What is the best way of converting roman numerals to arabic ones in bigquery?

Time:10-30

As in the title. The best thing that comes to my mind is using basic REPLACE function, but it fails with anything more complex than just I, V, X etc.

The Roman numerals in my data are located at the end of the string, so based on the well-known movie Fast & Furious my data looks like this:

movie
Fast & Furious I
Fast & Furious II
Fast & Furious III
Fast & Furious IV
Fast & Furious V
Fast & Furious VI
Fast & Furious VII

CodePudding user response:

The Roman numerals in my data are located at the end of the string

Consider below approach (along with some dummy data in addition to yours)

create temp function deromanize (number STRING) returns STRING 
language js as '''
  var number = number.toUpperCase(),
  validator = /^M*(?:D?C{0,3}|C[MD])(?:L?X{0,3}|X[CL])(?:V?I{0,3}|I[XV])$/,
  token = /[MDLV]|C[MD]?|X[CL]?|I[XV]?/g,
  key = {M:1000,CM:900,D:500,CD:400,C:100,XC:90,L:50,XL:40,X:10,IX:9,V:5,IV:4,I:1},
  num = 0, m;
  if (!(number && validator.test(number))) return false;
  while (m = token.exec(number)) num  = key[m[0]];
  return num;
''';
with your_table as (
  select 'Fast & Furious I' movie union all
  select 'Fast & Furious II' union all
  select 'Fast & Furious III' union all
  select 'Fast & Furious IV' union all
  select 'Fast & Furious V' union all
  select 'Fast & Furious VI' union all
  select 'Fast & Furious VII' union all
  select 'Fast & Furious XXXIX' union all
  select 'Fast & Furious LXXI' union all
  select 'Fast & Furious MDCCLXXIV' 
)
select movie, 
  replace(movie, roman_number, deromanize(roman_number)) converted_title
from your_table, 
unnest([struct(array_reverse(split(movie, ' '))[offset(0)] as roman_number)])

with output

enter image description here

  • Related