Home > front end >  Apply REGEXP pattern to string column in database
Apply REGEXP pattern to string column in database

Time:03-31

I am trying to manipulate a string column by applying a REGEXP pattern to it.

All values are actually floats but converted to string. Point is to remove any trailing zeros from the numbers as shown below.

input -> output
1.000 -> 1
1.100 -> 1.1
1.001 -> 1.001
0.001 -> 0.001
0.010 -> 0.01

I can use the REGEXP ^(\d (?:\.\d*?[1-9](?=0|\b))?)\.?0*$ to achieve the above as shown here.

Problem: How do apply the above REGEXP to all rows of a string column using SQLalchemy? The manipulation has to be done on the database side, moving data out of the DB is not an option due to the size of the data.

The Postgres equivalent is:

select TRIM(trailing '00' FROM CAST(numeric_col::decimal(32, 8) as text))
from my_table;

Any help is appreciated.

CodePudding user response:

Given a table like this:

test# table t71689267;
 id │  col  
════╪═══════
  1 │ 1.000
  2 │ 1.100
  3 │ 1.001
  4 │ 0.001
  5 │ 0.010

you could do this* in Postgresql (note that regexp_match returns an array):

test# select col, regexp_match(col, '^(\d (?:\.\d*?[^0])?)\.?0*$') from t71689267;
  col  │ regexp_match 
═══════╪══════════════
 1.000 │ {1}
 1.100 │ {1.1}
 1.001 │ {1.001}
 0.001 │ {0.001}
 0.010 │ {0.01}

The equivalent in SQLAlchemy would be

import sqlalchemy as sa
...
with engine.connect() as conn:
    # tbl is a SQLAlchemy Table instance corresponding to the table in the database
    q = sa.select(sa.func.regexp_match(tbl.c.col, r'^(\d (?:\.\d*?[^0])?)\.?0*$'))
    rows = conn.execute(q)
    for row in rows.scalars():
        print(row)
    print()

giving

['1']
['1.1']
['1.001']
['0.001']
['0.01']

* The pattern in the question doesn't seem to work for all cases; the one that I've used works for all the strings in the question. You can of course substitute whatever pattern you want to use.

  • Related