I have a data set of students that calculates average scores (to the nearest decimal) as of a certain date.The data set is riddled with typos. A sample data is:
student_id date avg_test_score
ab_1 1/2/20 95..6
ab_2 1/2/20 60.7
ab_3 2/4/20 88..7
ab_4 2/4/20 98.7.
This may seem straightforward but I'm having a difficult time with the query in postgresql.
Thanks!
CodePudding user response:
You could try the following regex update on the avg_test_score
column:
UPDATE yourTable
SET avg_test_score = SUBSTRING(
REGEXP_REPLACE(avg_test_score, '\.{2,}', '.') FROM '\d (?:\.\d )?');
The above logic first replaces sequences of two or more dots with just a single dot. Then we extract integer or floating point numbers from what remains.