I have a MySQL table with School Data that looks like this:
| Associated Hebrew Schools of Toronto - Central Administration - North York |
| SIr Allen McNab |
| West Ridge |
| CNC |
| krss |
| College of new Caledonia |
| yarmishl1 |
| college of new calendonia |
| ben |
| john mcginnis |
| Thousand Islands Secondary School - Brockville |
| École élémentaire Richardson Elementary - Delta |
| Centre of Excellence for Entrepreneurship |
| St. Gabriel Cyber School - Red Deer |
| Highwood School - Calgary |
| Janus Academy - Calgary
Many Schools have City names at the end e.g. Highwood School - Calgary
and many schools
don't have city name at the end e.g. college of new calendonia
I want to remove the City names from the data that have City name at the end of the column school
I tried to write the following SQL Statements by referring to some of the previously asked questions:
select school from table_name where school REGEXP '\-\\s\-\\s.*$';
select school from table_name where school REGEXP '\-\s\-\s.*$';
Essentially, I am trying to find the schools ending with the pattern - city_name -
and remove the - city_name -
.
If I am able to select the above REGEX, I can use CASE when for removing if I mind a matching pattern.
Please note: I went through the Regex docs and could not find anything related.
CodePudding user response:
You can use
SELECT
CASE
WHEN INSTR(school, " - ") > 0 THEN
LEFT(school, CHAR_LENGTH(school) - LOCATE(" - ", REVERSE(school))-2)
ELSE
school
END
As RESULT FROM table_name;
Also, if you are using MySQL v.8 , you can use REGEXP_REPLACE
:
create table table_name (school varchar(320), ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP);
insert into table_name (school) values ("Associated Hebrew Schools of Toronto - Central Administration - North York");
insert into table_name (school) values ("Associated Hebrew Schools of Toronto - North York");
insert into table_name (school) values ("Associated Hebrew Schools of Toronto");
SELECT REGEXP_REPLACE(school, "(.*)\\s-\\s.*", "$1") As RESULT FROM table_name;
Here, (.*)\s-\s.*
matches and captures into Group 1 any text up to [whitespace]-[whitespace], then the latter is matched and the .*
matches and consumes the rest of the string; the $1
replacement puts back the Group 1 value. See