Why does my regex only change my first entry in SAS?

Time:11-09

I have a number of text entries (municipalities) from which I need to remove the s at the end.

Data test;
input city $;
datalines;
arjepogs
askers
Londons
;
run;

data cities;
set test;
if prxmatch("/^(.*?)s$/",city) 
then city=prxchange("s/^(.*?)s$/$1/",-1,city);
run;

Strangely enough, the s is only removed from my first entry.


What am I doing wrong?

Answer:

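Because you read CITY with list input and a bare $, it is defined with the default length of 8, so shorter values are padded with trailing spaces. arjepogs fills all 8 characters, so its s really is the last character and the pattern matches. The s in Londons is in the 7th position, followed by a space, so it is not the last character of the string and s$ does not match. Use the TRIM() function to remove the trailing spaces from the value before passing it to the regex.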
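To see the padding for yourself, a quick check along these lines may help. It is only a sketch that reuses the question's input; the data set name check and the variables stored_len, trimmed_len and bracketed are made up for illustration.

```sas
/* Same input as the question: list input with a bare $ gives CITY length 8 */
data test;
  input city $;
datalines;
arjepogs
askers
Londons
;

data check;
  set test;
  stored_len  = vlength(city);       /* declared length of CITY, 8 for every row */
  trimmed_len = length(city);        /* position of the last non-blank character */
  bracketed   = '[' || city || ']';  /* brackets make the trailing blanks visible */
  put city= stored_len= trimmed_len= bracketed=;  /* written to the SAS log */
run;
```

Any row where trimmed_len is smaller than stored_len has trailing blanks between the final s and the end of the string, which is why s$ fails there.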

data have;
  input city $20.;
datalines;
arjepogs
Kent
askers
Londons
;

data want;
  set have;
  length new_city $20 ;
  new_city=prxchange("s/^(.*?)s$/$1/",-1,trim(city));
run;

Result

Obs    city        new_city

 1     arjepogs    arjepog
 2     Kent        Kent
 3     askers      asker
 4     Londons     London

You could also change the regex itself to allow for the trailing spaces.

new_city=prxchange("s/^(.*?)s\ *$/$1/",-1,city);
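Applied to the original step from the question, the same idea could look like the sketch below. It assumes the question's TEST data set and overwrites CITY in place; \s* is used here in place of the escaped space, as an equivalent way to absorb the padding. The PRXMATCH test is dropped because PRXCHANGE returns the value unchanged when the pattern does not match.

```sas
data cities;
  set test;   /* TEST is the data set created in the question */
  /* \s*$ lets the pattern skip the blanks that pad CITY to its full length */
  city = prxchange('s/^(.*?)s\s*$/$1/', 1, city);
run;
```

Since the pattern is anchored at both ends it can match at most once per value, so a count of 1 is enough here.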