I have below data with 5 variables with 5 observations: Name Age Gender Weight country
data have;
length string $30.;
input string$;
datalines;
Naresh30Male70India
Venkey29Male50Kenya
Ravi30Male56Pak
Sai67Female40iran
Divya89Female78Dubai
;
run;
I want to Separate these 5 variables in 5 observations Help me on it
CodePudding user response:
This should work for this data set. But it is not a perfect solution, as it only works on set range of ages/weights. So for vast majority of data (maybe all of yours) it will work just fine, but for outliners it will return strange results. I will update the solution later to include those edge cases.
data have;
input Mixeddata $50.;
datalines;
Naresh30Male70India
Venkey29Male50Kenya
Ravi30Male56Pak
Sai67Female40iran
Divya89Female78Dubai
;
run;
data want;
set have;
name=substr(Mixeddata,1, anydigit(Mixeddata)-1);
age=substr(Mixeddata, anydigit(Mixeddata), 2);
gender=substr(Mixeddata, anydigit(Mixeddata) 2,anydigit(mixeddata,-50)-anydigit(mixeddata)-3 );
weight=substr(Mixeddata, anydigit(Mixeddata,-50)-1, 2);
country=substr(Mixeddata, anydigit(Mixeddata, -50) 1);
run;
CodePudding user response:
Consider the makeup of a single string:
Naresh30Male70India
Each desired column is either a number or a character. If we could break this out into two strings, one with only numbers and one with only characters, we can easily pull the needed values:
string_num: 30 70
string char: Naresh Male India
We can do this with regular expressions by replacing letters with spaces and numbers with spaces.
data want;
set have;
string_num = compbl(prxchange('s/[0-9]/ /', -1, string) );
string_char = compbl(prxchange('s/[a-zA-Z]/ /', -1, string) );
name = scan(string_num, 1);
age = scan(string_char, 1);
gender = scan(string_num, 2);
weight = scan(string_char, 2);
country = scan(string_num, 3);
drop string_num string_char;
run;
Note that we use the compbl
function to remove any extra spaces to make it easier to read for learning purposes, but this is an optional step.