Separating character and numeric variables without delimiter in SAS-CodePudding

I have below data with 5 variables with 5 observations: Name Age Gender Weight country

data have;
    length string $30.;
    input string$;
    datalines;
Naresh30Male70India
Venkey29Male50Kenya
Ravi30Male56Pak
Sai67Female40iran
Divya89Female78Dubai
;
run;

I want to Separate these 5 variables in 5 observations Help me on it

CodePudding user response：

This should work for this data set. But it is not a perfect solution, as it only works on set range of ages/weights. So for vast majority of data (maybe all of yours) it will work just fine, but for outliners it will return strange results. I will update the solution later to include those edge cases.

data have;
input Mixeddata $50.;
datalines; 
Naresh30Male70India
Venkey29Male50Kenya
Ravi30Male56Pak
Sai67Female40iran
Divya89Female78Dubai
;
run;

data want;
    set have;
    name=substr(Mixeddata,1, anydigit(Mixeddata)-1);
    age=substr(Mixeddata, anydigit(Mixeddata), 2);
    gender=substr(Mixeddata, anydigit(Mixeddata) 2,anydigit(mixeddata,-50)-anydigit(mixeddata)-3 );
    weight=substr(Mixeddata, anydigit(Mixeddata,-50)-1, 2);
    country=substr(Mixeddata, anydigit(Mixeddata, -50) 1);
run;

CodePudding user response：

Consider the makeup of a single string:

Naresh30Male70India

Each desired column is either a number or a character. If we could break this out into two strings, one with only numbers and one with only characters, we can easily pull the needed values:

string_num:  30 70
string char: Naresh Male India

We can do this with regular expressions by replacing letters with spaces and numbers with spaces.

data want;
    set have;

    string_num  = compbl(prxchange('s/[0-9]/ /', -1, string) );
    string_char = compbl(prxchange('s/[a-zA-Z]/ /', -1, string) );

    name    = scan(string_num, 1);
    age     = scan(string_char, 1);
    gender  = scan(string_num, 2);
    weight  = scan(string_char, 2);
    country = scan(string_num, 3);

    drop string_num string_char;
run;

Note that we use the compbl function to remove any extra spaces to make it easier to read for learning purposes, but this is an optional step.