Home > Enterprise >  Applying custom formats to numeric variables in R
Applying custom formats to numeric variables in R

Time:07-13

I have a df that uses numbers to refer to categorical data and a CSV that defines the categories (1=Smoker, 2=Non-Smoker, etc). In SAS, I was able to convert the format CSV into a format file and apply these formats to the variables:

data want;
    set have;
    formatted = put(varX,custFormat.);
run;

This would provide me with the output:

varX   formatted
1        Smoker
2        Non-Smoker
3        Occasional Smoker
1        Smoker

Given that I have a csv with all the formats, I could bring this in and merge to my R df to have the formats in a new column:

print(have)
varX
1
2
3
1

print(format.file)
formatIndex  group
    1        Smoker
    2        Non-Smoker
    3        Occasional Smoker
    11       Female
    12       Male
    13       Unknown

df.format <- merge(have, format.file, by.x = "varX", by.y = "formatIndex")

print(df.format)
varX    group
  1     Smoker
  2     Non-Smoker
  3     Occasional Smoker
  1     Smoker

The issue with a join approach is I often want to apply the same formats for many columns (i.e. varX, varY, and varZ all use different formatIndex). Is there a similar method of applying formats to variables as SAS has?

CodePudding user response:

You could use plyr::mapvalues within the across verb.

Example:

df <- data.frame(V1 = c(1,2,3,4),
                 V2 = c(2,3,1,3))

  V1 V2
1  1  2
2  2  3
3  3  1
4  4  3


liste_format <- data.frame(ID = c(1,2,3,4),
                           group = c("Smoker","Non-Smoker","Occasional Somker","Unknown"))

  ID             group
1  1            Smoker
2  2        Non-Smoker
3  3 Occasional Somker
4  4           Unknown



  library(dplyr)
df |> 
  mutate(across(V1:V2,
                ~ plyr::mapvalues(.,
                                  from = liste_format$ID,
                                  to = liste_format$group,
                                  warn_missing = F),
                .names = "format_{.col}"))
    
  V1 V2         format_V1         format_V2
1  1  2            Smoker        Non-Smoker
2  2  3        Non-Smoker Occasional Somker
3  3  1 Occasional Somker            Smoker
4  4  3           Unknown Occasional Somker
  •  Tags:  
  • r
  • Related