I have a df that uses numbers to refer to categorical data and a CSV that defines the categories (1=Smoker, 2=Non-Smoker, etc). In SAS, I was able to convert the format CSV into a format file and apply these formats to the variables:
data want;
set have;
formatted = put(varX,custFormat.);
run;
This would provide me with the output:
varX formatted
1 Smoker
2 Non-Smoker
3 Occasional Smoker
1 Smoker
Given that I have a csv with all the formats, I could bring this in and merge to my R df to have the formats in a new column:
print(have)
varX
1
2
3
1
print(format.file)
formatIndex group
1 Smoker
2 Non-Smoker
3 Occasional Smoker
11 Female
12 Male
13 Unknown
df.format <- merge(have, format.file, by.x = "varX", by.y = "formatIndex")
print(df.format)
varX group
1 Smoker
2 Non-Smoker
3 Occasional Smoker
1 Smoker
The issue with a join approach is I often want to apply the same formats for many columns (i.e. varX
, varY
, and varZ
all use different formatIndex
). Is there a similar method of applying formats to variables as SAS has?
CodePudding user response:
You could use plyr::mapvalues
within the across
verb.
Example:
df <- data.frame(V1 = c(1,2,3,4),
V2 = c(2,3,1,3))
V1 V2
1 1 2
2 2 3
3 3 1
4 4 3
liste_format <- data.frame(ID = c(1,2,3,4),
group = c("Smoker","Non-Smoker","Occasional Somker","Unknown"))
ID group
1 1 Smoker
2 2 Non-Smoker
3 3 Occasional Somker
4 4 Unknown
library(dplyr)
df |>
mutate(across(V1:V2,
~ plyr::mapvalues(.,
from = liste_format$ID,
to = liste_format$group,
warn_missing = F),
.names = "format_{.col}"))
V1 V2 format_V1 format_V2
1 1 2 Smoker Non-Smoker
2 2 3 Non-Smoker Occasional Somker
3 3 1 Occasional Somker Smoker
4 4 3 Unknown Occasional Somker