I am doing a big data analysis, and I want to total each individual's physical activity.
`pha_04z1` is the number of days of vigorous physical activity in the past week, and `pha_05z1` and `pha_06z1` are the hours and minutes of that activity per day. `pha_07z1` is the number of days of moderate physical activity in the past week, and `pha_08z1` and `pha_09z1` are the hours and minutes. To obtain the final physical activity measure, I am creating the variables `ph_a0100`, `ph_a0200`, `ph_a0300`, `ph_a0400` and `ph_a0500`, which are not in the raw data.
Moderate-or-greater physical activity is defined as 20 minutes or more per day on 3 or more days in the past week.
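As a quick worked check of this definition, here is a minimal R sketch that ignores missing values and survey codes; the function name `meets_definition` and the sample values are hypothetical:

```r
# Hypothetical helper: TRUE when activity reaches at least 20 minutes
# per day on at least 3 days in the past week (no missing-value handling)
meets_definition <- function(days, hours, minutes) {
  minutes_per_day <- hours * 60 + minutes
  days >= 3 & minutes_per_day >= 20
}

# 4 days of 1 h 10 min -> 70 min/day on 4 days: meets it
# 6 days of 0 h 15 min -> 15 min/day: too few minutes
# 2 days of 2 h 0 min  -> enough minutes, too few days
print(meets_definition(days = c(4, 6, 2), hours = c(1, 0, 2), minutes = c(10, 15, 0)))
```

Both conditions must hold at once, which is why the vectorised `&` is used rather than checking days and minutes separately.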
The SAS code for this is as follows:
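```sas
/* Minutes of vigorous activity per day from hours (pha_05z1) and minutes (pha_06z1) */
if 0<=pha_05z1<=24 and pha_06z1=. then do;
  ph_a0100=pha_05z1*60;
end;
if 0<=pha_05z1<=24 and 0<=pha_06z1<=59 then do;
  ph_a0100=(pha_05z1*60)+pha_06z1;
end;
if pha_05z1=. and 0<=pha_06z1<=59 then do;
  ph_a0100=pha_06z1;
end;

/* Vigorous criterion: 3 or more days and 20 or more minutes per day */
if pha_04z1 in (0:2) then ph_a0200=0;
else if pha_04z1 in (3:7) then do;
  if ph_a0100=. then ph_a0200=.;
  else if 0<=ph_a0100<=19 then ph_a0200=0;
  else if 20<=ph_a0100 then ph_a0200=1;
end;

/* ----- (the corresponding moderate-activity code, which creates ph_a0300 and ph_a0400, is omitted) ----- */

/* Final indicator: 1 if either criterion is met, 0 if neither is */
if ph_a0200=1 or ph_a0400=1 then ph_a0500=1;
else if ph_a0200=0 and ph_a0400=0 then ph_a0500=0;
```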
Here is my attempt in R:
```r
if (pha_05z1<=24 & pha_05z1>=0 & pha_06z1==88)
{
  ch2020_$ph_a0100 <- pha_05z1*60
}
if (pha_05z1<=24 & pha_05z1>=0 & pha_06z1<=59 & pha_06z1>=0)
{
  ch2020_$ph_a0100 <- pha_05z1*60 + pha_06z1
}
if (pha_05z1==88 & pha_06z1<=59 & pha_06z1>=0)
{
  ch2020_$ph_a0100 <- pha_06z1
}
ch2020_$ph_a0200 <-
  ifelse(pha_04z1%in%c(0,1,2),0,
    ifelse(pha_04z1>=3 & ch2020_$ph_a0100==NA),NA,
      ifelse(ch2020_$ph_a0100<=19 & ch2020_$ph_a0100 >=0),0,1)
```
This code doesn't work. How can I fix it? Thank you in advance.
Answer:
Without sample data I can't test this, but here is some code that may work.
- In SAS, your `if` statements reassign `ph_a0100` only for the rows where the condition holds. In R, `ch2020_$ph_a0100 <- value` replaces the entire column; to change only some rows, the left-hand side must be indexed with `[`, as in `ch2020_$ph_a0100[ind] <- ...`. (I do not know SAS well, so I could be mistaken in my reading of your code.)
- Don't use a single `&` in an `if` condition unless the result is collapsed with an aggregator such as `any()` or `all()`. R's `if` expects a condition of length exactly 1, and anything longer is a mistake: before R 4.2.0 it only raised a warning and used the first element, and since R 4.2.0 it is an error.
- I'm inferring that `pha_06z1` and the other variables you reference are columns of `ch2020_`. Without data it is hard to know for sure.
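A small sketch of the first two points on a made-up data frame (`df`, `x` and `y` are hypothetical names, not columns from the question):

```r
# Hypothetical toy data, not from the question
df <- data.frame(x = c(5, 25, 40))

# Point 1: an unindexed assignment fills every row, just as
# `ch2020_$ph_a0100 <- pha_05z1*60` does inside an if()
df$y <- df$x * 60            # 300, 1500, 2400: row 1 changes too

# An indexed left-hand side changes only the selected rows
df$y <- NA_real_
ind <- df$x > 20
df$y[ind] <- df$x[ind] * 60  # NA, 1500, 2400
print(df$y)

# Point 2: if() needs a single TRUE/FALSE, but this condition has length 3.
# R >= 4.2.0 signals an error (older versions warned and used element 1)
res <- tryCatch(
  if (df$x > 20) "active" else "inactive",
  error = function(e) "error",
  warning = function(w) "warning"
)
print(res)

# ifelse() is vectorised and returns one value per row
print(ifelse(df$x > 20, "active", "inactive"))

# any()/all() reduce a logical vector to length 1, which if() accepts
if (any(df$x > 20)) print("at least one row is above 20")
```

The `tryCatch()` wrapper is only there so the example runs to the end on any R version.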
Three ways:
Nested `ifelse`:

```r
ch2020_$ph_a0100 <- ifelse(
  ch2020_$pha_05z1<=24 & ch2020_$pha_05z1>=0 & ch2020_$pha_06z1==88,
  ch2020_$pha_05z1*60,
  ifelse(
    ch2020_$pha_05z1<=24 & ch2020_$pha_05z1>=0 & ch2020_$pha_06z1<=59 & ch2020_$pha_06z1>=0,
    ch2020_$pha_05z1*60 + ch2020_$pha_06z1,
    ifelse(
      ch2020_$pha_05z1==88 & ch2020_$pha_06z1<=59 & ch2020_$pha_06z1>=0,
      ch2020_$pha_06z1,
      NA  # no rule applies: leave missing (the column does not exist yet)
    )
  )
)
```
Nested `ifelse`, but wrapped in `with` to make it a little more readable:

```r
ch2020_$ph_a0100 <- with(ch2020_,
  ifelse(pha_05z1<=24 & pha_05z1>=0 & pha_06z1==88,
         pha_05z1*60,
  ifelse(pha_05z1<=24 & pha_05z1>=0 & pha_06z1<=59 & pha_06z1>=0,
         pha_05z1*60 + pha_06z1,
  ifelse(pha_05z1==88 & pha_06z1<=59 & pha_06z1>=0,
         pha_06z1,
         NA)))  # no rule applies: leave missing
)
```
Assign a default value, then replace the indexed subsets one rule at a time:

```r
# Start every row as missing (ph_a0100 is not in the raw data)
ch2020_$ph_a0100 <- NA_real_

# which() drops rows where a condition is NA; a logical index containing
# NA can make the subset assignments below fail
ind <- with(ch2020_, which(pha_05z1<=24 & pha_05z1>=0 & pha_06z1==88))
ch2020_$ph_a0100[ind] <- ch2020_$pha_05z1[ind] * 60

ind <- with(ch2020_, which(pha_05z1<=24 & pha_05z1>=0 & pha_06z1<=59 & pha_06z1>=0))
ch2020_$ph_a0100[ind] <- with(ch2020_, pha_05z1[ind]*60 + pha_06z1[ind])

ind <- with(ch2020_, which(pha_05z1==88 & pha_06z1<=59 & pha_06z1>=0))
ch2020_$ph_a0100[ind] <- ch2020_$pha_06z1[ind]
```
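The same indexed approach can cover the remaining SAS steps. Below is a sketch for `ph_a0200` and `ph_a0500`, using made-up vectors in place of the `ch2020_` columns and assuming `ph_a0400` is built from the moderate-activity columns in the same way (that SAS code is not shown in the question). It also avoids two problems in the question's `ph_a0200` attempt: `x == NA` is always `NA`, so missing values have to be tested with `is.na(x)`, and each `ifelse(` was closed right after its condition, leaving the yes/no values outside the call.

```r
# Hypothetical toy values; in practice these would be columns of ch2020_
pha_04z1 <- c(1, 3, 3, 5, 6, NA)
ph_a0100 <- c(60, NA, 10, 20, 45, 30)

# ph_a0200: vigorous activity on 3+ days with 20+ minutes per day.
# Start from NA (SAS missing) and fill in only the rows each rule covers.
ph_a0200 <- rep(NA_real_, length(pha_04z1))
ph_a0200[pha_04z1 %in% 0:2] <- 0   # %in% returns FALSE (never NA) for NA input
on3plus <- pha_04z1 %in% 3:7 & !is.na(ph_a0100)
ph_a0200[on3plus & ph_a0100 >= 0 & ph_a0100 <= 19] <- 0
ph_a0200[on3plus & ph_a0100 >= 20] <- 1
print(ph_a0200)

# ph_a0500: 1 if either flag is 1, 0 if both are 0, otherwise NA.
# For 0/1/NA inputs, R's NA-aware | and & reproduce the SAS if/else-if pair.
ph_a0400 <- c(0, 1, 0, 0, NA, 0)   # hypothetical moderate-activity flag
ph_a0500 <- ifelse(ph_a0200 == 1 | ph_a0400 == 1, 1,
                   ifelse(ph_a0200 == 0 & ph_a0400 == 0, 0, NA))
print(ph_a0500)
```

Starting from `NA_real_` mirrors the SAS behaviour in which a variable that no branch assigns stays missing. Guarding with `!is.na(ph_a0100)` matters because, in SAS, a missing value sorts below every number, so a test such as `20<=ph_a0100` is simply false, whereas in R the same comparison yields `NA`.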