Adding an else if test into a dataframe

Time:12-23

._883<-quantmod::getSymbols("0883.HK",from="2022-04-21",to="2022-12-22",auto.assign = FALSE)
._600938<-quantmod::getSymbols("600938.SS",from="2022-04-21",to="2022-12-22",auto.assign = FALSE)

x<-cbind(._883[,6],._600938[,6])
x$diff<-x[,2]-x[,1]
na.omit(x)
mean(x[,3],na.rm=TRUE)

x$diff2<-
  if (x[,3]<mean(x[,3],na.rm=TRUE)){
  print("below average")
} else {
  print("above average")
}

I am trying to compare each value in the third column to the column mean and label it "below average" or "above average" accordingly. However, NA is returned. Could you take a look and see what went wrong? Many thanks.

Update 1:

> x$diff2<-
    ifelse (x[,3]<mean(x[,3],na.rm=TRUE), "below average", "above average")
Warning message:
In merge.xts(..., all = all, fill = fill, suffixes = suffixes) :
  NAs introduced by coercion
> na.omit(x)
     [,1] [,2] [,3] [,4]

CodePudding user response:

Use ifelse(), the vectorized form of if/else. A plain if expects a single TRUE or FALSE, so it cannot test a whole column at once:

x$diff2<-
  ifelse (x[,3]<mean(x[,3],na.rm=TRUE), "below average", "above average")
           X0883.HK.Adjusted X600938.SS.Adjusted diff       diff2
2022-04-21 "9.19987"         "12.460269"         "3.260399" "below average"
2022-04-22 "9.301339"        "13.7072"           "4.405861" "below average"
2022-04-25 "8.624878"        "12.912055"         "4.287177" "below average"
2022-04-26 "8.286647"        "12.306663"         "4.020016" "below average"
2022-04-27 "8.844728"        "13.535521"         "4.690793" "below average"
2022-04-28 "9.166047"        "13.969235"         "4.803188" "below average"
2022-04-29 "9.487366"        "15.369774"         "5.882408" "above average"
2022-05-03 "9.385897"        NA                  NA         NA             
2022-05-04 "9.250604"        NA                  NA         NA             
2022-05-05 "9.318251"        "14.63788"          "5.319629" "below average"
2022-05-06 "9.216781"        "14.168022" ...etc.
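
The quoted numbers in this output, and the "NAs introduced by coercion" warning in the question's update, are both consistent with a type conflict: a matrix-like object holds a single type (xts stores its data as a matrix), so numeric prices and text labels cannot sit side by side. Below is a minimal base-R sketch of how converting to a data frame first avoids this. It uses a plain matrix with made-up numbers as a stand-in for the xts object, and the names m, m_chr and df are hypothetical.

```r
# Hypothetical stand-in for the price data: a plain numeric matrix
m <- cbind(hk = c(9.2, 9.3, 8.6, 9.4), ss = c(12.5, 13.7, 12.9, NA))

# A matrix holds one type only, so binding text onto it
# turns every cell into character
m_chr <- cbind(m, label = "x")
typeof(m_chr)

# A data frame keeps a separate type per column
df <- as.data.frame(m)
df$diff <- df$ss - df$hk
df$diff2 <- ifelse(df$diff < mean(df$diff, na.rm = TRUE),
                   "below average", "above average")
str(df)
```

With a real xts object, the same idea would presumably be to build the data frame from its index and core data before adding the label column.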

CodePudding user response:

I couldn't download your data, so I created some fake data instead. See if this is what you need.

# Input
set.seed(123)
df <- data.frame(col1 = rnorm(10), col2 = rnorm(10), col3 = rnorm(10))
df
#>           col1       col2       col3
#> 1  -0.56047565  1.2240818 -1.0678237
#> 2  -0.23017749  0.3598138 -0.2179749
#> 3   1.55870831  0.4007715 -1.0260044
#> 4   0.07050839  0.1106827 -0.7288912
#> 5   0.12928774 -0.5558411 -0.6250393
#> 6   1.71506499  1.7869131 -1.6866933
#> 7   0.46091621  0.4978505  0.8377870
#> 8  -1.26506123 -1.9666172  0.1533731
#> 9  -0.68685285  0.7013559 -1.1381369
#> 10 -0.44566197 -0.4727914  1.2538149

# Code solution
mean_col3 <- mean(df[,3], na.rm = TRUE)
df$col4 <- ifelse(df[,3] < mean_col3, "below average", "above average")

# Output
df
#>           col1       col2       col3          col4
#> 1  -0.56047565  1.2240818 -1.0678237 below average
#> 2  -0.23017749  0.3598138 -0.2179749 above average
#> 3   1.55870831  0.4007715 -1.0260044 below average
#> 4   0.07050839  0.1106827 -0.7288912 below average
#> 5   0.12928774 -0.5558411 -0.6250393 below average
#> 6   1.71506499  1.7869131 -1.6866933 below average
#> 7   0.46091621  0.4978505  0.8377870 above average
#> 8  -1.26506123 -1.9666172  0.1533731 above average
#> 9  -0.68685285  0.7013559 -1.1381369 below average
#> 10 -0.44566197 -0.4727914  1.2538149 above average

Created on 2022-12-22 with reprex v2.0.2
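
If a real "else if" with a third label is wanted, as the title suggests, ifelse() calls can be nested. A minimal sketch on fake data, assuming a hypothetical band tol around the mean that counts as "near average" (neither tol nor the third label comes from the question):

```r
# Fake data, as above
set.seed(123)
df <- data.frame(col3 = rnorm(10))

mean_col3 <- mean(df$col3, na.rm = TRUE)
tol <- 0.5  # hypothetical half-width of the "near average" band

# The inner ifelse() plays the role of the "else if" branch
df$col4 <- ifelse(df$col3 < mean_col3 - tol, "below average",
           ifelse(df$col3 > mean_col3 + tol, "above average",
                  "near average"))

table(df$col4)
```

Nesting reads well for two or three branches; beyond that, a lookup with cut() is usually easier to follow.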

CodePudding user response:

A tidyverse approach

library(tidyverse)
library(tidyquant)

df <- left_join(
  tq_get("0883.HK", from = "2022-04-21", to = today()) %>%
    select(date, "HK" = adjusted),
  tq_get("600938.SS", from = "2022-04-21", to = today()) %>%
    select(date, "SS" = adjusted),
  by = "date"  # join explicitly on the date column
)

# A tibble: 169 × 3
   date          HK    SS
   <date>     <dbl> <dbl>
 1 2022-04-21  9.20  12.5
 2 2022-04-22  9.30  13.7
 3 2022-04-25  8.62  12.9
 4 2022-04-26  8.29  12.3
 5 2022-04-27  8.84  13.5
 6 2022-04-28  9.17  14.0
 7 2022-04-29  9.49  15.4
 8 2022-05-03  9.39  NA  
 9 2022-05-04  9.25  NA  
10 2022-05-05  9.32  14.6
# … with 159 more rows

df %>%
  mutate(diff = SS - HK,
         avg = if_else(diff < mean(diff, na.rm = TRUE), "Below", "Above"))

# A tibble: 169 × 5
   date          HK    SS  diff avg  
   <date>     <dbl> <dbl> <dbl> <chr>
 1 2022-04-21  9.20  12.5  3.26 Below
 2 2022-04-22  9.30  13.7  4.41 Below
 3 2022-04-25  8.62  12.9  4.29 Below
 4 2022-04-26  8.29  12.3  4.02 Below
 5 2022-04-27  8.84  13.5  4.69 Below
 6 2022-04-28  9.17  14.0  4.80 Below
 7 2022-04-29  9.49  15.4  5.88 Above
 8 2022-05-03  9.39  NA   NA    NA   
 9 2022-05-04  9.25  NA   NA    NA   
10 2022-05-05  9.32  14.6  5.32 Below
# … with 159 more rows