Home > Blockchain >  Conditional changing a value of a variable in dplyr. Can't get `replace()` to work
Conditional changing a value of a variable in dplyr. Can't get `replace()` to work

Time:12-31

There are some mistakes in the Yahoo finance historical price for the Brazilian stock "PFRM3.SA". I was trying to conditionally change a value using dplyr. I managed to do it using case_when() but I couldn't make it work using replace(), which I think should be a more efficient solution.

library(tidyverse) 
library(tidyquant)

price <- tq_get("PFRM3.SA", from = "2020-01-01", get = "stock.prices") #Data

#visual inspection

price %>% 
  ggplot(aes(x = date, y = close))  
  geom_line() 

#the decimal digit was misplaced a few times

#this works fine

price %>%  
  mutate(close = case_when(close > 100 ~ close/100, TRUE ~ close)) 

#However I don't understand why this do not work

price %>%  
  mutate(close = replace(close, close > 100, close/100))

The last line of code generates this error:

Warning messages:
1: Problem with `mutate()` input `close`.
ℹ number of items to replace is not a multiple of replacement length
ℹ Input `close` is `replace(close, close > 100, close/100)`. 
2: In x[list] <- values :
  number of items to replace is not a multiple of replacement length

I don't understand why replace() doesn't return the original close value when the condition isn't met. Also, I don't know how to make it work. Thanks

CodePudding user response:

  • The replace() function in R syntax is very simple and easy to implement.
  • It includes the vector, index vector, and the replacement values as well as shown below.

replace(x, list, values)

x = vector having some values -> close

list = this can be an index vector -> close > 100

Values = the replacement values -> close[close>100]/100

  • Explanation:

You are conditionally indexing the close column, therefore in the Values argument of replace (= the replacement values) you have to conditionally index the replacement value; here close/100 with close[close>100]/100:

price %>%  
  mutate(close = replace(close, close > 100, close[close>100]/100)) %>% 
  data.frame()
    symbol       date   open   high    low close  volume   adjusted
1   PFRM3.SA 2020-01-02   6.59   6.67   6.49  6.60  549100   6.263065
2   PFRM3.SA 2020-01-03   6.50   6.99   6.40  6.96 1202000   6.604687
3   PFRM3.SA 2020-01-06   7.07   7.71   7.07  7.69 2692600   7.297420
4   PFRM3.SA 2020-01-07   7.42   7.50   7.05  7.05 2325900   6.690092
5   PFRM3.SA 2020-01-08   7.01   7.28   6.80  7.13 1588900   6.766008
6   PFRM3.SA 2020-01-09   7.21   7.22   6.98  7.12  638800   6.756518
7   PFRM3.SA 2020-01-10   7.12   7.23   6.96  6.96  658000   6.604687
8   PFRM3.SA 2020-01-13   7.01   7.17   6.96  7.10  586700   6.737539
9   PFRM3.SA 2020-01-14   7.12   7.15   6.95  7.00  663700   6.642644
10  PFRM3.SA 2020-01-15   7.02   7.39   7.02  7.27 1173000   6.898861
11  PFRM3.SA 2020-01-16   7.35   7.45   7.09  7.16  816300   6.794476
12  PFRM3.SA 2020-01-17   7.24   7.29   7.07  7.17  669900   6.803966
13  PFRM3.SA 2020-01-20   7.20   7.29   7.10  7.23  548300   6.860903
14  PFRM3.SA 2020-01-21   7.23   7.40   7.13  7.17  722500   6.803966
15  PFRM3.SA 2020-01-22   7.23   7.29   6.88  7.09  697400   6.728050
16  PFRM3.SA 2020-01-23   7.09   7.15   6.95  7.04  670500   6.680603
17  PFRM3.SA 2020-01-24   7.07   7.29   7.02  7.15  552000   6.784987
18  PFRM3.SA 2020-01-27   7.08   7.08   6.70  6.95  681300   6.595197
19  PFRM3.SA 2020-01-28   6.94   7.06   6.61  6.90  703200   6.547750
20  PFRM3.SA 2020-01-29   6.92   7.03   6.72  6.72  406900   6.376938
21  PFRM3.SA 2020-01-30   6.60   6.65   6.40  6.64  786600   6.301023
22  PFRM3.SA 2020-01-31   6.51   6.83   6.49  6.49  379200   6.158680
23  PFRM3.SA 2020-02-03   6.45   6.69   6.37  6.66  491100   6.320002
24  PFRM3.SA 2020-02-04   6.71   6.90   6.68  6.90  388800   6.547750
25  PFRM3.SA 2020-02-05   6.85   6.94   6.64  6.81  499100   6.462344
26  PFRM3.SA 2020-02-06   6.89   6.89   6.72  6.83  302400   6.481323
27  PFRM3.SA 2020-02-07   6.66   6.82   6.57  6.60  280100   6.263065
28  PFRM3.SA 2020-02-10   6.60   6.60   6.23  6.30  537700   5.978380
29  PFRM3.SA 2020-02-11   6.35   6.52   6.26  6.50  455100   6.168170
30  PFRM3.SA 2020-02-12   6.51   6.57   6.37  6.45  361200   6.120722
31  PFRM3.SA 2020-02-13   6.35   6.38   6.18  6.30  338800   5.978380
32  PFRM3.SA 2020-02-14   6.29   6.35   6.12  6.26  366100   5.940423
33  PFRM3.SA 2020-02-17   6.32   6.44   6.20  6.20  582900   5.883485
34  PFRM3.SA 2020-02-18   6.20   6.50   6.15  6.48  341200   6.149191
35  PFRM3.SA 2020-02-19   6.50   6.83   6.40  6.67  722800   6.329491
36  PFRM3.SA 2020-02-20   6.70   6.78   6.62  6.66  206200   6.320002
37  PFRM3.SA 2020-02-21   6.61   6.62   6.41  6.52  261500   6.187149
38  PFRM3.SA 2020-02-26   6.24   6.24   5.89  5.99  514500   5.684206
39  PFRM3.SA 2020-02-27   5.85   6.04   5.61  5.75  654500   5.456458
40  PFRM3.SA 2020-02-28   5.70   5.76   5.51  5.76  538900   5.465948
41  PFRM3.SA 2020-03-02   5.82   6.10   5.71  6.05  294000   5.741143
42  PFRM3.SA 2020-03-03   6.16   6.25   5.92  6.05  415000   5.741143
43  PFRM3.SA 2020-03-04   6.09   6.16   5.90  6.05  465100   5.741143
44  PFRM3.SA 2020-03-05   6.05   6.06   5.60  5.75  415300   5.456458
45  PFRM3.SA 2020-03-06   5.46   5.60   5.29  5.60  609400   5.314116
46  PFRM3.SA 2020-03-09   4.89   5.16   4.63  4.75  920400   4.507509
47  PFRM3.SA 2020-03-10   5.12   5.32   4.77  5.32 1060000   5.048410
48  PFRM3.SA 2020-03-11   5.28   5.41   4.69  5.21  870500   4.944026
49  PFRM3.SA 2020-03-12   4.50   4.52   4.02  4.02  515900   3.814776
50  PFRM3.SA 2020-03-13   4.50   4.68   4.10  4.49  706000   4.260782
51  PFRM3.SA 2020-03-16   4.01   4.65   3.90  4.35  564900   4.127929
52  PFRM3.SA 2020-03-17   4.50   4.52   3.96  4.24  672800   4.023544
53  PFRM3.SA 2020-03-18   3.80   3.95   3.29  3.50  870200   3.321322
54  PFRM3.SA 2020-03-19   3.20   3.58   2.89  3.58 1378800   3.397238
55  PFRM3.SA 2020-03-20   3.67   3.96   3.22  3.22 1539200   3.055617
56  PFRM3.SA 2020-03-23   3.23   3.34   2.87  3.10 1183900   2.941742
57  PFRM3.SA 2020-03-24   3.22   3.42   3.13  3.13 1064000   2.970211
58  PFRM3.SA 2020-03-25   3.21   3.66   3.15  3.52  890600   3.340301
59  PFRM3.SA 2020-03-26   3.59   4.30   3.50  4.18 1004100   3.966608
60  PFRM3.SA 2020-03-27   3.90   4.33   3.90  4.25  572100   4.033034
61  PFRM3.SA 2020-03-30   4.26   4.68   4.17  4.27  771200   4.052013
62  PFRM3.SA 2020-03-31   4.20   4.38   4.07  4.30  371300   4.080482
63  PFRM3.SA 2020-04-01   4.11   4.25   3.98  4.11  411100   3.900181
64  PFRM3.SA 2020-04-02 415.00 425.00 402.00  4.18  282600 396.660767
65  PFRM3.SA 2020-04-03 410.00 420.00 383.00  4.00  351100 379.579681
66  PFRM3.SA 2020-04-06 421.00 423.00 395.00  3.99  639500 378.630737
67  PFRM3.SA 2020-04-07 418.00 434.00 397.00  4.05  455800 384.324432
68  PFRM3.SA 2020-04-08 410.00 430.00 400.00  4.28  443600 406.150269
69  PFRM3.SA 2020-04-09 431.00 447.00 424.00  4.29  417900 407.099213
70  PFRM3.SA 2020-04-13 429.00 445.00 423.00  4.39  287500 416.588715
71  PFRM3.SA 2020-04-14 450.00 455.00 442.00  4.43  273200 420.384491
72  PFRM3.SA 2020-04-15 432.00 452.00 421.00  4.44  278800 421.333466
73  PFRM3.SA 2020-04-16   4.50   4.53   4.26  4.34  530800   4.118440
74  PFRM3.SA 2020-04-17   4.54   4.78   4.44  4.64  908400   4.403124
75  PFRM3.SA 2020-04-20   4.46   4.76   4.43  4.60  447200   4.365166
76  PFRM3.SA 2020-04-22   4.60   5.04   4.57  4.89  788600   4.640362
77  PFRM3.SA 2020-04-23   4.97   5.17   4.63  4.77  576500   4.526488
78  PFRM3.SA 2020-04-24   4.62   4.72   4.22  4.49  703200   4.260782
79  PFRM3.SA 2020-04-27   4.61   4.65   4.35  4.63  296400   4.393635
80  PFRM3.SA 2020-04-28   4.65   4.82   4.41  4.58  583300   4.346187
81  PFRM3.SA 2020-04-29   4.62   4.75   4.50  4.74  391800   4.498019
82  PFRM3.SA 2020-04-30   4.75   4.79   4.47  4.59  288300   4.355677
83  PFRM3.SA 2020-05-04   4.41   4.63   4.33  4.63  288400   4.393635
84  PFRM3.SA 2020-05-05   4.63   4.76   4.54  4.70  389700   4.460061
85  PFRM3.SA 2020-05-06   4.69   5.26   4.57  5.22 1389800   4.953515
86  PFRM3.SA 2020-05-07   5.35   5.37   4.61  4.82 1877800   4.573936
87  PFRM3.SA 2020-05-08   4.93   4.98   4.50  4.72 1040300   4.479040
88  PFRM3.SA 2020-05-11   4.70   4.74   4.40  4.49  532600   4.260782
89  PFRM3.SA 2020-05-12   4.50   4.58   4.22  4.36  585100   4.137419
90  PFRM3.SA 2020-05-13   4.40   4.40   4.13  4.19  456100   3.976097
91  PFRM3.SA 2020-05-14   4.13   4.27   3.97  4.27  504600   4.052013
92  PFRM3.SA 2020-05-15   4.24   4.36   4.13  4.19  255500   3.976097
93  PFRM3.SA 2020-05-18   4.30   4.32   4.12  4.19  351000   3.976097
94  PFRM3.SA 2020-05-19   4.19   4.30   4.14  4.18  335200   3.966608
95  PFRM3.SA 2020-05-20   4.21   4.25   4.10  4.20  347500   3.985587
96  PFRM3.SA 2020-05-21   4.16   4.23   4.10  4.20  234300   3.985587
97  PFRM3.SA 2020-05-22   4.20   4.39   4.11  4.39  368500   4.165887
98  PFRM3.SA 2020-05-25   4.42   4.66   4.42  4.64  680200   4.403124
99  PFRM3.SA 2020-05-26   4.74   4.84   4.49  4.60  631300   4.365166
100 PFRM3.SA 2020-05-27   4.60   4.74   4.55  4.66  422400   4.422103
101 PFRM3.SA 2020-05-28   4.66   4.78   4.57  4.60  451200   4.365166
102 PFRM3.SA 2020-05-29   4.65   4.65   4.44  4.56  387400   4.327209
103 PFRM3.SA 2020-06-01   4.50   4.93   4.46  4.74  962300   4.498019
104 PFRM3.SA 2020-06-02   4.77   5.05   4.77  5.00 1063000   4.744746
105 PFRM3.SA 2020-06-03   5.05   5.31   4.86  4.86  924300   4.611893
106 PFRM3.SA 2020-06-04   4.85   5.17   4.71  4.86  768600   4.611893
107 PFRM3.SA 2020-06-05   4.94   5.12   4.85  4.85  661700   4.602404
108 PFRM3.SA 2020-06-08   4.95   5.86   4.95  5.79 2498400   5.494416
109 PFRM3.SA 2020-06-09   5.64   5.93   5.50  5.66 1288500   5.371052
110 PFRM3.SA 2020-06-10   5.76   5.81   5.35  5.35 1140600   5.076878
111 PFRM3.SA 2020-06-12   5.25   5.36   4.92  5.27  894600   5.000962
112 PFRM3.SA 2020-06-15   5.06   5.30   5.00  5.22  618700   4.953515
113 PFRM3.SA 2020-06-16   5.31   5.55   5.25  5.41  656000   5.133815
114 PFRM3.SA 2020-06-17   5.41   5.61   5.37  5.59  597600   5.304626
115 PFRM3.SA 2020-06-18   5.61   5.74   5.48  5.65  440600   5.361563
116 PFRM3.SA 2020-06-19   5.71   5.74   5.47  5.50  558500   5.219221
117 PFRM3.SA 2020-06-22   5.52   5.75   5.50  5.72  554600   5.427989
118 PFRM3.SA 2020-06-23   5.75   6.25   5.74  6.00 1652300   5.693695
119 PFRM3.SA 2020-06-24   6.00   6.15   5.77  5.99  830800   5.684206
120 PFRM3.SA 2020-06-25   5.99   6.04   5.79  5.94  827100   5.636758
121 PFRM3.SA 2020-06-26   5.99   6.03   5.59  5.59  767000   5.304626
122 PFRM3.SA 2020-06-29   5.69   5.74   5.33  5.58  739600   5.295136
123 PFRM3.SA 2020-06-30   5.60   5.70   5.45  5.65  457900   5.361563
124 PFRM3.SA 2020-07-01   5.70   5.75   5.49  5.53  634200   5.247689
125 PFRM3.SA 2020-07-02   5.61   5.64   5.41  5.42  448700   5.143305
 [ reached 'max' / getOption("max.print") -- omitted 369 rows ]

benchmark:

enter image description here

CodePudding user response:

You just need to add in rowwise before replace. Essentially, in your replace statement, close/100 is returning a new calculated value for every row. When in reality, you only want to replace the values greater than 100. So, with using rowwise, then it will only calculate per row, rather than for the whole cost column for the values parameter.

library(tidyverse)

price %>%  
  rowwise %>% 
  mutate(close = replace(close, close > 100, close/100))

Output

# A tibble: 493 × 8
# Rowwise: 
   symbol   date        open  high   low close  volume adjusted
   <chr>    <date>     <dbl> <dbl> <dbl> <dbl>   <dbl>    <dbl>
 1 PFRM3.SA 2020-01-02  6.59  6.67  6.49  6.6   549100     6.26
 2 PFRM3.SA 2020-01-03  6.5   6.99  6.4   6.96 1202000     6.60
 3 PFRM3.SA 2020-01-06  7.07  7.71  7.07  7.69 2692600     7.30
 4 PFRM3.SA 2020-01-07  7.42  7.5   7.05  7.05 2325900     6.69
 5 PFRM3.SA 2020-01-08  7.01  7.28  6.8   7.13 1588900     6.77
 6 PFRM3.SA 2020-01-09  7.21  7.22  6.98  7.12  638800     6.76
 7 PFRM3.SA 2020-01-10  7.12  7.23  6.96  6.96  658000     6.60
 8 PFRM3.SA 2020-01-13  7.01  7.17  6.96  7.1   586700     6.74
 9 PFRM3.SA 2020-01-14  7.12  7.15  6.95  7     663700     6.64
10 PFRM3.SA 2020-01-15  7.02  7.39  7.02  7.27 1173000     6.90
# … with 483 more rows

Another option (provided by @DonaldSeinen) is to use transform, which I've added to the Benchmark below.

transform(price, close = ifelse(close > 100, close / 100, close))

Benchmark

So, it looks like transform (provided by @DonaldSeinen) is by far the fastest. @TarJae provides a nice visual of this!

library(microbenchmark)

microbenchmark(
  replace = price %>%
    rowwise %>%
    mutate(close = replace(close, close > 100, close / 100)),
  case_when = price %>%
    mutate(close = case_when(close > 100 ~ close / 100, TRUE ~ close)),
  times = 1000L,
  control = list(order = 'block')
)

# Unit: microseconds
#        expr      min        lq      mean   median       uq      max neval
#     replace 4149.300 4997.8325 5785.2692 5469.816 5993.926 28077.15  1000
#   case_when 1410.959 1514.8980 1906.9804 1662.030 2138.092 11863.49  1000
# DonaldSeinen  247.421  271.7735  410.3955  315.499  472.813 11546.28  1000
#      TarJae 1536.112 1755.8475 2254.4749 2086.729 2560.107 12802.07  1000
  • Related