R mutate new column based on range of values in other column

Time:04-06

I have an R data frame in the following format:

 -------- --------------- -------------------- -------- 
| time   | Stress_ratio  | shear_displacement |   CX   |
 -------- --------------- -------------------- -------- 
| <dbl>  |    <dbl>      |    <dbl>           | <dbl>  | 
| 50.1   |    -0.224     |    4.9             |  0     | 
| 50.2   |    -0.219     |    4.98            | 0.0100 | 
| .      | .             | .                  | .      |
| .      | .             | .                  | .      | 
| 249.3  |    -0.217     | 4.97               | 0.0200 |  
| 250.4  |    -0.214     | 4.96               | 0.0300 | 
| 251.1  | -0.222        | 4.91               | 0.06   | 
| 252.1  | -0.222        | 4.91               | 0.06   | 
| 253.3  | -0.222        | 4.91               | 0.06   | 
| 254.5  | -0.222        | 4.91               | 0.06   | 
| 256.8  | -0.222        | 4.91               | 0.06   | 
| .      | .             | .                  | .      | 
| .      | .             | .                  | .      |
| 500.1  | -0.22         | 4.91               | 0.6    |    
| 501.4  | -0.22         | 4.91               | 0.6    | 
| 503.1  | -0.22         | 4.91               | 0.6    | 
 -------- --------------- -------------------- -------- 

I want a new column that holds a repeating value based on ranges of the time column, where each range spans 250. For example, every row of new_column should be 1 from df$time[1] until time reaches 250 (roughly df$time[1]*4.98). The number should then change from 1 to 2 when the next chunk of 250 starts, and so on. The new data frame should look like this:

 -------- --------------- -------------------- -------- ------------ 
| time   | Stress_ratio  | shear_displacement |   CX   | new_column |
 -------- --------------- -------------------- -------- ------------ 
| <dbl>  |    <dbl>      |    <dbl>           | <dbl>  | <dbl>      |
| 50.1   |    -0.224     |    4.9             |  0     | 1          |
| 50.2   |    -0.219     |    4.98            | 0.0100 | 1          |
| .      | .             | .                  | .      | 1          |
| .      | .             | .                  | .      | 1          |
| 249.3  |    -0.217     | 4.97               | 0.0200 | 1          |
| 250.4  |    -0.214     | 4.96               | 0.0300 | 2          |
| 251.1  | -0.222        | 4.91               | 0.06   | 2          |
| 252.1  | -0.222        | 4.91               | 0.06   | 2          |
| 253.3  | -0.222        | 4.91               | 0.06   | 2          |
| 254.5  | -0.222        | 4.91               | 0.06   | 2          |
| 256.8  | -0.222        | 4.91               | 0.06   | 2          |
| .      | .             | .                  | .      | .          |
| .      | .             | .                  | .      | .          |
| 499.1  | -0.22         | 4.91               | 0.6    | 2          |
| 501.4  | -0.22         | 4.91               | 0.6    | 3          |
| 503.1  | -0.22         | 4.91               | 0.6    | 3          |
 -------- --------------- -------------------- -------- ------------ 

CodePudding user response:

If I understand what you're trying to do, a base R solution could be:

df$new_column <- df$time %/% 250 + 1

The %/% operator is integer division (sort of the complement of the modulus operator) and tells you how many copies of 250 would fit into your number; we add 1 to get the value you want.
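As a quick check of the arithmetic, here is a minimal sketch that applies the same expression to a few of the time values shown in the question. The standalone `time` vector and the `rebuilt` variable are only for illustration; they are not part of the original data frame.

```r
# Sample times taken from the question's table
time <- c(50.1, 249.3, 250.4, 499.1, 501.4)

# Number of whole 250s that fit into each value, shifted so counting starts at 1
new_column <- time %/% 250 + 1
print(new_column)  # 1 1 2 2 3

# %/% and %% are complementary: x equals 250 * (x %/% 250) + (x %% 250)
rebuilt <- 250 * (time %/% 250) + time %% 250
print(isTRUE(all.equal(rebuilt, time)))  # TRUE
```

Note that the bins here are anchored at 0 (0 to 250, 250 to 500, ...), which matches the expected output in the question, not at the first time value.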

The tidyverse version:

library(dplyr)

df <- df %>%
  mutate(new_column = time %/% 250 + 1)

CodePudding user response:

library(data.table)
setDT(df)[, new_column := rleid(time %/% 250)][]
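The rleid() approach numbers consecutive runs of equal values, so it can differ from the `%/% 250 + 1` approach when a whole 250-wide range is missing from the data, and it assumes the rows are already ordered by time. The sketch below illustrates the difference in base R; the last time value (1000.2) is a hypothetical addition to create a gap, and `by_run` is a base R stand-in for what rleid() computes on a single vector.

```r
# Times from the question plus one hypothetical value that skips a range
time <- c(50.1, 249.3, 250.4, 499.1, 501.4, 1000.2)
bin <- time %/% 250                        # 0 0 1 1 2 4

# Integer-division numbering keeps the gap
by_division <- bin + 1
print(by_division)                         # 1 1 2 2 3 5

# Run-length numbering: increment whenever the bin changes from the previous row
by_run <- cumsum(c(TRUE, diff(bin) != 0))
print(by_run)                              # 1 1 2 2 3 4

# Optional cross-check against data.table, if it is installed
if (requireNamespace("data.table", quietly = TRUE)) {
  stopifnot(all(data.table::rleid(bin) == by_run))
}
```

If the chunk number should reflect the actual time range, the division-based numbering is probably what you want; if you only need consecutive group labels, the run-based numbering is enough.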