I have a dataframe like so:
TS Device1.max Device2.max Device3.max Device4.max
18:02:44 FALSE FALSE TRUE FALSE
18:02:45 TRUE TRUE FALSE FALSE
18:02:46 FALSE FALSE FALSE TRUE
18:02:47 FALSE FALSE FALSE FALSE
18:02:48 FALSE FALSE FALSE FALSE
18:02:49 FALSE FALSE FALSE FALSE
18:02:50 FALSE FALSE FALSE FALSE
18:02:51 FALSE FALSE FALSE FALSE
18:02:52 FALSE FALSE FALSE TRUE
18:02:53 FALSE TRUE FALSE FALSE
18:02:54 FALSE FALSE FALSE FALSE
To get the true false columns I used the following code:
df$Device1.max = ifelse(df$Device1 == max(df$Device1), 'true','false')
df$Device2.max = ifelse(df$Device2 == max(df$Device2), 'true','false')
df$Device3.max = ifelse(df$Device3 == max(df$Device3), 'true','false')
df$Device4.max = ifelse(df$Device4 == max(df$Device4), 'true','false')
For simplicity I am only showing 4 Device columns. I have about a hundred device columns where I would like to do the comparison. It wont be feasible to specify the hundred columns in a hundred ifelse
statements
How do I compare using regex or specifying generic column name assuming all the device columns of interest will have some sort of name starting like device
?
I then want to filter to or find the row where maximum Device.max
columns satisfy the condition where it is TRUE within /-1 row of it.
Algorithmically, I would create an index column and filter to a dataframe where only the TRUE values are present. Then I would check how many columns have indices within 1 row of each other. In the above case rows 1,2 & 3
have 4 columns satisfying the true condition, whereas rows 9 & 10
have only 2 columns satisfying the condition. Therefore my expected output would be:
TS Device1.max Device2.max Device3.max Device4.max
18:02:44 FALSE FALSE TRUE FALSE
18:02:45 TRUE TRUE FALSE FALSE
18:02:46 FALSE FALSE FALSE TRUE
However this method seems very iterative and inefficient. Is there a better way to do it leveraging dataframe functions in R?
CodePudding user response:
This code should answer the first TRUE/FALSE
question
r <- c();
colum <- c();
for (colu in 2:ncol(example_table)){
example_table[ ,colu] <- example_table[ ,colu]==max(example_table[ ,colu]) # returns True/False
val <- which(example_table[ ,colu]==T) # searching for row indexes
r <- append(r,val) # append row indexes
colum <- append(colum,rep(colu,length(val))) # since one column can contain more than one True, repeat that column index and append it
}
true_values <- cbind(r,colum) # just a matrix-like output
out:
> example_table
V1 V2 V3 V4
1 18:02 5 8 1
2 14:05 7 1 7
3 19:27 7 6 1
# After for:
> example_table
V1 V2 V3 V4
1 18:02 FALSE TRUE FALSE
2 14:05 TRUE FALSE TRUE
3 19:27 TRUE FALSE FALSE
> true_values
r colum
[1,] 2 2
[2,] 3 2
[3,] 1 3
[4,] 2 4
where r
is the row index and colum
is the column index which contains true values. Note that example_table[ ,colu]==max(example_table[ ,colu])
returns a TRUE/FALSE
value, and note that colum <- append(colum,rep(colu,length(val)))
is needed to avoid future dimensional problems too.
For the second question, you have now row indexes which contains TRUE
values. You can then implement a code that when rows above and under the selected row contains a TRUE
value (any()
function should be a good one), select that row. Then subset()
original data frame with this row indexes.