How to index specific row and column of a dataframe by conditions using R?

Time:10-15

I have two data frames with different dimensions, dataframe a and dataframe b. I want to replace specific fields in dataframe a with values stored in dataframe b. I have found a way to select the value from dataframe b that I want to use as the replacement. However, I do not know how to index the specific fields of dataframe a by condition in order to replace the values there. Any suggestions?

df_a <- base::cbind(base::as.data.frame(
                    base::matrix(c(3, 5, 6, 1, 23, 6, 7, 58, 9), ncol = 3)),
                    c('bli', 'bla', 'blub'))
# create dataframe a

base::colnames(df_a) <- c('col1', 'col2', 'col3', 'col4')
# set colnames of dataframe a

df_b <- base::as.data.frame(base::matrix(base::seq(1, 27, 3), ncol = 3, byrow = TRUE))
# create dataframe b

repl_val <- df_b[2, 2]
# replace value for dataframe a

spc_col1_val <- 5
# row to index dataframe a: column <col1> should have the value <5>
spc_col <- base::paste0('col', '1')
# column to index dataframe a: the specific column is combined out of different variables e.g. <col> and <3>

df_a[df_a$col1 == spc_col1_val, df_a$spc_col] <- repl_val
# THIS DOES NOT WORK

CodePudding user response:

Since it looks like you are relatively new to R, I'm going to try to give a complete answer. I'd strongly recommend against using base:: everywhere, for two reasons: a) nobody is ever going to unload the base package, and b) it looks horrendous. I think I understand your intention, and I think you would agree that we should make an effort to keep code as clean and legible as possible (to find and avoid bugs). But this style choice makes the code wordier without any benefit. In fact, it makes the code harder to read and understand: as commenters already pointed out, there were multiple errors in these few lines of code, and there are still errors in the example (i.e. there is no variable named "spc_val_col1"; you named it "spc_col1_val"). Using a cleaner style helps a lot!

Now, if you care about explicitness, a much cleaner way of creating df_a is:

df_a <- data.frame(col1 = c(3, 5, 6),
                   col2 = c(1, 23, 6),
                   col3 = c(7, 58, 9),
                   col4 = c('bli', 'bla', 'blub'))

Benefits: it's all there, including the column names; there is no need for further manipulation of the data frame, and there are fewer ways to introduce errors.

For the second assignment you use seq and matrix, which can be useful, although I don't see the need for them here (again, explicit is better than implicit). as.data.frame is not needed, though: data.frame works directly on the matrix, and the only difference is the default column names (which are not used anyway).

df_b <- data.frame(matrix(seq(1, 27, 3), ncol = 3, byrow = TRUE))
  
repl_val <- df_b[2, 2] # replace value for dataframe a
spc_col1_val <- 5 # row to index dataframe a: column <col1> should have the value <5>
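To make the remark about names concrete, here is a small sketch comparing the two constructors on the same matrix. The variable name m is only introduced for this illustration.

```r
# Same matrix as in the question, stored once so both calls see identical input
m <- matrix(seq(1, 27, 3), ncol = 3, byrow = TRUE)

# Default column names differ between the two constructors
names_as <- names(as.data.frame(m))  # "V1" "V2" "V3"
names_df <- names(data.frame(m))     # "X1" "X2" "X3"
print(names_as)
print(names_df)

# The values themselves are the same, so positional indexing is unaffected
same_value <- as.data.frame(m)[2, 2] == data.frame(m)[2, 2]
print(same_value)                    # TRUE, both are 13
```

Since the example only ever indexes df_b by position, either constructor gives the same repl_val.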

In the comment you write that the column name "is combined out of different variables e.g. <col> and <3>", but then you use paste0('col', '1'). I hope you can see how this is confusing (again, if we care about clarity, getting these details right would actually help).

spc_col <- paste0('col', '1') # column to index dataframe a: the name is built from parts, here <col> and <1>
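A column name held in a variable cannot be used with $, which is one reason the original df_a$spc_col attempt fails. The following is a minimal sketch of the difference; col_nr is a hypothetical variable added only to show a name being built from a number.

```r
df_a <- data.frame(col1 = c(3, 5, 6),
                   col2 = c(1, 23, 6),
                   col3 = c(7, 58, 9),
                   col4 = c('bli', 'bla', 'blub'))

spc_col <- paste0('col', '1')   # the string "col1"

# `$` does not evaluate its argument: it looks for a column literally
# named "spc_col", finds none, and returns NULL
via_dollar <- df_a$spc_col
print(is.null(via_dollar))      # TRUE

# `[[` and `[` do evaluate the variable and use the string it holds
via_brackets <- df_a[[spc_col]]
print(via_brackets)             # 3 5 6

# The same works when part of the name comes from a number
col_nr <- 3                     # hypothetical, for illustration only
built_name <- paste0('col', col_nr)
print(df_a[[built_name]])       # 7 58 9
```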

df_a before the assignment:

  col1 col2 col3 col4
1    3    1    7  bli
2    5   23   58  bla
3    6    6    9 blub

The next line had multiple errors in it, but most importantly: we can use which to retrieve the indices of the elements that satisfy a condition, and we pass the column name as a string (here, the one stored in spc_col) instead of going through df_a$. I'd suggest looking into ?Extract; it documents the extraction operators, which are very important in R!

df_a[which(df_a$col1 == spc_col1_val), spc_col] <- repl_val
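One practical reason to wrap the condition in which shows up as soon as the column contains missing values. The sketch below uses a small made-up data frame (df, with an NA in col1) that is not part of the original question.

```r
# Hypothetical data with a missing value in the condition column
df <- data.frame(col1 = c(3, NA, 5),
                 col2 = c('a', 'b', 'c'))

cond <- df$col1 == 5
print(cond)              # FALSE, NA, TRUE: comparing NA yields NA

idx <- which(cond)
print(idx)               # 3: which() keeps only the TRUE positions

# Assigning through the raw logical vector fails, because data frames
# do not allow NA in the row index of a subscripted assignment
res <- tryCatch({
  df[cond, 'col1'] <- 13
  'ok'
}, error = function(e) conditionMessage(e))
print(res)

# Assigning through the integer index from which() works
df[idx, 'col1'] <- 13
print(df$col1)           # 3 NA 13
```

Without missing values, plain logical indexing and which behave the same, so the choice only matters when NA can occur.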

df_a after the assignment:

  col1 col2 col3 col4
1    3    1    7  bli
2   13   23   58  bla
3    6    6    9 blub

Please don't take the criticism as a sign of hostility. I think you have the right intentions, and are obviously willing to put in the work. But I'd strongly suggest focusing on stuff that is actually helpful and important ;) Hope this helped you out!

And as a final note: it would be great if you also included an example of the desired output. I assume what I showed is what you wanted, but an example of the output would remove all ambiguity (again, explicit is better ;)
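One way to state the desired output explicitly is to write it down as a data frame and compare it with the result. This is only a sketch: the name expected is introduced here, and it assumes the output shown above is indeed what was wanted.

```r
df_a <- data.frame(col1 = c(3, 5, 6),
                   col2 = c(1, 23, 6),
                   col3 = c(7, 58, 9),
                   col4 = c('bli', 'bla', 'blub'))
df_b <- data.frame(matrix(seq(1, 27, 3), ncol = 3, byrow = TRUE))

repl_val <- df_b[2, 2]            # 13
spc_col1_val <- 5
spc_col <- paste0('col', '1')

df_a[which(df_a$col1 == spc_col1_val), spc_col] <- repl_val

# Desired output, spelled out (assumed to match the asker's intent)
expected <- data.frame(col1 = c(3, 13, 6),
                       col2 = c(1, 23, 6),
                       col3 = c(7, 58, 9),
                       col4 = c('bli', 'bla', 'blub'))

matches <- isTRUE(all.equal(df_a, expected))
print(matches)                    # TRUE
```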
