How can I get a certain value from a row in dataframe? [R]

Time:03-12

I'm building a classification tree with the "rpart" library. When I call "predict", I get a table of probabilities, one column per value/category the test data can take, and I want to extract the value/category with the highest probability for each row. For example, once predict is done, the table I get is:

Table1

And I want to have this table:

Table2

Thanks in advance. I've tried a few things but haven't achieved much, since I'm pretty new to R. Cheers!

CodePudding user response:

One way to achieve your desired output could be:

  1. Put the values you are looking for into a regex string, pattern (they are hardcoded here for the example data).
  2. mutate across the relevant columns and use str_detect to check whether a value in each column matches the pattern. If it does, cur_column() places the column name in a new column. Then .names labels the new columns, unite merges them into one, and select keeps the final result.

library(dplyr)
library(tidyr)
library(stringr)

pattern <- c("0.85|0.5|0.6|0.8")

df %>% 
  mutate(across(starts_with("cat"), ~case_when(str_detect(., pattern) ~ cur_column()), .names = 'new_{col}')) %>%
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
  select(index, pred_category = New_Col)

Output:

  index pred_category
  <dbl> <chr>
1     1 cat2
2     2 cat1
3     3 cat3
4     4 cat3
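The pattern above only works for the specific numbers typed into it. As a more general sketch, base R's max.col() can pick the column with the largest value in each row without any extra packages. The data frame below is hypothetical: the original tables were images, so the numbers are invented to match the output shown, and the column names index, cat1, cat2, cat3 are assumed from the code above.

```r
# Hypothetical stand-in for the asker's probability table (numbers are made up)
df <- data.frame(
  index = 1:4,
  cat1 = c(0.10, 0.50, 0.15, 0.05),
  cat2 = c(0.85, 0.30, 0.25, 0.15),
  cat3 = c(0.05, 0.20, 0.60, 0.80)
)

# Names of the probability columns (assumed to start with "cat")
prob_cols <- grep("^cat", names(df), value = TRUE)

# max.col() returns, for each row, the position of the largest value;
# ties.method = "first" makes ties deterministic instead of random
best <- max.col(as.matrix(df[prob_cols]), ties.method = "first")

# Map the positions back to column names
result <- data.frame(index = df$index, pred_category = prob_cols[best])
print(result)
```

Because it compares the numbers directly instead of matching them as text, this should keep working when the probabilities change.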

CodePudding user response:

You didn't post your data, so I put it in a .csv file and read it from my R folder on my C: drive.

There might be an easier way to do it, but this is the method I use when I have multiple different types (by column or row) that I'd like to sort. If you're new to R and don't have data.table or dplyr installed yet, you'll need to run the install commands further down in the console first.

I left the values in, but the last line removes them if you don't want them.

setwd("C:/R")

library(data.table)
library(dplyr)

Table <- read.csv("Table1.csv", check.names = FALSE, fileEncoding = 'UTF-8-BOM')

# Long form makes the data much easier to sort as it gets more complex.
# melt() on a data.table already returns a data.table.
Table1 <- melt(setDT(Table), id.vars = "index", variable.name = "Category")

# Keep the row with the highest value for each index.
# slice_max() is the current replacement for the superseded top_n().
highest <- Table1 %>% group_by(index) %>% slice_max(value, n = 1)

# Then sort it the way you wanted it to look.
Table2 <- highest[order(highest$index, decreasing = FALSE), ]

View(Table2)

If you don't have the packages installed, run:

install.packages("data.table")

and

install.packages("dplyr")

To get rid of the numbers (this keeps only the first two columns, index and Category):

Table3 <- Table2[,1:2]
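
Since the probability table in the question comes from predict() on an rpart fit, it may be simpler to skip the reshaping and ask predict() for the class directly with type = "class". The sketch below uses the kyphosis data set that ships with rpart as a stand-in for the asker's data, which was not posted. The formula and the names fit, prob, and pred_class are only illustrative.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for the asker's (unposted) data
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# type = "prob" gives a probability table like the one in the question
prob <- predict(fit, newdata = kyphosis, type = "prob")

# type = "class" returns the predicted category for each row directly
pred_class <- predict(fit, newdata = kyphosis, type = "class")

# Assemble a two-column table in the shape the question asks for
head(data.frame(index = seq_along(pred_class), pred_category = pred_class))
```

The result is a factor whose levels match the column names of the probability table, so it can sit next to the probabilities in the same data frame if both are needed.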