categorize data with column header names

Time:05-04

I have a data frame, and I want to turn its column names into the values of a categorical variable.

  topic1 topic2 topic3
1  apple  volvo toyota
2   pear    bwm   audi
3  apple    bmw    bmw
4 orange  volvo   benz
5 orange   fiat   fiat

I want topic1 to topic3 to become the levels of a categorical variable, as shown below, with all the values collapsed into a single column:

     name  terms
1  topic1  apple
2  topic1   pear
3  topic1  apple
4  topic1 orange
5  topic1 orange
6  topic2  volvo
7  topic2    bwm
8  topic2    bmw
9  topic2  volvo
10 topic2   fiat
11 topic3 toyota
12 topic3   audi
13 topic3    bmw
14 topic3   benz
15 topic3   fiat

CodePudding user response:

And a tidyverse option ...

library(tidyverse)

tribble(
  ~topic1, ~topic2, ~topic3,
  "apple", "volvo", "toyota",
  "pear", "bwm", "audi",
  "apple", "bmw", "bmw",
  "orange", "volvo", "benz",
  "orange", "fiat", "fiat"
) |> 
  pivot_longer(everything()) |> 
  arrange(name)
#> # A tibble: 15 × 2
#>    name   value 
#>    <chr>  <chr> 
#>  1 topic1 apple 
#>  2 topic1 pear  
#>  3 topic1 apple 
#>  4 topic1 orange
#>  5 topic1 orange
#>  6 topic2 volvo 
#>  7 topic2 bwm   
#>  8 topic2 bmw   
#>  9 topic2 volvo 
#> 10 topic2 fiat  
#> 11 topic3 toyota
#> 12 topic3 audi  
#> 13 topic3 bmw   
#> 14 topic3 benz  
#> 15 topic3 fiat

Created on 2022-05-03 by the reprex package (v2.0.1)
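
The output above has a character column called value, whereas the question asks for a column called terms and a categorical name. A minimal sketch of one way to get there, assuming the tidyr and dplyr packages are installed and R 4.1 or later for the native pipe; the object name long is only illustrative:

```r
library(tidyr)
library(dplyr)

# Same data as in the question
df1 <- data.frame(
  topic1 = c("apple", "pear", "apple", "orange", "orange"),
  topic2 = c("volvo", "bwm", "bmw", "volvo", "fiat"),
  topic3 = c("toyota", "audi", "bmw", "benz", "fiat"),
  stringsAsFactors = FALSE
)

long <- df1 |>
  # names_to / values_to set the names of the two output columns
  pivot_longer(everything(), names_to = "name", values_to = "terms") |>
  # pivot_longer() returns name as character; convert it to a factor
  mutate(name = factor(name)) |>
  arrange(name)

# Inspect the factor levels taken from the old column names
levels(long$name)
```
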

CodePudding user response:

We may use stack from base R to convert to a two-column data.frame:

setNames(stack(df1)[2:1], c("name", "terms"))

-output

     name  terms
1  topic1  apple
2  topic1   pear
3  topic1  apple
4  topic1 orange
5  topic1 orange
6  topic2  volvo
7  topic2    bwm
8  topic2    bmw
9  topic2  volvo
10 topic2   fiat
11 topic3 toyota
12 topic3   audi
13 topic3    bmw
14 topic3   benz
15 topic3   fiat
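
A small check of what stack produces may help here: it returns the old column names as a factor, so no extra conversion should be needed. The sketch below also builds the same result by hand with rep and unlist to show what happens underneath; the object names long and manual are only illustrative.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps the
# columns as character on R versions before 4.0 as well
df1 <- data.frame(
  topic1 = c("apple", "pear", "apple", "orange", "orange"),
  topic2 = c("volvo", "bwm", "bmw", "volvo", "fiat"),
  topic3 = c("toyota", "audi", "bmw", "benz", "fiat"),
  stringsAsFactors = FALSE
)

# stack() returns columns values and ind; [2:1] swaps their order
long <- setNames(stack(df1)[2:1], c("name", "terms"))

# The former column names come back as a factor
is.factor(long$name)

# Hand-rolled equivalent: repeat each column name once per row,
# and flatten the columns into one vector
manual <- data.frame(
  name  = factor(rep(names(df1), each = nrow(df1))),
  terms = unlist(df1, use.names = FALSE),
  stringsAsFactors = FALSE
)
```
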

data

df1 <- structure(list(topic1 = c("apple", "pear", "apple", "orange", 
"orange"), topic2 = c("volvo", "bwm", "bmw", "volvo", "fiat"), 
    topic3 = c("toyota", "audi", "bmw", "benz", "fiat")), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))

CodePudding user response:

Melting seems to do the trick:

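library(data.table)

df <- data.frame(
  a = c('a','b','c'),
  b = c('x','y','z'),
  c = c('i','j','k')
)

# melt() from data.table expects a data.table, so convert first.
# id.vars = 'a' keeps column a as an identifier and stacks b and c;
# to stack every column, pass measure.vars = names(df) instead.
melt(as.data.table(df), id.vars = 'a')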
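
Applied to the question's data, melting could look roughly like the sketch below. It assumes the data.table package is installed; the object names dt1 and long are only illustrative. Passing every column as measure.vars leaves no identifier columns, so all three topics are stacked.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt1 <- data.table(
  topic1 = c("apple", "pear", "apple", "orange", "orange"),
  topic2 = c("volvo", "bwm", "bmw", "volvo", "fiat"),
  topic3 = c("toyota", "audi", "bmw", "benz", "fiat")
)

long <- melt(
  dt1,
  measure.vars  = names(dt1),  # stack every column
  variable.name = "name",      # old column names go here
  value.name    = "terms"      # cell values go here
)

# By default melt() returns the variable column as a factor
is.factor(long$name)
```
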