Home > OS >  Reshape some rows in columns and columns in rows
Reshape some rows in columns and columns in rows

Time:05-08

I'd like to reorganize a data frame and Imake:

# Open my dataset
res_F <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/ds_bandsIVs.csv")
str(res_F)
# 'data.frame': 16 obs. of  7 variables:
#  $ TILE: chr  "T22KBC" "T22KBC" "T22KBC" "T22KBC" ...
#  $ DATE: chr  "2022-04-20" "2022-04-20" "2022-04-20" "2022-04-20" ...
#  $ BAND: chr  "B2" "B3" "B4" "B8" ...
#  $ MEAN: num  1205 1332 1235 3770 1205 ...
#  $ SD  : num  38.2 63.8 104.7 283.7 38.1 ...
#  $ MIN : int  1141 1217 1135 2638 1142 1217 1135 2636 1111 1222 ...
#  $ MAX : int  1494 1800 2218 4928 1496 1803 2218 4924 1456 1786 ...

Here, I,d like to create a new data frame with BANDS in the columns and mean, sd, min, max in the rows, in a new column STATS. For this, I try:

res_F%>%spread(res_F, key = BAND, value = c(MEAN,SD,MIN,MAX))
Error:
! Must extract column with a single valid subscript.
x Subscript `var` has size 4 but must be size 1.
Run `rlang::last_error()` to see where the error occurred.

Doesn't work and if I try with one variable (eg. MEAN) it not works too:

res_F%>%spread(key = BAND, value = MEAN)
#      TILE       DATE        SD  MIN  MAX       B2       B3       B4       B8
# 1  T22KBC 2022-04-20  38.19472 1141 1494 1205.221       NA       NA       NA
# 2  T22KBC 2022-04-20  63.80129 1217 1800       NA 1332.218       NA       NA
# 3  T22KBC 2022-04-20 104.72500 1135 2218       NA       NA 1234.708       NA
# 4  T22KBC 2022-04-20 283.66055 2638 4928       NA       NA       NA 3770.484
# 5  T22KBC 2022-04-25  34.81989 1111 1456 1182.414       NA       NA       NA
# 6  T22KBC 2022-04-25  59.19710 1222 1786       NA 1333.160       NA       NA
# 7  T22KBC 2022-04-25  93.12933 1152 2236       NA       NA 1239.121       NA
# 8  T22KBC 2022-04-25 274.51874 2554 4748       NA       NA       NA 3646.049
# 9  T22KBD 2022-04-20  38.11758 1142 1496 1205.348       NA       NA       NA
# 10 T22KBD 2022-04-20  63.74426 1217 1803       NA 1331.933       NA       NA
# 11 T22KBD 2022-04-20 104.72817 1135 2218       NA       NA 1234.931       NA
# 12 T22KBD 2022-04-20 283.34937 2636 4924       NA       NA       NA 3770.361
# 13 T22KBD 2022-04-25  35.06260 1106 1454 1177.922       NA       NA       NA
# 14 T22KBD 2022-04-25  59.53104 1217 1784       NA 1329.845       NA       NA
# 15 T22KBD 2022-04-25  93.62821 1149 2238       NA       NA 1236.762       NA
# 16 T22KBD 2022-04-25 275.44175 2550 4748       NA       NA       NA 3647.263

I need some help here. My desirable output is:

#      TILE       DATE        B2       B3       B4       B8     STATS
# 1  T22KBC 2022-04-20   1205.221 1332.218 1234.708  3770.484    mean
# 1  T22KBC 2022-04-20   38.19472  63.80129 104.72500 283.66055   sd
# ...

Please, any help with it?

CodePudding user response:

You can do this with pivot_longer(), followed by pivot_wider(), or, equivalently with melt(), followed by dcast()

  1. with tidyverse
library(tidyverse)

res_F %>%
  pivot_longer(cols = MEAN:MAX,names_to = "STATS") %>% 
  pivot_wider(id_cols = c(TILE,DATE,STATS),names_from = BAND,values_from = value)

Output:

   TILE   DATE       STATS     B2     B3     B4    B8
   <chr>  <chr>      <chr>  <dbl>  <dbl>  <dbl> <dbl>
 1 T22KBC 2022-04-20 MEAN  1205.  1332.  1235.  3770.
 2 T22KBC 2022-04-20 SD      38.2   63.8  105.   284.
 3 T22KBC 2022-04-20 MIN   1141   1217   1135   2638 
 4 T22KBC 2022-04-20 MAX   1494   1800   2218   4928 
 5 T22KBD 2022-04-20 MEAN  1205.  1332.  1235.  3770.
 6 T22KBD 2022-04-20 SD      38.1   63.7  105.   283.
 7 T22KBD 2022-04-20 MIN   1142   1217   1135   2636 
 8 T22KBD 2022-04-20 MAX   1496   1803   2218   4924 
 9 T22KBC 2022-04-25 MEAN  1182.  1333.  1239.  3646.
10 T22KBC 2022-04-25 SD      34.8   59.2   93.1  275.
11 T22KBC 2022-04-25 MIN   1111   1222   1152   2554 
12 T22KBC 2022-04-25 MAX   1456   1786   2236   4748 
13 T22KBD 2022-04-25 MEAN  1178.  1330.  1237.  3647.
14 T22KBD 2022-04-25 SD      35.1   59.5   93.6  275.
15 T22KBD 2022-04-25 MIN   1106   1217   1149   2550 
16 T22KBD 2022-04-25 MAX   1454   1784   2238   4748 
  1. with data.table:
library(data.table)

dcast(
  melt(setDT(res_F),measure.vars = c("MEAN", "SD", "MIN","MAX"),variable.name = "STATS"),
  TILE DATE STATS~BAND,
  value.var="value"
)

Output:

      TILE       DATE  STATS         B2         B3         B4        B8
    <char>     <char> <fctr>      <num>      <num>      <num>     <num>
 1: T22KBC 2022-04-20   MEAN 1205.22129 1332.21834 1234.70774 3770.4842
 2: T22KBC 2022-04-20     SD   38.19472   63.80129  104.72500  283.6606
 3: T22KBC 2022-04-20    MIN 1141.00000 1217.00000 1135.00000 2638.0000
 4: T22KBC 2022-04-20    MAX 1494.00000 1800.00000 2218.00000 4928.0000
 5: T22KBC 2022-04-25   MEAN 1182.41399 1333.15975 1239.12132 3646.0489
 6: T22KBC 2022-04-25     SD   34.81989   59.19710   93.12933  274.5187
 7: T22KBC 2022-04-25    MIN 1111.00000 1222.00000 1152.00000 2554.0000
 8: T22KBC 2022-04-25    MAX 1456.00000 1786.00000 2236.00000 4748.0000
 9: T22KBD 2022-04-20   MEAN 1205.34830 1331.93288 1234.93142 3770.3607
10: T22KBD 2022-04-20     SD   38.11758   63.74426  104.72817  283.3494
11: T22KBD 2022-04-20    MIN 1142.00000 1217.00000 1135.00000 2636.0000
12: T22KBD 2022-04-20    MAX 1496.00000 1803.00000 2218.00000 4924.0000
13: T22KBD 2022-04-25   MEAN 1177.92235 1329.84519 1236.76162 3647.2630
14: T22KBD 2022-04-25     SD   35.06260   59.53104   93.62821  275.4417
15: T22KBD 2022-04-25    MIN 1106.00000 1217.00000 1149.00000 2550.0000
16: T22KBD 2022-04-25    MAX 1454.00000 1784.00000 2238.00000 4748.0000
  • Related