My original dataframe is in this format:
I want to take date in columns and variable Sun
and Soil_moisture
in rows and name it metric
. Their values should be placed horizontally as shown below.
Is it doable in r? Since r doesn't allow numbers to be in columns header I have prefixed with d
My sample dataset:
data <- data.frame(country = c("US","US","US","IND","IND","IND"),
state = c("TX","TX","TX","AP","AP","AP"),
date = c("20220601","20220602","20220603","20220601","20220602","20220603"),
Sun = c(30,35,40,20,30,40),
Soil_moisture = c(2.5,3.4,4.5,5.6,6.7,7.8))
I tried the dcast
function but variable Sun
is not coming into rows.
library(data.table)
dcast(data,country state~date,value.var = "Sun")
Please suggest
CodePudding user response:
You can use a combination of melt
and dcast
:
library(data.table)
dcast(
melt(
setDT(data),
id.vars = c("country", "state", "date"),
variable.name = "metric"
),
country state metric ~ paste0("d_", date)
)
Output
country state metric d_20220601 d_20220602 d_20220603
<char> <char> <fctr> <num> <num> <num>
1: IND AP Sun 20.0 30.0 40.0
2: IND AP Soil_moisture 5.6 6.7 7.8
3: US TX Sun 30.0 35.0 40.0
4: US TX Soil_moisture 2.5 3.4 4.5
If you left out paste0
, then it would return just the date as the column name. But as @jdobres, this is generally not advisable.
CodePudding user response:
Fairly straightforward to do with tidyverse and its pivot_wider
and pivot_longer
functions:
library(tidyverse)
new_data <- data %>%
pivot_longer(c(Sun, Soil_moisture), names_to = 'metric', values_to = 'value') %>%
pivot_wider(names_prefix = 'd_', names_from = 'date', values_from = 'value')
country state metric d_20220601 d_20220602 d_20220603
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 US TX Sun 30 35 40
2 US TX Soil_moisture 2.5 3.4 4.5
3 IND AP Sun 20 30 40
4 IND AP Soil_moisture 5.6 6.7 7.8
You could actually force pivot_wider
to use the dates as column names by omitting the names_prefix
argument, but it is generally a bad idea to have numeric column names.