I have the following data frame:
> head(df)
# A tibble: 6 x 6
# Groups: lat, decade [2]
lat long date year decade month_day
<dbl> <dbl> <date> <chr> <chr> <chr>
1 55 18 1952-02-03 1952 1950-1959 02-03
2 55 18 1958-02-08 1958 1950-1959 02-08
3 55 18 1958-02-08 1958 1950-1959 02-08
4 55 18 1958-02-08 1958 1950-1959 02-08
5 55 18 1965-02-07 1965 1960-1969 02-07
6 55 18 1966-03-03 1966 1960-1969 03-03
> summary(df)
lat long date year decade
Min. :55.00 Min. :18 Min. :1951-03-22 Length:1414 Length:1414
1st Qu.:56.00 1st Qu.:18 1st Qu.:1987-01-01 Class :character Class :character
Median :58.00 Median :18 Median :2004-04-02 Mode :character Mode :character
Mean :59.07 Mean :18 Mean :1999-02-16
3rd Qu.:62.00 3rd Qu.:18 3rd Qu.:2014-01-01
Max. :68.00 Max. :18 Max. :2021-03-28
month_day
Length:1414
Class :character
Mode :character
I would like to get the median month_day by degree of latitude (lat) and per decade.
I have tried this but cannot get past an error:
df = df %>%
group_by(lat, decade) %>%
summarise(across(month_day, median)) %>%
ungroup
Error in `summarise()`:
! Problem while computing `..1 = across(month_day, median)`.
Caused by error:
! `month_day` must return compatible vectors across groups.
i Result type for group 1 (lat = 55, decade = "1950-1959"): <double>.
i Result type for group 2 (lat = 55, decade = "1960-1969"): <character>.
I do not know how to solve it, thank you very much for your help.
EDIT:
> ds_filtered_median[ds_filtered_median$lat == '57', ]
# A tibble: 124 x 6
lat long date year decade month_day
<dbl> <dbl> <date> <chr> <chr> <chr>
1 57 18 1955-04-08 1955 1950-1959 04-08
2 57 18 1957-02-19 1957 1950-1959 02-19
3 57 18 1958-04-06 1958 1950-1959 04-06
4 57 18 1959-01-01 1959 1950-1959 01-01
5 57 18 1960-01-03 1960 1960-1969 01-03
6 57 18 1961-01-02 1961 1960-1969 01-02
7 57 18 1962-01-02 1962 1960-1969 01-02
8 57 18 1963-01-01 1963 1960-1969 01-01
9 57 18 1964-01-19 1964 1960-1969 01-19
10 57 18 1965-01-12 1965 1960-1969 01-12
# ... with 114 more rows
CodePudding user response:
What you can do is convert your date to the number of days since the start of its year. From that number you can easily calculate the median, then convert the days back using any January 1st as a reference. You can be one day off in leap years, though. For date manipulation I used lubridate.
library(dplyr)
library(lubridate)
data %>%
mutate(
date = ymd(date),
days_since_january = as.numeric(date - ymd(paste(year(date), 1, 1, sep = "-")))
) %>%
group_by(lat, decade) %>%
summarise(across(days_since_january, median), .groups = "keep") %>%
mutate(median_month_date = format(ymd("1960-01-01") + days(floor(days_since_january)), "%m-%d"))
# A tibble: 2 x 4
# Groups: lat, decade [2]
lat decade days_since_january median_month_date
<dbl> <chr> <dbl> <chr>
1 55 1950-1959 38 02-08
2 55 1960-1969 49 02-19
# A tibble: 2 x 4
# Groups: lat, decade [2]
lat decade days_since_january median_month_date
<int> <chr> <dbl> <chr>
1 57 1950-1959 72 03-13
2 57 1960-1969 1.5 01-02
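One possible way to sidestep the leap-year caveat is to place every month_day in the same reference year and take the median of the resulting dates directly. The following is only a sketch under stated assumptions: `ref_year` is a hypothetical name, 2000 is an arbitrary leap year chosen so that "02-29" can be parsed, and the sample rows are the first six rows from the question.

```r
library(dplyr)

# First six rows from the question, reduced to the columns that matter here
df <- tibble(
  lat       = c(55, 55, 55, 55, 55, 55),
  decade    = c(rep("1950-1959", 4), rep("1960-1969", 2)),
  month_day = c("02-03", "02-08", "02-08", "02-08", "02-07", "03-03")
)

# Assumption: any leap year works as the shared reference calendar
ref_year <- 2000

result <- df %>%
  # Put every month_day in the same year so the dates are comparable
  mutate(ref_date = as.Date(paste0(ref_year, "-", month_day))) %>%
  group_by(lat, decade) %>%
  # median() works on Date vectors, so no manual day counting is needed
  summarise(median_date = median(ref_date), .groups = "drop") %>%
  # Drop the artificial year again
  mutate(median_month_day = format(median_date, "%m-%d"))

print(result)
```

Because all rows share one calendar, the result does not depend on whether the original year was a leap year. The trade-off is that days after February are counted as if every year had a February 29, which may or may not be what you want.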
CodePudding user response:
You must convert month_day to numeric to get the median. across is only needed if something is calculated for multiple columns individually, e.g. to get the median long and lat using data %>% summarise(across(any_of(c("lat", "long")), median))
library(tidyverse)
data <- tribble(
~lat, ~long, ~date, ~year, ~decade, ~month_day,
55, 18, "1952-02-03", 1952, "1950-1959", "02-03",
55, 18, "1958-02-08", 1958, "1950-1959", "02-08",
55, 18, "1958-02-08", 1958, "1950-1959", "02-08",
55, 18, "1958-02-08", 1958, "1950-1959", "02-08",
55, 18, "1965-02-07", 1965, "1960-1969", "02-07",
55, 18, "1966-03-03", 1966, "1960-1969", "03-03"
)
data %>%
mutate(
# Extract the trailing digits (the day of the month) and convert them to a number
month_day_num = month_day %>% str_extract("[0-9]+$") %>% as.numeric()
) %>%
group_by(lat, decade) %>%
summarise(
median_month_day = median(month_day_num)
)
#> `summarise()` has grouped output by 'lat'. You can override using the `.groups`
#> argument.
#> # A tibble: 2 × 3
#> # Groups: lat [1]
#> lat decade median_month_day
#> <dbl> <chr> <dbl>
#> 1 55 1950-1959 8
#> 2 55 1960-1969 5
Created on 2022-04-05 by the reprex package (v2.0.0)
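As background on the original error, the mismatch between <double> and <character> appears to come from how base R's median() treats character vectors: with an odd number of elements it returns the middle string, while with an even number it tries to average the two middle strings and falls back to a numeric NA. The vectors below are made-up examples, not data from the question.

```r
# Hypothetical month_day values: one group of odd length, one of even length
odd_group  <- c("02-07", "03-03", "01-15")
even_group <- c("02-03", "02-08", "02-08", "02-08")

# Odd length: the middle element after sorting, still a character value
odd_median <- median(odd_group)

# Even length: mean() of two strings is not defined, so the result is a
# numeric NA (a warning is raised, suppressed here for readability)
even_median <- suppressWarnings(median(even_group))

typeof(odd_median)
typeof(even_median)
```

Since summarise() needs every group to return the same type, a mix of odd-sized and even-sized groups is enough to trigger the error, which is why converting to a numeric or Date value first avoids it.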