Error and lack of names in distance matrix, dist()

Time:01-08

I have some problems with algorithmic hierarchical clustering using the Minkowski distance. Here is my code:

install.packages("eurostat")
install.packages("ggplot2")
install.packages("dplyr")
library(eurostat)
library(ggplot2)
library(dplyr)


data <- get_eurostat("prc_hicp_manr")

unique(data$geo)

data <- filter(data, time >= "2000-03-01" & time <= "2022-09-30" & coicop == "CP00")


country_name <- eurostat::eu_countries


data <- merge(data, country_name, by.x = "geo", by.y = "code")

unique(data$name)

data <- filter(data, name != "" & name != "United Kingdom")

unique(data$name)

sum(is.na(data$name)) # We have 0 NAs, so our data contains only EU-27 countries



ggplot(data) +
  aes(x = time, y = values, colour = name) +
  geom_line() +
  scale_color_hue(direction = 1) +
  labs(title = "HICP trajectories for EU countries") +
  theme_bw() +
  theme(plot.title = element_text(size = 20L, hjust = 0.5))

data_by_country <- data %>%
  group_by(name) %>%
  summarize(HICP = mean(values)) %>%
  ungroup() %>%
  as.matrix()

any(is.na(data_by_country))

data_by_country <- na.omit(data_by_country)

distance_matrix <- as.dist(dist(data_by_country, p = 1.5, method = "minkowski"))

Everything is fine up to the data_by_country line. Running it gives me a nice table with two columns: the country names and their mean HICP. There are no NAs in this table. Here are its first 6 rows:

      name     HICP
1  Austria 2.102952
2  Belgium 2.232472
3 Bulgaria 4.036531
4  Croatia 2.410332
5   Cyprus 1.833579
6  Czechia 2.621033

After that, I want to create a distance matrix based on that table. Unfortunately, I get this warning:

Warning message:
In dist(data_by_country, p = 1.5, method = "minkowski") :
  NAs introduced by coercion

But there are literally no NAs in my table. Additionally, when I run:

distance_matrix <- as.dist(dist(data_by_country, p = 1.5, method = "minkowski"))

I get a matrix of values, but its rows and columns have no names, only numbers.
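A minimal sketch of what is probably going on, using only base R and a hypothetical toy data frame (toy, m_chr, m_num and d are illustrative names, with values copied from the table above). The assumption is that as.matrix() on a table with one text column and one numeric column returns a character matrix, so dist() has to coerce the country names to numbers, which would explain the NA warning and the missing labels:

```r
# Hypothetical toy data mirroring the first rows of the table above
toy <- data.frame(
  name = c("Austria", "Belgium", "Bulgaria"),
  HICP = c(2.102952, 2.232472, 4.036531),
  stringsAsFactors = FALSE
)

# Mixed character/numeric columns collapse to a character matrix
m_chr <- as.matrix(toy)
is.character(m_chr)  # TRUE

# Keep only the numeric column and carry the country names as row names
m_num <- matrix(toy$HICP, ncol = 1, dimnames = list(toy$name, "HICP"))

# dist() takes its labels from the row names of the input
d <- dist(m_num, method = "minkowski", p = 1.5)
labels(d)
as.matrix(d)
```

With a single numeric variable, the Minkowski distance reduces to the absolute difference for any p, so the choice of p = 1.5 does not change the result here.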

Do you have any idea what's wrong with this code? I am new to R, so the fix may be easy, but I have no idea what to do now.

IMPORTANT: I can use only the dplyr, ggplot2 and eurostat packages in this exercise.

That's the plot I want to get: [image: hclust]. Is this the right way to do it?
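For reference, a rough sketch of how a labelled distance matrix could be passed to hclust() from base R's stats package, which needs no extra packages. The toy values are copied from the table above, the names toy, m and hc are hypothetical, and the "average" linkage is an assumption, since the exercise does not say which linkage is expected:

```r
# Hypothetical toy data: five of the countries from the table above
toy <- data.frame(
  name = c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus"),
  HICP = c(2.102952, 2.232472, 4.036531, 2.410332, 1.833579),
  stringsAsFactors = FALSE
)

# Numeric matrix with country names as row names
m <- matrix(toy$HICP, ncol = 1, dimnames = list(toy$name, "HICP"))

# Minkowski distances; dist() already returns a "dist" object,
# so wrapping it in as.dist() is not needed
d <- dist(m, method = "minkowski", p = 1.5)

# Hierarchical clustering; the linkage method is an assumption
hc <- hclust(d, method = "average")

# Dendrogram with country names on the leaves
plot(hc, main = "HICP clustering (toy data)", xlab = "", sub = "")
```

The leaf labels in the dendrogram come from the row names of the matrix given to dist(), so they only appear if those row names are set.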
