Argument is not numeric

Time:05-30

I would like to visualize COVID-19 cases in Japan, but I cannot compute the mortality rate per 100,000 population for each prefecture because the number of deaths is not stored as a numeric type.

What I want to achieve

I want to compute "covid19j_20200613$deaths / covid19j_20200613$POP2019 * 100" after converting "covid19j_20200613$deaths" to the numeric (num) type.

Error message.

 Error in covid19j_20200613$deaths/covid19j_20200613$POP2019: 
   Argument of binary operator is not numeric 

Source code in question.

library(spdep)
library(sf)
library(spatstat)
library(tidyverse)
library(ggplot2)

needs::prioritize(magrittr)

covid19j <- read.csv("https://raw.githubusercontent.com/kaz-ogiwara/covid19/master/data/prefectures.csv",
                     header=TRUE)

# Below is an example for June 13, 2020.
# Month and date may be changed

covid19j_20200613 <- dplyr::filter(covid19j,
                                   year==2020,
                                   month==6,
                                   date==13)
covid19j_20200613$CODE <- 1:47

covid19j_20200613[is.na(covid19j_20200613)] <- 0

pop19 <- read.csv("/Users/carlobroschi_imac/Documents/lectures/EGDS/07/covid19_data/covid19_data/pop2019.csv", header=TRUE)

covid19j_20200613 <- dplyr::inner_join(covid19j_20200613, pop19, 
                                       by = c("CODE" = "CODE"))

# Load Japan prefecture administrative boundary data
jpn_pref <- sf::st_read("/Users/carlobroschi_imac/Documents/lectures/EGDS/07/covid19_data/covid19_data/jpn_pref.shp")
# Data and concatenation
jpn_pref_cov19 <- dplyr::inner_join(jpn_pref, covid19j_20200613, by=c("PREF_CODE"="CODE"))

ggplot2::ggplot(data = jpn_pref_cov19) +
  geom_sf(aes(fill=testedPositive)) +
  scale_fill_distiller(palette="RdYlGn") +
  theme_bw() +
  labs(title = "Tested Positive for COVID-19 (2020/06/13)")


# Mortality rate per 100,000 population
# Population number in units of 1000
as.numeric(covid19j_20200613$deaths)
covid19j_20200613$deaths_rate <- covid19j_20200613$deaths / covid19j_20200613$POP2019 * 100

Data in question.

prefectures.csv
https://docs.google.com/spreadsheets/d/11C2vVo-jdRJoFEP4vAGxgy_AEq7pUrlre-i-zQVYDd4/edit?usp=sharing
pop2019.csv
https://docs.google.com/spreadsheets/d/1CbEX7BADutUPUQijM0wuKUZFq2UUt-jlWVQ1ipzs348/edit?usp=sharing

What I tried

I put "as.numeric(covid19j_20200613$deaths)" before the calculation to convert the number of deaths to numeric, but I got the same error message during the calculation.

Additional information (FW/tool versions, etc.)

iMac M1 2021, R 4.2.0


CodePudding user response:

as.numeric() does not modify the column in place. It returns a new, converted vector and leaves the original untouched.

So when you run as.numeric(covid19j_20200613$deaths), R prints the deaths column converted to numeric, but the column in the data frame stays a character.

So if you want to coerce the data type, you need to also reassign:

covid19j_20200613$deaths <- as.numeric(covid19j_20200613$deaths)
covid19j_20200613$POP2019 <- as.numeric(covid19j_20200613$POP2019)

# Now you can do calculations
covid19j_20200613$deaths_rate <- covid19j_20200613$deaths / covid19j_20200613$POP2019 * 100
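
To see why the reassignment matters, here is a minimal sketch using a made-up data frame (`df` and its values are hypothetical and only stand in for covid19j_20200613):

```r
# Hypothetical data frame: deaths are stored as text, as in the question
df <- data.frame(deaths = c("91", "1", "0"), stringsAsFactors = FALSE)

# Without assignment, the converted vector is printed and then discarded
as.numeric(df$deaths)
class(df$deaths)  # "character": the column itself is unchanged

# Assigning the result back replaces the column with the numeric version
df$deaths <- as.numeric(df$deaths)
class(df$deaths)  # "numeric"
```

The same applies to most R functions: they return a modified copy rather than changing their argument.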

It's easier to read if you use mutate from dplyr:

covid19j_20200613 <- covid19j_20200613 |>
  mutate(
    deaths = as.numeric(deaths),
    POP2019 = as.numeric(POP2019),
    deaths_rate = deaths / POP2019 * 100
  )

Result

  deaths POP2019 deaths_rate
1     91    5250  1.73333333
2      1    1246  0.08025682
3      0    1227  0.00000000
4      1    2306  0.04336513
5      0     966  0.00000000
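
The factor of 100 comes from the units. Assuming, as the comment in the question says, that POP2019 is in thousands, deaths per 100,000 population is deaths / (POP2019 * 1000) * 100000, which simplifies to deaths / POP2019 * 100. A quick check against the first row of the result (the variable names here are only for illustration):

```r
# Values from the first row of the result above
deaths <- 91
pop_thousands <- 5250  # assumed to be population in units of 1000

# Spell out the unit conversion: persons, then rate per 100,000
rate_per_100k <- deaths / (pop_thousands * 1000) * 100000

# The shortcut used in the answer gives the same number
shortcut <- deaths / pop_thousands * 100
all.equal(rate_per_100k, shortcut)  # TRUE
round(rate_per_100k, 4)             # 1.7333
```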

PS: your question is really difficult to follow! There is a lot of stuff that we don't actually need to answer it, so that makes it harder for us to identify where the issue is. For example, all the data import, the join, the ggplot...

When writing a question, please only include the minimal elements that lead to a problem. In your case, we only needed a sample dataset with the deaths and POP2019 columns, and the two lines of code that you tried to fix at the end.

CodePudding user response:

If you look at str(covid19j) you'll see that the deaths column is a character column containing a lot of blanks. You need to figure out the structure of that column to read it properly.
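
One way to inspect such a column is to convert it and then list the entries that failed to parse. This is a rough sketch on a made-up data frame; the values below are hypothetical and are not taken from prefectures.csv:

```r
# Hypothetical stand-in: numbers stored as text, plus a blank and a non-numeric entry
df <- data.frame(
  pref   = c("A", "B", "C", "D"),
  deaths = c("91", "", "1", "n/a"),
  stringsAsFactors = FALSE
)

str(df$deaths)  # shows a chr vector, not num

# Entries that cannot be parsed as numbers become NA
converted <- suppressWarnings(as.numeric(df$deaths))

# List the distinct raw values that did not convert
bad_values <- unique(df$deaths[is.na(converted)])
bad_values

# After deciding how to treat them, store the numeric version back
df$deaths <- converted
```

Once you know which values are causing the trouble, you can decide whether they should become NA, zero, or be cleaned up before the conversion.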

Tags: r