Logical operator "or" did not work properly in RStudio?

I'm plotting all countries where "Cocoa Percent" > 70% or "Rating" > 3.5, using ggplot2 in RStudio. However, the plot includes some countries that do not meet the criteria, such as South Korea (70% cocoa, rating 3.25), Netherlands (70% cocoa, rating 3.5), Russia, Suriname, etc. There should be 51 countries, as I checked with Excel's advanced filter, but the ggplot shows 56.

This is my geom_bar() plot

This is the data set

This is my code chunk:

chocolate_df %>% filter(`Cocoa\nPercent` > 70 | Rating > 3.5) %>%
  ggplot(aes(x=`Company\nLocation`)) +
  geom_bar() + theme(axis.text.x = element_text(angle=90))

Answer:

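The issue is that the variable Cocoa\nPercent was read into R as a character variable, including the % symbol. When a character value is compared with a number, R converts the number to the string "70" and compares the two as text, so "70%" counts as greater than 70 while "100%" does not. You need to convert the column to a numeric variable before filtering.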
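To see the effect in isolation, here is a minimal base R sketch. The vector pct is made up for illustration and is not taken from the dataset.

```r
# Hypothetical sample values in the same "NN%" format as the column.
pct <- c("63%", "70%", "75%", "100%")

# Character vs. number: 70 is coerced to "70" and compared as text.
as_text <- pct > 70
print(as_text)    # FALSE  TRUE  TRUE FALSE

# Strip the "%" and convert first, then compare as numbers.
as_number <- as.numeric(sub("%", "", pct, fixed = TRUE)) > 70
print(as_number)  # FALSE FALSE  TRUE  TRUE
```

Note that the text comparison both lets "70%" through and drops "100%", so the wrong filter can add rows as well as lose them.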

Here's the same dataset from a GitHub repository:

library(readr)
library(ggplot2)
library(dplyr)

cacao <- read_csv("https://raw.githubusercontent.com/ry05/Chocolate-Bar-Analysis/master/Dataset/flavors_of_cacao.csv")

glimpse(cacao, width = 100)

Rows: 1,795
Columns: 9
$ `Company \n(Maker-if known)`        <chr> "A. Morin", "A. Morin", "A. Morin", "A. Morin", "A. Mo…
$ `Specific Bean Origin\nor Bar Name` <chr> "Agua Grande", "Kpime", "Atsane", "Akata", "Quilla", "…
$ REF                                 <dbl> 1876, 1676, 1676, 1680, 1704, 1315, 1315, 1315, 1319, …
$ `Review\nDate`                      <dbl> 2016, 2015, 2015, 2015, 2015, 2014, 2014, 2014, 2014, …
$ `Cocoa\nPercent`                    <chr> "63%", "70%", "70%", "70%", "70%", "70%", "70%", "70%"…
$ `Company\nLocation`                 <chr> "France", "France", "France", "France", "France", "Fra…
$ Rating                              <dbl> 3.75, 2.75, 3.00, 3.50, 3.50, 2.75, 3.50, 3.50, 3.75, …
$ `Bean\nType`                        <chr> " ", " ", " ", " ", " ", "Criollo", " ", "Criollo", "C…
$ `Broad Bean\nOrigin`                <chr> "Sao Tome", "Togo", "Togo", "Togo", "Peru", "Venezuela…

Using your filter on the character column, there are 56 distinct company locations:

cacao %>% 
  filter(`Cocoa\nPercent` > 70 | Rating > 3.5) %>% 
  distinct(`Company\nLocation`) %>% 
  nrow()

[1] 56

After conversion to numeric, there are 51 distinct company locations, matching your Excel result:

cacao %>% 
  mutate(`Cocoa\nPercent` = as.numeric(gsub("%", "", `Cocoa\nPercent`))) %>%
  filter(`Cocoa\nPercent` > 70 | Rating > 3.5) %>% 
  distinct(`Company\nLocation`) %>% 
  nrow()

[1] 51
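
As a possible alternative to gsub(), readr's parse_number() drops the % sign and returns a number. The sketch below applies it to a small made-up sample and then rebuilds the plot from the question with the corrected filter. sample_df and its values are hypothetical; only the column names match the real data.

```r
library(tibble)
library(dplyr)
library(readr)
library(ggplot2)

# Hypothetical sample rows, using the same column names as the dataset.
sample_df <- tibble(
  `Cocoa\nPercent`    = c("63%", "70%", "75%", "100%"),
  `Company\nLocation` = c("France", "Netherlands", "Peru", "Italy"),
  Rating              = c(3.75, 3.5, 3.0, 2.0)
)

# Convert "70%" to 70 before filtering, so the comparison is numeric.
kept <- sample_df %>%
  mutate(`Cocoa\nPercent` = parse_number(`Cocoa\nPercent`)) %>%
  filter(`Cocoa\nPercent` > 70 | Rating > 3.5)

# Same bar chart as in the question, built from the filtered rows.
p <- ggplot(kept, aes(x = `Company\nLocation`)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90))
```

In this sample, Netherlands (70%, rating 3.5) is dropped because neither condition is strictly exceeded, which is the behaviour the question expected.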