R - read_csv cause message: Use `spec()` to retrieve the full column specification for this data

Time:02-28

I just finished a course to learn R for Data Analysis and now I am working on my own on a case study.

Since I am a beginner, please help me understand this problem I did not have during the course.

I have imported CSV files and I want to assign them to variables with better names.

I am using the following packages: tidyverse, readr, lubridate, ggplot2, janitor, tidyr, skimr.

This is my code:

library(tidyverse)  # attaches readr, which provides read_csv()

daily_Activity <- read_csv("../input/bellabeat-dataset/dailyActivity_merged.csv")
daily_Calories <- read_csv("../input/bellabeat-dataset/dailyCalories_merged.csv")
daily_Intensities <- read_csv("../input/bellabeat-dataset/dailyIntensities_merged.csv")
daily_Steps <- read_csv("../input/bellabeat-dataset/dailySteps_merged.csv")
hourly_Calories <- read_csv("../input/bellabeat-dataset/hourlyCalories_merged.csv")
sleep_Day <- read_csv("../input/bellabeat-dataset/sleepDay_merged.csv")
weight_Log <- read_csv("../input/bellabeat-dataset/weightLogInfo_merged.csv")

When I run the code, the new tables are created with the new names, but the console also shows me this message:

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

I don't quite understand if this is a problem or if I should just ignore it.

Answer:

Resources:

  1. https://readr.tidyverse.org/articles/readr.html
  2. https://readr.tidyverse.org/reference/spec.html
  3. https://stackoverflow.com/questions/70129365/use-spec-to-retrieve-the-full-column-specification-for-this-data

Column specification

Column specification describes the type of each column and the strategy readr uses to guess types, so you don't need to supply them all.

It would be tedious if you had to specify the type of every column when reading a file. Instead, readr uses some heuristics to guess the type of each column. You can check these guesses yourself with guess_parser().
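As a small illustration of those heuristics, guess_parser() can be called directly on character vectors. This sketch uses the example inputs from the readr introduction article; the commented results are what readr 2.x reports for them.

```r
library(readr)

# guess_parser() returns the name of the parser readr would choose
guess_parser(c("a", "b", "c"))   # "character"
guess_parser(c("1", "5", "9"))   # "double" (readr 2.x no longer guesses integer)
guess_parser("12,352,561")       # "number" (uses the locale's grouping mark)
guess_parser("2010-10-01")       # "date" (ISO 8601 dates are recognised)
guess_parser(c("TRUE", "FALSE")) # "logical"
```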

library(readr)
df <- read_csv(readr_example("mtcars.csv"))

will give:

Rows: 32 Columns: 11
-- Column specification ---------------------
Delimiter: ","
dbl (11): mpg, cyl, disp, hp, drat, wt, q...

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
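The last line of that message names the two ways to make it go away. A minimal sketch of the "quiet" option follows, using the same example file; note that this only hides the report, the types are still guessed, so it is worth checking spec() once before silencing it.

```r
library(readr)

path <- readr_example("mtcars.csv")

# Per call: show_col_types = FALSE suppresses the message for this read only
df_quiet <- read_csv(path, show_col_types = FALSE)

# Per session: the readr.show_col_types option silences every later read_csv()
old <- options(readr.show_col_types = FALSE)
df_also_quiet <- read_csv(path)
options(old)  # restore the previous setting
```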

If we then use spec(df):

spec(df)

We will get:

cols(
  mpg = col_double(),
  cyl = col_double(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_double(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)
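The other option in the message is to specify the column types. One way to do that, sketched below, is to pass the guessed specification back in through col_types; because nothing has to be guessed on the second read, readr does not print the message. The names df_spec, df2 and df3 are just illustrative.

```r
library(readr)

path <- readr_example("mtcars.csv")

# First read: readr guesses the types and prints the message
df <- read_csv(path)

# spec() returns the guessed specification as a col_spec (cols()) object
df_spec <- spec(df)

# Second read: the types are supplied, so no guessing and no message
df2 <- read_csv(path, col_types = df_spec)

# Equivalent shorthand: make every column a double by default
df3 <- read_csv(path, col_types = cols(.default = col_double()))
```

In practice you could copy the cols(...) output of spec() into your script and edit individual entries, which documents the types you expect.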
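  • The message is informational, not an error: when you don't supply a specification, readr guesses the type of every column and reports what it guessed. With many files and columns this guessing takes some time, and the message is repeated for every file you read.
  • When readr can't guess a column's type correctly (for example, messy date input), spec() shows what it guessed; we then have to determine the right type for that specific column and supply it ourselves via col_types.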
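For the messy-date case, here is a minimal sketch with a small hypothetical file (the column name visit_date and its month/day/year format are made up for illustration). readr only recognises ISO-style dates automatically, so this column is guessed as character until its type is declared explicitly.

```r
library(readr)

# Hypothetical CSV with a US-style month/day/year date column
path <- tempfile(fileext = ".csv")
writeLines(c("id,visit_date", "1,02/28/2022", "2,03/01/2022"), path)

# Left to guess, readr reads visit_date as character
guessed <- read_csv(path, show_col_types = FALSE)
spec(guessed)

# Declare the type of that one column; cols() still guesses the others
fixed <- read_csv(
  path,
  col_types = cols(visit_date = col_date(format = "%m/%d/%Y"))
)
```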