I just finished a course to learn R for Data Analysis and now I am working on my own on a case study.
Since I am a beginner, please help me understand this problem I did not have during the course.
I have imported csv files and I want to assign them to variables with better names.
I am using the following packages: tidyverse, readr, lubridate, ggplot2, janitor, tidyr, skimr.
This is my code:
daily_Activity <- read_csv("../input/bellabeat-dataset/dailyActivity_merged.csv")
daily_Calories <- read_csv("../input/bellabeat-dataset/dailyCalories_merged.csv")
daily_Intesities <- read_csv("../input/bellabeat-dataset/dailyIntensities_merged.csv")
daily_Steps <- read_csv("../input/bellabeat-dataset/dailySteps_merged.csv")
hourly_Calories <- read_csv("../input/bellabeat-dataset/hourlyCalories_merged.csv")
sleep_Day <- read_csv("../input/bellabeat-dataset/sleepDay_merged.csv")
weight_Log <- read_csv("../input/bellabeat-dataset/weightLogInfo_merged.csv")
When I run the code the new tables are created with the new name, but the console also shows me this message:
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
I don't quite understand if this is a problem or if I should just ignore it.
CodePudding user response:
Resources:
- https://readr.tidyverse.org/articles/readr.html
- https://readr.tidyverse.org/reference/spec.html
- https://stackoverflow.com/questions/70129365/use-spec-to-retrieve-the-full-column-specification-for-this-data
Column specification
It would be tedious if you had to specify the type of every column when reading a file. Instead, readr uses some heuristics to guess the type of each column. You can access these results yourself using guess_parser().
The column specification describes the type of each column and the strategy readr uses to guess types, so you don't need to supply them all.
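As a small illustration of that guessing step, guess_parser() can be called directly on character vectors. This is only a sketch: the input strings are made up for the example, and guesses for other inputs may differ between readr versions.

```r
library(readr)

# guess_parser() takes a character vector and returns the name of the
# parser readr would choose for it.
guess_parser("2010-10-10")        # an ISO 8601 date
guess_parser(c("TRUE", "FALSE"))  # logical values
guess_parser("apple")             # nothing more specific matches, so character
```

read_csv() applies the same kind of guess to each column of the file.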
df <- read_csv(readr_example("mtcars.csv"))
will give:
Rows: 32 Columns: 11
-- Column specification ---------------------
Delimiter: ","
dbl (11): mpg, cyl, disp, hp, drat, wt, q...
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
If we then use spec(df):
spec(df)
We will get:
cols(
  mpg = col_double(),
  cyl = col_double(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_double(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)
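One way to act on that output, sketched here with the same bundled mtcars.csv example file: the specification returned by spec() can be passed back in through the col_types argument, so the types are stated rather than guessed on the next read. Alternatively, show_col_types = FALSE keeps the guessing and only hides the message. The names path, df_quiet and df_typed are placeholders chosen for this example.

```r
library(readr)

path <- readr_example("mtcars.csv")

# Keep the automatic guessing, but suppress the column specification message.
df_quiet <- read_csv(path, show_col_types = FALSE)

# Reuse the guessed specification as an explicit one on a later read.
# Supplying col_types also quiets the message.
df_typed <- read_csv(path, col_types = spec(df_quiet))

# Both reads yield the same column types.
identical(sapply(df_quiet, class), sapply(df_typed, class))
```

To hide the message for a whole session instead of per call, setting options(readr.show_col_types = FALSE) should have the same effect, assuming readr 2.0 or later.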
- In a situation with many files and columns, readr will guess the data types if there is no specification. This may consume time.
- In a situation where readr can't guess the data type (for example, messy date input), use spec() to see which type was guessed for that column, then specify the correct type yourself through the col_types argument.
The message itself is informational, not an error: the files were read successfully.
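The messy-date case can be sketched as follows. The two-column CSV text and the column names id and day are invented for illustration (they are not taken from the files in the question), and passing literal data with I() assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical inline data: dates written as month/day/year.
csv_text <- I("id,day\n1,4/12/2016\n2,4/13/2016\n")

# Let readr guess first, then inspect what it decided for each column.
guessed <- read_csv(csv_text, show_col_types = FALSE)
spec(guessed)

# State the types explicitly, giving the date column its actual format.
typed <- read_csv(
  csv_text,
  col_types = cols(
    id  = col_double(),
    day = col_date(format = "%m/%d/%Y")
  )
)

class(typed$day)
```

For a real file, the same cols() call would go into the read_csv() line for that file, with the column names and date format adjusted to match what the file actually contains.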