I have a list that looks like:
df1 <- tibble::tribble(~City, ~State, ~Year, ~Temp,
"", "", "Year", "Overall temperature, by now",
"Aberdeen", "", "2022", "18.73",
"Aberdeen", "", "2021", "17.79",
"Aberdeen", "", "2020", "-",
"Aberdeen", "", "2019", "16.43",
"Aberdeen", "", "2018", "-",
"Aberdeen", "", "2017", "17.84",
"Aberdeen", "", "2016", "17.47",
"Aberdeen", "", "2015", "25.84",
"Aberdeen", "", "2014", "26.8",
"Aberdeen", "", "2013", "22.73",
"Aberdeen", "", "2012", "23.08",
"Aalborg", "P1", "Year", "Temp, measurement",
"Aalborg", "P1", "2022", "-",
"Aalborg", "P1", "2021", "20.05")
I need to convert the last two columns to numeric. In addition, it would be good to turn this list into a data frame (please suggest a method :)
It is a small sample of a big dataset.
Here is what doesn't work (although I used these methods for working with less complicated lists):
df1[, 3] <- as.numeric(df1[, 3]) #for sure, it is a list
# Error: 'list' object cannot be coerced to type 'double'
as.numeric(unlist(df1[[3]]))
# Error: (converted from warning) NAs introduced by coercion
df1[, 3:4] <- sapply(df1, as.numeric)
# Error in lapply(X = X, FUN = FUN, ...) : (converted from warning) NAs introduced by coercion
as.numeric(as.character(unlist(df1[[3]])))
# Error: (converted from warning) NAs introduced by coercion
df1$Year <- lapply(df1$Year, as.numeric)
# Error in lapply(df1$Year, as.numeric) : (converted from warning) NAs introduced by coercion
df1 <- as.data.frame(df1) #Working with a dataframe would be easier :)
typeof(df1)
# [1] "list"
as.numeric(df1[1, 3]) #If that would work - we could use loop to change element by element into numeric
# Error: (converted from warning) NAs introduced by coercion
df1 <- as.data.table(df1)
typeof(df1)
# [1] "list"
I don't mind losing the text values in the numeric columns after the conversion (that data is useless).
Update: we found that the methods for working with lists are not robust: loading some newer libraries seems to break functions we rely on. But I can't find which libraries stop the solutions from working :( Can you please help?
Libraries I often use: "plyr", "dplyr", "data.table","tidyverse","magrittr", "tidyr", "reshape2", "expss", "janitor", "dplyr", "ggplot2", "purrr", "GGally", "cluster", "readxl", "writexl", "psych", "knitr", "ExPanDaR", "kableExtra", "plm", "sampleSelection", "nnet", "ggmap", "scales", "RPostgreSQL","readr","lubridate","seasonal","stargazer","merTools","RColorBrewer","colorRamps", "nycflights13", "scales", "zoo", "stringr", "maps", "mapdata", "gtrendsR", "cdlTools", "usmap", "rnaturalearth", "WDI", "tigris", "ggrepel", "rworldmap", "gapminder"
System encoding: Sys.setlocale(category = 'LC_ALL','en_US.UTF-8')
CodePudding user response:
It is a tibble, so [, still returns a tibble with a single column, because drop = FALSE is the default for a tibble, unlike a data.frame. Instead, use either $ or [[ to extract the column as a vector. For multiple columns, use lapply instead of sapply, as sapply can return a matrix.
df1[3:4] <- lapply(df1[3:4], as.numeric)
-output
> str(df1)
tibble [15 × 4] (S3: tbl_df/tbl/data.frame)
$ City : chr [1:15] "" "Aberdeen" "Aberdeen" "Aberdeen" ...
$ State: chr [1:15] "" "" "" "" ...
$ Year : num [1:15] NA 2022 2021 2020 2019 ...
$ Temp : num [1:15] NA 18.7 17.8 NA 16.4 ...
> df1
# A tibble: 15 × 4
   City       State  Year  Temp
   <chr>      <chr> <dbl> <dbl>
 1 ""         ""       NA  NA
 2 "Aberdeen" ""     2022  18.7
 3 "Aberdeen" ""     2021  17.8
 4 "Aberdeen" ""     2020  NA
 5 "Aberdeen" ""     2019  16.4
 6 "Aberdeen" ""     2018  NA
 7 "Aberdeen" ""     2017  17.8
 8 "Aberdeen" ""     2016  17.5
 9 "Aberdeen" ""     2015  25.8
10 "Aberdeen" ""     2014  26.8
11 "Aberdeen" ""     2013  22.7
12 "Aberdeen" ""     2012  23.1
13 "Aalborg"  "P1"     NA  NA
14 "Aalborg"  "P1"   2022  NA
15 "Aalborg"  "P1"   2021  20.0
For single column
> class(df1[,3])
[1] "tbl_df" "tbl" "data.frame"
> class(df1[[3]])
[1] "numeric"
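The typeof() results in the question can be misleading: a data frame (and a tibble, and a data.table) is stored as a list of columns, so typeof() reports "list" even after a successful conversion. A minimal base R sketch of the difference, using a small made-up data frame rather than the original data:

```r
# Hypothetical two-row stand-in for the question's data
df <- data.frame(Year = c("2022", "2021"), stringsAsFactors = FALSE)

storage  <- typeof(df)         # how the object is stored internally
classes  <- class(df)          # what kind of object it is
is_frame <- is.data.frame(df)  # the check that actually answers the question

# A base data.frame drops a single column to a vector by default;
# drop = FALSE keeps it as a data frame, which is how a tibble behaves with [,
as_vector <- df[, 1]
as_frame  <- df[, 1, drop = FALSE]

print(storage)
print(classes)
print(is_frame)
```

So class() or is.data.frame() is the check to rely on, and as.numeric() needs the vector form of the column, which is what $ and [[ return.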
As the input is a tibble, we can use dplyr methods
library(dplyr)
df1 <- df1 %>%
  mutate(across(3:4, as.numeric))
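The "(converted from warning)" part of the errors in the question suggests the session promotes warnings to errors, for example through options(warn = 2). In that case any as.numeric() call on text such as "Year" or "-" stops instead of returning NA. A minimal base R sketch of one way around it, assuming a small made-up subset of the data and a hypothetical helper to_num() that blanks out non-numeric strings before converting:

```r
# Hypothetical subset of the question's data
df <- data.frame(
  Year = c("Year", "2022", "2021", "2020"),
  Temp = c("Temp, measurement", "18.73", "17.79", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: anything that is not a plain number becomes NA first,
# so as.numeric() has nothing to warn about
to_num <- function(x) {
  is_num <- grepl("^-?[0-9]+(\\.[0-9]+)?$", x)
  x[!is_num] <- NA_character_
  as.numeric(x)
}

old <- options(warn = 2)  # reproduce the strict setting from the question
df[c("Year", "Temp")] <- lapply(df[c("Year", "Temp")], to_num)
options(old)              # restore the previous setting

str(df)
```

Wrapping the original call in suppressWarnings() is a shorter alternative, at the cost of hiding every warning from that call rather than only the expected ones.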
CodePudding user response:
First remove the rows that contain words in Year and Temp, then convert the columns to the appropriate class: integer for Year and double for Temp
library(dplyr)
library(readr)
df1 %>%
  dplyr::filter(!grepl("^[A-Z]", Year),
                !grepl("^[A-Z]", Temp)) %>%
  dplyr::mutate(Year = readr::parse_integer(Year),
                Temp = readr::parse_number(Temp))
# A tibble: 13 × 4
   City     State  Year  Temp
   <chr>    <chr> <int> <dbl>
 1 Aberdeen ""     2022  18.7
 2 Aberdeen ""     2021  17.8
 3 Aberdeen ""     2020  NA
 4 Aberdeen ""     2019  16.4
 5 Aberdeen ""     2018  NA
 6 Aberdeen ""     2017  17.8
 7 Aberdeen ""     2016  17.5
 8 Aberdeen ""     2015  25.8
 9 Aberdeen ""     2014  26.8
10 Aberdeen ""     2013  22.7
11 Aberdeen ""     2012  23.1
12 Aalborg  "P1"   2022  NA
13 Aalborg  "P1"   2021  20.0
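On the update in the question about libraries interfering with each other: this may not pinpoint the cause, but base R can at least show which names are defined more than once on the search path and whether warnings are being turned into errors. A minimal sketch, using filter only as an example of a commonly masked name:

```r
# A value of 2 or more means warnings are converted to errors
warn_level <- getOption("warn")

# Names defined in more than one attached package or environment,
# grouped by the search-path entry that defines them
masked <- conflicts(detail = TRUE)

# Every attached package that defines a given name, in lookup order;
# the first entry is the one a bare call would use
where_filter <- find("filter")

print(warn_level)
print(head(masked, 3))
print(where_filter)
```

Calling functions with an explicit prefix, as in dplyr::filter() above, sidesteps masking regardless of the order in which packages were loaded.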