R: stable method for transforming selected list columns to numeric

Time:01-17

I have a list that looks like:

df1 <- tibble::tribble(~City,   ~State, ~Year,  ~Temp,
"", "", "Year", "Overall temperature, by now",
"Aberdeen", "", "2022", "18.73",
"Aberdeen", "", "2021", "17.79",
"Aberdeen", "", "2020", "-",
"Aberdeen", "", "2019", "16.43",
"Aberdeen", "", "2018", "-",
"Aberdeen", "", "2017", "17.84",
"Aberdeen", "", "2016", "17.47",
"Aberdeen", "", "2015", "25.84",
"Aberdeen", "", "2014", "26.8",
"Aberdeen", "", "2013", "22.73",
"Aberdeen", "", "2012", "23.08",
"Aalborg",  "P1",   "Year", "Temp, measurement",
"Aalborg",  "P1",   "2022", "-",
"Aalborg",  "P1",   "2021", "20.05")

I need to convert the last two columns to numeric. It would also be good to turn this list into a data frame (please suggest a method :)

It is a small sample of a big dataset.

Here is what doesn't work (although these methods worked for me on less complicated lists):

df1[, 3] <- as.numeric(df1[, 3]) #for sure, it is a list
# Error: 'list' object cannot be coerced to type 'double'
as.numeric(unlist(df1[[3]])) 
# Error: (converted from warning) NAs introduced by coercion
df1[, 3:4] <- sapply(df1, as.numeric)
# Error in lapply(X = X, FUN = FUN, ...) :  (converted from warning) NAs introduced by coercion
as.numeric(as.character(unlist(df1[[3]]))) 
# Error: (converted from warning) NAs introduced by coercion
df1$Year <- lapply(df1$Year, as.numeric)
# Error in lapply(df1$Year, as.numeric) :  (converted from warning) NAs introduced by coercion
df1 <- as.data.frame(df1) #Working with a dataframe would be easier :)
typeof(df1)
# [1] "list"

as.numeric(df1[1, 3]) #If that would work - we could use loop to change element by element into numeric 
# Error: (converted from warning) NAs introduced by coercion
df1 <- as.data.table(df1)
typeof(df1)
# [1] "list"

I don't mind losing the text values in the numeric columns after the conversion (they are useless).

Update: we found that the methods for working with lists are not robust: loading some newer libraries seems to break functions that used to work. But I can't find which libraries cause the solutions to fail :( Can you please help?

Libraries I often use: "plyr", "dplyr", "data.table","tidyverse","magrittr", "tidyr", "reshape2", "expss", "janitor", "dplyr", "ggplot2", "purrr", "GGally", "cluster", "readxl", "writexl", "psych", "knitr", "ExPanDaR", "kableExtra", "plm", "sampleSelection", "nnet", "ggmap", "scales", "RPostgreSQL","readr","lubridate","seasonal","stargazer","merTools","RColorBrewer","colorRamps", "nycflights13", "scales", "zoo", "stringr", "maps", "mapdata", "gtrendsR", "cdlTools", "usmap", "rnaturalearth", "WDI", "tigris", "ggrepel", "rworldmap", "gapminder"

System encoding: Sys.setlocale(category = 'LC_ALL','en_US.UTF-8')

CodePudding user response:

df1 is a tibble, so [, 3] still returns a one-column tibble: unlike a data.frame, a tibble uses drop = FALSE by default. Use $ or [[ instead to extract a column as a vector. For multiple columns, use lapply rather than sapply, because sapply can simplify its result to a matrix.

df1[3:4] <- lapply(df1[3:4], as.numeric)

Output:

> str(df1)
tibble [15 × 4] (S3: tbl_df/tbl/data.frame)
 $ City : chr [1:15] "" "Aberdeen" "Aberdeen" "Aberdeen" ...
 $ State: chr [1:15] "" "" "" "" ...
 $ Year : num [1:15] NA 2022 2021 2020 2019 ...
 $ Temp : num [1:15] NA 18.7 17.8 NA 16.4 ...
> df1
# A tibble: 15 × 4
   City       State  Year  Temp
   <chr>      <chr> <dbl> <dbl>
 1 ""         ""       NA  NA  
 2 "Aberdeen" ""     2022  18.7
 3 "Aberdeen" ""     2021  17.8
 4 "Aberdeen" ""     2020  NA  
 5 "Aberdeen" ""     2019  16.4
 6 "Aberdeen" ""     2018  NA  
 7 "Aberdeen" ""     2017  17.8
 8 "Aberdeen" ""     2016  17.5
 9 "Aberdeen" ""     2015  25.8
10 "Aberdeen" ""     2014  26.8
11 "Aberdeen" ""     2013  22.7
12 "Aberdeen" ""     2012  23.1
13 "Aalborg"  "P1"     NA  NA  
14 "Aalborg"  "P1"   2022  NA  
15 "Aalborg"  "P1"   2021  20.0
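
To see why lapply is the safer choice here, a minimal base-R sketch follows. The two-row data frame df is a made-up stand-in for the question's data, not the real dataset.

```r
# Hypothetical two-row stand-in with character columns
df <- data.frame(
  Year = c("2022", "2021"),
  Temp = c("18.73", "17.79"),
  stringsAsFactors = FALSE
)

m <- sapply(df, as.numeric)  # simplified to a numeric matrix
l <- lapply(df, as.numeric)  # stays a list with one element per column

is.matrix(m)
is.list(l)

# Assigning the list back with df[] keeps the data frame structure
df[] <- l
str(df)
```

Assigning a list back column by column is what keeps each column a plain numeric vector.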

For a single column:

> class(df1[,3])
[1] "tbl_df"     "tbl"        "data.frame"
> class(df1[[3]])
[1] "numeric"
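
This also explains the typeof() results in the question: every data frame (tibbles and data.tables included) is stored as a list, so typeof() reports "list" even after as.data.frame(). A small sketch with a made-up one-column data frame; class(), is.data.frame() or inherits() are the checks to use instead:

```r
# Hypothetical one-column data frame for illustration
df <- data.frame(Year = c("2022", "2021"), stringsAsFactors = FALSE)

typeof(df)                  # storage type: a data frame is a list of columns
class(df)                   # the class is what identifies a data frame
is.data.frame(df)           # direct check
inherits(df, "data.frame")  # also holds for tibbles and data.tables
```

So the object in the question was already a data frame; no extra conversion is needed for that part.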

As the input is a tibble, we can also use dplyr methods:

library(dplyr)
df1 <- df1 %>%
    mutate(across(3:4, as.numeric))
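
A note on the "(converted from warning)" part of the errors in the question: R adds that text when the warn option is set to 2 or higher, which turns every warning, including the harmless "NAs introduced by coercion", into an error. I can't tell from the question which script or package sets it, so treat the following as a sketch only: it checks the option and wraps the conversion in suppressWarnings(). The helper name to_num and the small data frame are made up for illustration.

```r
# Hypothetical stand-in for a few rows of the question's data
df <- data.frame(
  City = c("", "Aberdeen", "Aberdeen"),
  Year = c("Year", "2022", "2020"),
  Temp = c("Overall temperature, by now", "18.73", "-"),
  stringsAsFactors = FALSE
)

# A value of 2 or higher means warnings are raised as errors
getOption("warn")

# Hypothetical helper: convert to numeric, silencing the coercion warning
to_num <- function(x) suppressWarnings(as.numeric(x))

# Non-numeric text such as "Year" or "-" becomes NA
df[c("Year", "Temp")] <- lapply(df[c("Year", "Temp")], to_num)
str(df)
```

If changing the option is acceptable, options(warn = 0) restores the default behavior for the session.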

CodePudding user response:

First remove the rows that contain words in Year and Temp, then convert the columns to the appropriate class: integer for Year and double for Temp.

library(dplyr)
library(readr)

df1 %>% 
  dplyr::filter(!grepl("^[A-Z]", Year), 
         !grepl("^[A-Z]", Temp)) %>% 
  dplyr::mutate(Year = readr::parse_integer(Year),
         Temp = readr::parse_number(Temp))

# A tibble: 13 × 4
   City     State  Year  Temp
   <chr>    <chr> <int> <dbl>
 1 Aberdeen ""     2022  18.7
 2 Aberdeen ""     2021  17.8
 3 Aberdeen ""     2020  NA  
 4 Aberdeen ""     2019  16.4
 5 Aberdeen ""     2018  NA  
 6 Aberdeen ""     2017  17.8
 7 Aberdeen ""     2016  17.5
 8 Aberdeen ""     2015  25.8
 9 Aberdeen ""     2014  26.8
10 Aberdeen ""     2013  22.7
11 Aberdeen ""     2012  23.1
12 Aalborg  "P1"   2022  NA  
13 Aalborg  "P1"   2021  20.0
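
The same cleanup can be sketched in base R, in case readr is not available. This version makes two assumptions that are not in the answer above: header rows are detected by Year not being a four-digit number, and a lone "-" in Temp means a missing value. The small data frame is made up to mirror the question's layout.

```r
# Hypothetical stand-in mirroring the question's layout
df <- data.frame(
  City = c("", "Aberdeen", "Aberdeen", "Aalborg", "Aalborg"),
  Year = c("Year", "2022", "2020", "Year", "2021"),
  Temp = c("Overall temperature, by now", "18.73", "-",
           "Temp, measurement", "20.05"),
  stringsAsFactors = FALSE
)

# Assumption: real data rows have a four-digit year
keep <- grepl("^[0-9]{4}$", df$Year)
clean <- df[keep, ]

clean$Year <- as.integer(clean$Year)

# Assumption: "-" is a placeholder for a missing measurement
clean$Temp[clean$Temp == "-"] <- NA
clean$Temp <- as.numeric(clean$Temp)

str(clean)
```

Because the placeholder is replaced before as.numeric() runs, this conversion raises no coercion warning.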