Unwanted leading whitespace before number to char conversion in R

Time:08-09

Consider the following minimal working example:

df <- tibble::tibble(
    x=c(1, 10)
)

mwe <- function(row) {
    return(paste0(row[["x"]]))
}

df$mwe1 = apply(df, 1, mwe)
df$mwe2 = apply(df, 1, mwe)

The final value of df is

# A tibble: 2 × 3
      x mwe1  mwe2 
  <dbl> <chr> <chr>
1     1 1     " 1" 
2    10 10    "10" 

I expect it to be

# A tibble: 2 × 3
      x mwe1  mwe2 
  <dbl> <chr> <chr>
1     1 1     1 
2    10 10    10 

Can I get an explanation of why column mwe2 has the extra whitespace?

R.version info

platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          4                           
minor          1.2                         
year           2021                        
month          11                          
day            01                          
svn rev        81115                       
language       R                           
version.string R version 4.1.2 (2021-11-01)
nickname       Bird Hippie

CodePudding user response:

The input your function sees changes between the creation of mwe1 and mwe2, because adding the mwe1 column changes what each row contains. That is, the presence of the mwe1 column changes the value that paste0() receives.

When you create mwe1, the data frame consists of one column (x) and two rows. When you create mwe2, the data frame has two columns and two rows, so the rows passed to paste0() are different and so is the result.

CodePudding user response:

Some details behind apply():

  1. apply() converts a data.frame (or tibble) into a matrix with as.matrix() before applying the function over the array margins.
  2. If the data frame contains both numeric and character columns, as.matrix() converts all numeric columns to character with format().
  3. format() pads the elements of a numeric vector to a common width when converting them to character. E.g.
format(c(123.45, pi))

[1] "123.450000"  "  3.141593"

(Note the trailing zeros added to 123.45 and the leading spaces added to pi.)
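The three steps above can be checked directly by calling as.matrix() on two small data frames. This is only a sketch: the names num_only and mixed are made up for illustration, and base data.frame is used instead of a tibble so that no packages are needed.

```r
# Hypothetical example data, not taken from the question.
num_only <- data.frame(x = c(1, 10))
mixed    <- data.frame(x = c(1, 10), y = c("a", "b"))

m_num   <- as.matrix(num_only)  # all columns numeric: matrix stays numeric
m_mixed <- as.matrix(mixed)     # mixed types: matrix becomes character

typeof(m_num)     # "double"
typeof(m_mixed)   # "character"

# The numeric column went through format(), so it is padded to equal width.
m_mixed[, "x"]    # " 1" "10"
```

Because apply() works on this matrix rather than on the original data frame, the function passed to apply() already receives the padded strings.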


When you create mwe1:

df$mwe1 = apply(df, 1, mwe)

df has only the numeric column x, so only step 1 is involved. The same call is then used to create mwe2:

df$mwe2 = apply(df, 1, mwe)

Now df has two columns: one is numeric (x) and the other is character (mwe1). In this case, steps 2 and 3 are also involved, so the x column is passed to format() and becomes

format(df$x)

[1] " 1" "10"

This is why the leading whitespace appears.
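Given that explanation, a few possible ways to avoid the padding are sketched below. These are suggestions rather than the only fixes; the column names mwe3 and mwe4 are introduced here for illustration, and a base data.frame stands in for the tibble from the question.

```r
# Re-creation of the question's setup with a base data.frame (assumption:
# the tibble behaves the same way here, since apply() calls as.matrix()).
df <- data.frame(x = c(1, 10))
mwe <- function(row) paste0(row[["x"]])

df$mwe1 <- apply(df, 1, mwe)          # numeric-only input: no padding

# Option 1: pass only the numeric column, so as.matrix() stays numeric.
df$mwe2 <- apply(df["x"], 1, mwe)

# Option 2: skip apply() and convert the column directly.
df$mwe3 <- as.character(df$x)

# Option 3: keep the original call and strip the padding afterwards.
df$mwe4 <- trimws(apply(df, 1, mwe))
```

Option 2 is usually the simplest when the function only needs one column; the other two keep the row-wise apply() structure of the original code.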
