Consider the following minimal working example:
df <- tibble::tibble(
  x = c(1, 10)
)
mwe <- function(row) {
  return(paste0(row[["x"]]))
}
df$mwe1 = apply(df, 1, mwe)
df$mwe2 = apply(df, 1, mwe)
The final value of df is
# A tibble: 2 × 3
      x mwe1  mwe2
  <dbl> <chr> <chr>
1     1 1     " 1"
2    10 10    "10"
I expect it to be
# A tibble: 2 × 3
      x mwe1  mwe2
  <dbl> <chr> <chr>
1     1 1     1
2    10 10    10
Can I get an explanation of why column mwe2 has the extra whitespace?
R.version info
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 4
minor 1.2
year 2021
month 11
day 01
svn rev 81115
language R
version.string R version 4.1.2 (2021-11-01)
nickname Bird Hippie
CodePudding user response:
What apply() passes as each row changes between creating mwe1 and mwe2, because the new column becomes part of the row. That is, having the mwe1 column changes the input that paste0() receives.
When you are creating mwe1 the data frame consists of one column (x) and 2 rows. When you are creating mwe2 the data frame has two columns and two rows. So when you paste0() the rows the result is different.
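To make the difference between the two calls visible, here is a minimal sketch that records the class of the row vector handed to the function. It assumes a base data.frame instead of the tibble from the question (so it runs without extra packages), and row_class is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper: report the class of the row vector that apply() hands over.
row_class <- function(row) class(row)

# Only the numeric column x exists: each row arrives as a numeric vector.
df <- data.frame(x = c(1, 10))
class_before <- apply(df, 1, row_class)

# After adding a character column, each row arrives as a character vector.
df$mwe1 <- c("1", "10")
class_after <- apply(df, 1, row_class)

print(class_before)
print(class_after)
```

The first call should report numeric rows and the second character rows, which is why the value read from row[["x"]] is no longer the original number in the second call.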
CodePudding user response:
Some details behind apply():
1. apply() converts a data.frame (or tibble) into a matrix with as.matrix() before applying a function over array margins.
2. If a data frame contains both numeric and character columns, as.matrix() converts all the numeric ones to character with format().
3. format() pads the elements of a numeric vector to a common width when converting it to character. E.g.
format(c(123.45, pi))
[1] "123.450000" "  3.141593"
(Note the trailing zeros of 123.45 and the leading whitespace of pi.)
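The steps above can be checked directly by calling as.matrix() on a numeric-only and on a mixed data frame. This is a small sketch using base data.frame objects; the names df_num, df_mixed and the column label are made up for the illustration and do not come from the question.

```r
# Numeric-only data frame: as.matrix() keeps the values numeric.
df_num <- data.frame(x = c(1, 10))
m_num <- as.matrix(df_num)

# Mixed data frame: the whole matrix becomes character,
# and the numeric column is padded to a common width.
df_mixed <- data.frame(x = c(1, 10), label = c("a", "b"))
m_mixed <- as.matrix(df_mixed)

print(typeof(m_num))
print(typeof(m_mixed))
print(m_mixed[, "x"])
```

The last line should show the padded strings " 1" and "10", the same values that apply() later passes to the function.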
When you create mwe1:
df$mwe1 = apply(df, 1, mwe)
df just has a numeric column x, so only step 1 is involved. The process goes on to mwe2:
df$mwe2 = apply(df, 1, mwe)
Now df has two columns, one numeric (x) and the other character (mwe1). In this case, steps 2 and 3 are involved, and hence the x column is passed to format() and becomes
format(df$x)
[1] " 1" "10"
This is why the leading whitespace appears.
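If the padding is unwanted, one possible way around it is to keep character columns out of the matrix conversion, or to skip apply() and work on the column directly. The sketch below assumes a base data.frame in place of the tibble, and the column mwe3 is added only for illustration.

```r
df <- data.frame(x = c(1, 10))
mwe <- function(row) paste0(row[["x"]])

df$mwe1 <- apply(df, 1, mwe)

# Option 1: hand apply() only the numeric column, so as.matrix() stays numeric.
df$mwe2 <- apply(df["x"], 1, mwe)

# Option 2: avoid apply() and convert the column in one vectorised call.
df$mwe3 <- as.character(df$x)

print(df)
```

Both options should give "1" and "10" without leading whitespace, because format() is never applied to x.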