How to add a constant value column to an empty dataframe?

Time:01-26

I have created the following empty dataframe:

df <- data.frame(Doubles=double(),
                 Ints=integer(),
                 Factors=factor(),
                 Logicals=logical(),
                 Characters=character(),
                 stringsAsFactors=FALSE)

Obviously, the output is:

[1] Doubles    Ints       Factors    Logicals   Characters
<0 rows> (or 0-length row.names)

I would like to be able to add a constant-value column to this empty dataframe, something like this:

df['Country'] = 'CHL'

But I get the following error:

Error in `[<-.data.frame`(`*tmp*`, "Country", value = "CHL") : 
  replacement has 1 row, data has 0

I am aware that this error would not appear if the dataframe weren't empty, but I would like to know how to handle both cases: if the dataframe is empty, the column Country should be added without error and stay empty; if the dataframe is not empty, the constant value should be added normally.

CodePudding user response:

The problem is that none of the other columns have any values; one cannot add to or delete from one column without doing the same to the other columns.

One option is to get what I call an NA-row of the data, i.e.,

df[NA,]
#    Doubles Ints Factors Logicals Characters
# NA      NA   NA    <NA>       NA       <NA>

and then add the column to that:

transform(df[NA,], Country = "CHL")
#    Doubles Ints Factors Logicals Characters Country
# NA      NA   NA    <NA>       NA       <NA>     CHL

In this case, since df is previously empty, you can just reassign back to that frame.

If, however, df actually had data and you want to append to it, then either (a) all columns must be the same (name, order, and class) in both frames, or (b) you need to add the missing columns first. For instance,

newdf <- transform(df[NA,], Country = "CHL")
df <- cbind(df, newdf[setdiff(names(newdf), names(df))][min(1, nrow(df)),,drop=FALSE])
df_updated <- rbind(df, newdf)
df_updated
#    Doubles Ints Factors Logicals Characters Country
# NA      NA   NA    <NA>       NA       <NA>     CHL

These steps preserve the 'class' of each column.

CodePudding user response:

You can do this if, instead of relying on R to "recycle" the value the right number of times, you explicitly use rep:

df = data.frame(x = numeric())
df['Country'] = rep("CHL", nrow(df))
df
# [1] x       Country
# <0 rows> (or 0-length row.names)

df = data.frame(x = 1:3)
df['Country'] = rep("CHL", nrow(df))
df
#   x Country
# 1 1     CHL
# 2 2     CHL
# 3 3     CHL
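
The two cases above can be wrapped in a small helper so the same call covers empty and non-empty data frames. This is only a sketch: the function name add_constant_column and its arguments are hypothetical, not part of base R, and it assumes value is a single (length-1) constant.

```r
# Hypothetical helper (not a base R function): adds a column filled with
# one constant, repeated explicitly to match the current number of rows.
add_constant_column <- function(df, name, value) {
  # rep(value, 0) yields a zero-length vector, so an empty frame
  # simply gains an empty column instead of raising an error.
  df[name] <- rep(value, nrow(df))
  df
}

empty_df  <- data.frame(x = numeric())
filled_df <- data.frame(x = 1:3)

empty_out  <- add_constant_column(empty_df, "Country", "CHL")
filled_out <- add_constant_column(filled_df, "Country", "CHL")
```

Here empty_out keeps zero rows but gains the Country column, while filled_out has "CHL" in every row.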

CodePudding user response:

See the documentation for data.frame https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/data.frame

As I understand it, this isn't possible using an actual data frame. The documentation says: "A data frame is a list of variables of the same number of rows with unique row names, given class "data.frame". If no variables are included, the row names determine the number of rows."

Assigning a single value such as Country works on a non-empty data frame because R repeats that value until it matches the number of rows. That doesn't work if the data frame has fewer rows than the value has elements, which is why a length-1 value cannot go into a 0-row data frame. The same error is thrown if you try to add 2 countries to a data frame with one row.

You could set df[1,]=NA first and then assign the Country column, and this will work, but technically that's not empty.
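
A minimal sketch of that recycling behaviour, using made-up example frames (one_row, four_rows) and wrapping the failing assignment in tryCatch so the script keeps running:

```r
# Two values into one row: the replacement is longer than the data,
# so the assignment raises an error, which tryCatch turns into a flag.
one_row <- data.frame(x = 1)
outcome <- tryCatch({
  one_row["Country"] <- c("CHL", "ARG")
  "assigned"
}, error = function(e) "error")

# Two values into four rows: the row count is a whole multiple of the
# value length, so the values are recycled to fill the column.
four_rows <- data.frame(x = 1:4)
four_rows["Country"] <- c("CHL", "ARG")
```

In the first case outcome ends up as "error"; in the second, the Country column alternates between the two values.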
