How to add a column to a data frame with unequal length in R

I have a data.frame and I want to add an extra column based on a pattern in another column, but the numeric vector I want to add is shorter than the data frame.

class(mylist)
[1] "numeric"

mylist 
 [1]  90 100  97 100  93 100  90 100 100 100 100 100 100 100  96 100 100 100 100 100

This is my data.frame; I show only part of it:

df[16:26,]
# A tibble: 11 × 9
   parent  node branch.length label     isTip      x     y  branch angle
    <int> <int>         <dbl> <chr>     <lgl>  <dbl> <dbl>   <dbl> <dbl>
 1     30    16     0.0000117 sample-59 TRUE  0.0213 15    0.0213  257. 
 2     39    17     0.0000179 sample-62 TRUE  0.0213  4    0.0213   68.6
 3     32    18     0.0000212 sample-63 TRUE  0.0213  3    0.0213   51.4
 4     40    19     0.0000147 sample-68 TRUE  0.0213  5    0.0213   85.7
 5     40    20     0.0000147 sample-69 TRUE  0.0213  6    0.0213  103. 
 6     28    21     0.00630   sample-5  TRUE  0.0213 11    0.0181  189. 
 7     22    22     0         NA        FALSE 0       8.20 0       140. 
 8     22    23     0.0143    NA        FALSE 0.0143 13.9  0.00715 239. 
 9     22    24     0.0129    NA        FALSE 0.0129  2.47 0.00645  42.3
10     24    25     0.000115  NA        FALSE 0.0130  3.94 0.0130   67.5
11     25    26     0.00241   NA        FALSE 0.0154  5.88 0.0142  101.

So I want to add mylist as a new column at the end of the data frame, but only for the rows with FALSE in the isTip column.

I usually do it like this:

Filter:

dfisTip <- filter(df, isTip == FALSE)

Add the vector as a column (btp):

dfisTip$btp <- mylist

And join the data frames:

df <- left_join(df, dfisTip)
Joining, by = c("parent", "node", "branch.length", "label", "isTip", "x", "y", "branch", "angle")



df[16:26, ]
# A tibble: 11 × 10
   parent  node branch.length label     isTip      x     y  branch angle   btp
    <int> <int>         <dbl> <chr>     <lgl>  <dbl> <dbl>   <dbl> <dbl> <dbl>
 1     30    16     0.0000117 sample-59 TRUE  0.0213 15    0.0213  257.     NA
 2     39    17     0.0000179 sample-62 TRUE  0.0213  4    0.0213   68.6    NA
 3     32    18     0.0000212 sample-63 TRUE  0.0213  3    0.0213   51.4    NA
 4     40    19     0.0000147 sample-68 TRUE  0.0213  5    0.0213   85.7    NA
 5     40    20     0.0000147 sample-69 TRUE  0.0213  6    0.0213  103.     NA
 6     28    21     0.00630   sample-5  TRUE  0.0213 11    0.0181  189.     NA
 7     22    22     0         NA        FALSE 0       8.20 0       140.     90
 8     22    23     0.0143    NA        FALSE 0.0143 13.9  0.00715 239.    100
 9     22    24     0.0129    NA        FALSE 0.0129  2.47 0.00645  42.3    97
10     24    25     0.000115  NA        FALSE 0.0130  3.94 0.0130   67.5   100
11     25    26     0.00241   NA        FALSE 0.0154  5.88 0.0142  101.     93

All the rows with TRUE in the isTip column have NA in the btp column, and those with FALSE have the number.

I was just wondering whether there is a simpler way to do it.

Thanks!

CodePudding user response：

We generally index data frames with data[rows, columns]. If you want to assign mylist to the "btp" column for the rows where isTip == FALSE (which we'll write as !df$isTip), then you can do it like this:

df[!df$isTip, "btp"] <- mylist

mylist must have exactly as many elements as there are rows where isTip is FALSE; the remaining rows get NA in btp.
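
To make the length requirement concrete, here is a minimal sketch on toy data. The six-row data frame and the three values are made up for illustration and only mimic the node and isTip columns from the question.

```r
# Toy data standing in for the tree table in the question (hypothetical values).
df <- data.frame(
  node  = 1:6,
  isTip = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)
mylist <- c(90, 100, 97)  # one value per row where isTip is FALSE

# Fail early if the vector does not match the number of target rows.
stopifnot(length(mylist) == sum(!df$isTip))

# Assign by row selector and column name; rows not selected get NA.
df[!df$isTip, "btp"] <- mylist
print(df)
```

After this runs, df$btp is NA, NA, 90, NA, 100, 97. The stopifnot() line is optional, but it turns a length mismatch into an explicit error at the point where it is easiest to diagnose.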

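Or, in dplyr, you could do this without splitting your data (though I prefer the direct assignment above). Join on a column that identifies each row, such as node (assuming node is unique in your data); joining on isTip alone would match every FALSE row to every element of mylist.

df %>%
  left_join(data.frame(node = df$node[!df$isTip], btp = mylist), by = "node")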
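
Another way to stay inside a dplyr pipeline, sketched here on made-up toy data, is to build the column with mutate() and base R's replace() instead of a join. The result name df2 and the toy values are assumptions for illustration, not part of the original question.

```r
library(dplyr)

# Toy data standing in for the tree table in the question (hypothetical values).
df <- tibble(
  node  = 1:6,
  isTip = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)
mylist <- c(90, 100, 97)  # one value per row where isTip is FALSE

# Start from an all-NA numeric column, then overwrite the positions
# where isTip is FALSE with the values of mylist, in order.
df2 <- df %>%
  mutate(btp = replace(rep(NA_real_, n()), !isTip, mylist))

print(df2)
```

Because no join is involved, the row count cannot change, and the values land in row order, which matches what the filter-then-assign workflow in the question does.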

CodePudding user response：

You are right - it is not necessary to split and rejoin the data. Here is an example with the iris dataset:

data("iris")

vector_length_virginica <- 1:50
iris$newcol[iris$Species == "virginica"] <- vector_length_virginica

Now the new column newcol contains the values of the vector (not list) vector_length_virginica where Species == "virginica", and the rest of the values in that column are NA.
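
The same pattern can be applied to the column names from the question. The sketch below uses made-up toy data and creates the btp column explicitly before filling it; on a tibble, reading df$btp before the column exists triggers an "Unknown or uninitialised column" warning, so initialising first is one way to avoid that.

```r
# Toy data standing in for the tree table in the question (hypothetical values).
df <- data.frame(
  node  = 1:6,
  isTip = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)
mylist <- c(90, 100, 97)  # one value per row where isTip is FALSE

# Create the column as all NA, then fill only the non-tip rows.
df$btp <- NA_real_
df$btp[!df$isTip] <- mylist
print(df)
```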