I have a data.frame and I want to add an extra column based on the values of another column, but the numeric vector I want to add has fewer elements than the data frame has rows.
class(mylist)
[1] "numeric"
mylist
[1] 90 100 97 100 93 100 90 100 100 100 100 100 100 100 96 100 100 100 100 100
This is my data.frame; I show only part of it:
df[16:26,]
# A tibble: 11 × 9
parent node branch.length label isTip x y branch angle
<int> <int> <dbl> <chr> <lgl> <dbl> <dbl> <dbl> <dbl>
1 30 16 0.0000117 sample-59 TRUE 0.0213 15 0.0213 257.
2 39 17 0.0000179 sample-62 TRUE 0.0213 4 0.0213 68.6
3 32 18 0.0000212 sample-63 TRUE 0.0213 3 0.0213 51.4
4 40 19 0.0000147 sample-68 TRUE 0.0213 5 0.0213 85.7
5 40 20 0.0000147 sample-69 TRUE 0.0213 6 0.0213 103.
6 28 21 0.00630 sample-5 TRUE 0.0213 11 0.0181 189.
7 22 22 0 NA FALSE 0 8.20 0 140.
8 22 23 0.0143 NA FALSE 0.0143 13.9 0.00715 239.
9 22 24 0.0129 NA FALSE 0.0129 2.47 0.00645 42.3
10 24 25 0.000115 NA FALSE 0.0130 3.94 0.0130 67.5
11 25 26 0.00241 NA FALSE 0.0154 5.88 0.0142 101.
So, I want to add mylist as a new column at the end of the data frame, but only for the rows with FALSE in the isTip column.
I usually do this like:
Filter
dfisTip <- filter(df, isTip == FALSE)
add the list as column (btp)
dfisTip$btp <- mylist
and join the dataframes
df <- left_join(df, dfisTip)
Joining, by = c("parent", "node", "branch.length", "label", "isTip", "x", "y", "branch", "angle")
df[16:26, ]
# A tibble: 11 × 10
parent node branch.length label isTip x y branch angle btp
<int> <int> <dbl> <chr> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 30 16 0.0000117 sample-59 TRUE 0.0213 15 0.0213 257. NA
2 39 17 0.0000179 sample-62 TRUE 0.0213 4 0.0213 68.6 NA
3 32 18 0.0000212 sample-63 TRUE 0.0213 3 0.0213 51.4 NA
4 40 19 0.0000147 sample-68 TRUE 0.0213 5 0.0213 85.7 NA
5 40 20 0.0000147 sample-69 TRUE 0.0213 6 0.0213 103. NA
6 28 21 0.00630 sample-5 TRUE 0.0213 11 0.0181 189. NA
7 22 22 0 NA FALSE 0 8.20 0 140. 90
8 22 23 0.0143 NA FALSE 0.0143 13.9 0.00715 239. 100
9 22 24 0.0129 NA FALSE 0.0129 2.47 0.00645 42.3 97
10 24 25 0.000115 NA FALSE 0.0130 3.94 0.0130 67.5 100
11 25 26 0.00241 NA FALSE 0.0154 5.88 0.0142 101. 93
All the rows with TRUE in the isTip column have NA in the btp column, and those with FALSE have the number.
I was just wondering whether there is a simpler way to do it?
Thanks!!!
CodePudding user response:
We generally index data frames with data[rows, columns]. If you want to assign mylist to the "btp" column for the rows where isTip == FALSE (which we'll write as !df$isTip), then you can do it like this:
df[!df$isTip, "btp"] <- mylist
mylist must have exactly as many elements as there are rows with isTip == FALSE.
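As a minimal sketch of that length requirement, here is the same assignment on a small made-up data frame. The toy df and its values are assumptions for illustration only, not the real tree table. The explicit check fails early if the vector and the number of FALSE rows disagree.

```r
# Toy stand-in for the tree table: three tips, two internal nodes (made-up data).
df <- data.frame(
  node  = 1:5,
  isTip = c(TRUE, TRUE, TRUE, FALSE, FALSE)
)
mylist <- c(90, 100)  # one value per row with isTip == FALSE

# Stop early if the lengths do not match, instead of relying on recycling.
stopifnot(length(mylist) == sum(!df$isTip))

df$btp <- NA_real_            # start with NA in every row
df$btp[!df$isTip] <- mylist   # fill only the rows where isTip is FALSE
df
```

Creating the column as NA first makes the intent explicit: every row that is not selected keeps NA.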
Or, in dplyr, you could do this without splitting your data by filling the column inside mutate() (though I prefer the direct assignment above).
df %>%
mutate(btp = replace(rep(NA_real_, n()), !isTip, mylist))
CodePudding user response:
You are right - it is not necessary to split and rejoin the data. Here is an example with the iris dataset:
data("iris")
vector_length_virginica <- c(1:50)
iris$newcol[iris$Species=="virginica"] <- vector_length_virginica
Now the new column newcol contains the values of the vector (not list) vector_length_virginica where Species == "virginica", and the rest of the values in that column are NA.
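One way to confirm the result, sketched here with base R only, is to count the filled values per species. The name filled_per_species is introduced just for this illustration.

```r
data("iris")
iris$newcol[iris$Species == "virginica"] <- 1:50

# Number of non-NA values of newcol within each species.
filled_per_species <- tapply(!is.na(iris$newcol), iris$Species, sum)
filled_per_species

# Total number of rows that were left as NA.
sum(is.na(iris$newcol))
```

Only the virginica rows should be filled. The setosa and versicolor rows stay NA.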