I have US stock data sets for five different years: 2014, 2015, 2016, 2017, and 2018. The 2014 data set has 901 variables, while 2015 has 1386, 2016 has 1469, and so on. I want to downsize all of them to 901 so I can easily compare them and show the movement of stocks from 2014 to 2018. How can I do this?
CodePudding user response:
The following should get you going, Ayaz.
- It defines data frames for 2014 and 2015 to simulate your data sets.
- It trims the second data frame to the stock names found in the first one. Note that the simulated 2015 data has a few extra names (A1, A2, D1, E) and is missing one (C).
Note that we look for the names in stock_2014$STOCK. You may have them defined in another vector or pull them from elsewhere.
Since you mention filtering, I assume you use the tidyverse. You can build the filter criterion on the stock names and use %in% to check whether each name occurs in the 2014 data.
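To see what %in% does on its own, here is a minimal sketch with made-up vectors (the names candidates and reference are only for illustration). It returns one TRUE/FALSE per element of the left-hand side, which is what filter() needs as a row selector.

```r
# Hypothetical example vectors, not taken from the real data
candidates <- c("A", "A1", "B")
reference  <- c("A", "B", "C", "D")

# One logical value per element of `candidates`
keep <- candidates %in% reference
keep              # TRUE FALSE TRUE

# Use the logical vector to subset
candidates[keep]  # "A" "B"
```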
library(dplyr)
# simulate your data for 2 years
stock_2014 <- data.frame(STOCK = c("A","B","C","D"), VALUE = c(12,34,56,78))
stock_2015 <- data.frame(STOCK = c("A", "A1","A2", "B", "D","D1","E"), VALUE = c(12,23,34,45,56,89,38))
stock_2015_trimmed <- stock_2015 %>% filter(STOCK %in% stock_2014$STOCK)
stock_2015_trimmed
  STOCK VALUE
1     A    12
2     B    45
3     D    56
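Stock C from 2014 does not appear in the trimmed 2015 result, so the two years still differ in length. If you want the same set of stocks in every year, one option is to keep only the names common to all years. The sketch below assumes each year is a data frame with a STOCK column, as in the simulation above; the list stocks and the 2016 values are invented for illustration. It uses base R subsetting so it runs without extra packages, and dplyr::filter() with the same %in% condition should work equally well.

```r
# Hypothetical toy data standing in for the real yearly data sets
stocks <- list(
  `2014` = data.frame(STOCK = c("A", "B", "C", "D"),
                      VALUE = c(12, 34, 56, 78)),
  `2015` = data.frame(STOCK = c("A", "A1", "A2", "B", "D", "D1", "E"),
                      VALUE = c(12, 23, 34, 45, 56, 89, 38)),
  `2016` = data.frame(STOCK = c("A", "B", "D", "E", "F"),
                      VALUE = c(15, 40, 60, 41, 9))
)

# Stock names that occur in every year
common <- Reduce(intersect, lapply(stocks, function(df) df$STOCK))

# Keep only the rows for those stocks in each year
trimmed <- lapply(stocks, function(df) df[df$STOCK %in% common, ])

# Every year now has the same number of rows
sapply(trimmed, nrow)
```

If your stocks are stored as columns rather than rows, the same idea should apply to column names: compute Reduce(intersect, lapply(stocks, names)) and select those columns in each data frame.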