I'm learning R in my Data-Driven Business course at my university, so it's brand new to me. I'm making a data project that analyzes data from a .csv file.
My problem is removing rows based on the values in the "Year_Birth" column. What I have tried so far doesn't give me the result I want.
I have tried:
# Read a csv file using read.csv()
csv_file = read.csv(file = "filtered_data.csv",
stringsAsFactors = FALSE, header = TRUE)
BabyBoomer = csv_file$Year_Birth[ csv_file$Year_Birth >= 1946 & csv_file$Year_Birth <= 1964]
head(BabyBoomer)
Output:
[1] 1957 1954 1959 1952 1946 1946
y = csv_file$Year_Birth[csv_file$Year_Birth <= 1964]
BabyBoomer <- csv_file[-c(y), ]
head(BabyBoomer)
Output: the data frame, with nothing changed.
I would like to create a subset that keeps only the rows where Year_Birth is <= 1964 and deletes all the others.
CodePudding user response:
y = csv_file$Year_Birth[csv_file$Year_Birth <= 1964]
After executing the snippet above, y contains a vector of the Year_Birth values that are <= 1964. To extract the subset you want, you instead need a vector of the row indices of the data.frame where Year_Birth <= 1964. This code will do that:
y <- which(csv_file$Year_Birth <= 1964)
BabyBoomer <- csv_file[ y, ]
head(BabyBoomer)
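To make the difference between values and row indices concrete, here is a minimal sketch on a made-up data frame. The name toy and its contents are assumptions for illustration only; they stand in for csv_file and are not taken from filtered_data.csv.

```r
# Hypothetical toy data standing in for the real csv_file
toy <- data.frame(
  ID = 1:5,
  Year_Birth = c(1957, 1984, 1946, 1970, 1964)
)

# These are birth years (values), not row positions
years <- toy$Year_Birth[toy$Year_Birth <= 1964]

# Negative indexing tries to drop rows 1957, 1946 and 1964.
# A 5-row data frame has no such rows, so nothing is removed.
unchanged <- toy[-years, ]

# which() returns the row positions where the condition is TRUE
idx <- which(toy$Year_Birth <= 1964)
boomers <- toy[idx, ]

# A logical vector can also be used directly as the row index
boomers_lgl <- toy[toy$Year_Birth <= 1964, ]

print(nrow(unchanged))
print(boomers)
```

If Year_Birth contains NA values, the two forms differ: which() skips them, while plain logical indexing returns an all-NA row for each one.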
CodePudding user response:
Try using the subset() function. With it you can write subset(dataset, dataset$year <= 1964).
EDIT: if you only want a vector containing the years, you can write subset(dataset$year, dataset$year <= 1964).
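As a rough sketch of how this could look with the column from the question, the example below uses a made-up data frame (toy and its values are assumptions, not the asker's data). When the first argument is a data frame, subset() evaluates the condition inside it, so the column can be named without the dataset$ prefix.

```r
# Hypothetical toy data; the real data would come from read.csv()
toy <- data.frame(
  ID = 1:5,
  Year_Birth = c(1957, 1984, 1946, 1970, 1964)
)

# Keep only the rows where Year_Birth is <= 1964
older <- subset(toy, Year_Birth <= 1964)

# Both bounds from the question's first attempt in one condition
boomers <- subset(toy, Year_Birth >= 1946 & Year_Birth <= 1964)

# On a plain vector, subset() returns the matching values only
years_only <- subset(toy$Year_Birth, toy$Year_Birth <= 1964)

print(older)
print(years_only)
```

Unlike plain logical indexing, subset() drops rows where the condition evaluates to NA.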
Check out this documentation; it helped me a lot when I was getting started: https://homerhanumat.github.io/elemStats/
Hope this helps!