I'm trying to create a dataframe from a CSV file that has multiple columns and rows. One of the columns contains either 'yes' or 'no'. I only want the dataframe to include the rows that have 'yes'. Can someone show me how to write this code? Thanks in advance.
CodePudding user response:
You can read the file, then filter the dataframe to keep only the "yes" rows. For example:
import pandas as pd
df = pd.read_csv("data.csv")
# replace "column" with the name of your yes/no column
df = df[df["column"] == 'yes']
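If you want to try this without a file on disk, here is a minimal sketch that reads a small made-up CSV from a string. The column names (name, choice) and the sample rows are assumptions for illustration only, not taken from your data.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for data.csv
csv_text = """name,choice
alice,yes
bob,no
carol,yes
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep only the rows where the 'choice' column equals 'yes'
df_yes = df[df["choice"] == "yes"]
print(df_yes)
```

The filtered dataframe keeps the original row labels, so you may want df_yes.reset_index(drop=True) if you need them renumbered from 0.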
CodePudding user response:
Here are a few ways to do it.
Say your column is named choice and your dataframe is named df:
df_new = df[df['choice'] == 'yes']
In this case, df_new is a dataframe that contains only the rows where choice is 'yes'.
The code below does the same thing.
mask = df['choice'] == 'yes'
# new dataframe with selected rows; df[mask] is already a DataFrame,
# and .copy() makes it independent of df
df_new = df[mask].copy()
You can also try this:
# condition built from the column's .values array
mask = df['choice'].values == 'yes'
# new dataframe
df_new = df[mask]
print(df_new)
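As a quick check that these variants select the same rows, here is a small sketch on made-up data. The id column and the sample values are assumptions for illustration; choice is the column name used above.

```python
import pandas as pd

# Hypothetical data; 'choice' matches the column name used above
df = pd.DataFrame({"id": [1, 2, 3, 4], "choice": ["yes", "no", "yes", "no"]})

# 1) direct boolean indexing
a = df[df["choice"] == "yes"]

# 2) boolean mask stored in a variable first
mask = df["choice"] == "yes"
b = df[mask]

# 3) mask built from the column's underlying array
c = df[df["choice"].values == "yes"]

# DataFrame.equals compares values, dtypes, and row labels
print(a.equals(b) and b.equals(c))
```

Which one you pick is mostly a matter of readability; storing the mask in a variable helps when you want to reuse or combine conditions.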