Trying to filter data frame to rows that have a certain value

Time:10-25

First post in the community (congrats or I'm sorry are in order :-)). Below is some code for survey data I am trying to analyze. I am trying to capture the rows that have the value 1 in any of the listed columns. The columns were read as floats; I converted them to integers, but that did not work. Comparing against "1" in quotes did not work either. Any advice?

# Dependencies and Setup
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import json
from pprint import pprint
import requests
import time
from scipy import stats
import seaborn as sn
%matplotlib inline
    
# Read csv
us_path = "us_Data.csv"
us_responses = pd.read_csv(us_path)

# Created filtered data frame.    
preexisting_us = us_responses

# Filter data.
preexisting_us = us_responses[us_responses["diabetes"] == "1" | us_responses(us_responses["cardiovascular_disorders"] == "1") | us_responses(us_responses["obesity"] == "1") | us_responses(us_responses["respiratory_infections"] == "1") | us_responses(us_responses["respiratory_disorders_exam"] == "1") | us_responses(us_responses["gastrointestinal_disorders"] == "1") | us_responses(us_responses["chronic_kidney_disease"] == "1") | us_responses(us_responses["autoimmune_disease"] == "1") | us_responses(us_responses["chronic_fatigue_syndrome_a"] == "1")]

Answer:

First of all, you should probably define your new DataFrame as a copy of the original one, for example df = us_responses.copy(). A plain assignment such as preexisting_us = us_responses only gives the same DataFrame a second name, whereas a copy guarantees that the original DataFrame will not be modified (have a look at the documentation of DataFrame.copy).
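To illustrate the difference, here is a minimal sketch on a small, made-up DataFrame standing in for us_responses (the real CSV is not available, so the values are purely hypothetical): plain assignment returns the very same object, while .copy() returns an independent one.

```python
import pandas as pd

# Hypothetical stand-in for us_responses; the real survey CSV is not used here
us_responses = pd.DataFrame({"diabetes": [1.0, 0.0], "obesity": [0.0, 0.0]})

alias = us_responses        # plain assignment: a second name for the same object
df = us_responses.copy()    # independent copy (deep=True is the default)

print(alias is us_responses)  # True
print(df is us_responses)     # False

# Changing the copy leaves the original untouched
df.loc[0, "diabetes"] = 0.0
print(us_responses.loc[0, "diabetes"])  # 1.0
```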

Now, there are simpler ways to filter the DataFrame than the one in your code. For example:

cols_to_check = ['diabetes', 'cardiovascular_disorders', ... ]
df_filtered = df.loc[df[cols_to_check].sum(axis=1) > 0, :]

Here the selected columns are summed row by row: if at least one of them has the value 1, the sum is positive and the row is kept in the filtered DataFrame. This assumes the columns contain only 0 and 1 (missing values are skipped by sum).
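A minimal sketch of the sum approach on hypothetical data, shown next to an equality-based variant, df[cols].eq(1).any(axis=1), which is not part of the original answer. The column values are made up; row 4 uses an assumed code 2 (for example "no") to show where the sum test keeps a row that contains no 1 at all, while the equality test does not.

```python
import numpy as np
import pandas as pd

# Hypothetical survey data: 1 = condition present, 0 = absent, NaN = no answer.
# Row 4 uses an assumed extra code 2 (e.g. "no") to expose the sum approach's assumption.
df = pd.DataFrame({
    "diabetes":                 [1.0, 0.0, 0.0, np.nan, 2.0],
    "cardiovascular_disorders": [0.0, 0.0, 1.0, 0.0, 0.0],
    "obesity":                  [0.0, 0.0, 1.0, np.nan, 0.0],
})
cols_to_check = ["diabetes", "cardiovascular_disorders", "obesity"]

# Sum approach: correct only when the columns hold 0/1 (NaN is skipped by sum)
by_sum = df.loc[df[cols_to_check].sum(axis=1) > 0, :]

# Equality approach: keeps rows where any checked column equals 1, whatever other codes exist
by_eq = df.loc[df[cols_to_check].eq(1).any(axis=1), :]

print(by_sum.index.tolist())  # [0, 2, 4]
print(by_eq.index.tolist())   # [0, 2]
```

If your columns really contain only 0, 1 and missing values, both versions select the same rows; the equality version simply states the intent ("equals 1") directly.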

However, if you really want to keep your code the way it is (which I would not suggest), you need to make some corrections: remove the us_responses(...) calls (a DataFrame is not callable), compare with the number 1 rather than the string "1" (the columns are numeric), and wrap every comparison in parentheses, including the first one, because | binds more tightly than ==:

preexisting_us = preexisting_us[(preexisting_us["diabetes"] == 1) | (preexisting_us["cardiovascular_disorders"] == 1) | (preexisting_us["obesity"] == 1) | (preexisting_us["respiratory_infections"] == 1) | (preexisting_us["respiratory_disorders_exam"] == 1) | (preexisting_us["gastrointestinal_disorders"] == 1) | (preexisting_us["chronic_kidney_disease"] == 1) | (preexisting_us["autoimmune_disease"] == 1) | (preexisting_us["chronic_fatigue_syndrome_a"] == 1)]
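A small sketch on hypothetical float data showing why the string comparison never matches while the numeric one does (1.0 == 1 is True). It also builds the same OR chain from a list of column names with functools.reduce and operator.or_; that helper is an alternative not taken from the original answer, and it sidesteps the precedence problem because each comparison is evaluated on its own before being combined.

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical data; read_csv produces float64 columns for values like 1.0/0.0 or when NaN is present
preexisting_us = pd.DataFrame({
    "diabetes": [1.0, 0.0, 0.0],
    "obesity":  [0.0, 0.0, 1.0],
})

# A float column compared with the string "1" never matches
print((preexisting_us["diabetes"] == "1").any())   # False
# Compared with the number 1 it matches, since 1.0 == 1
print((preexisting_us["diabetes"] == 1).tolist())  # [True, False, False]

# Same OR chain as above, built from a list of columns;
# every (column == 1) Series is computed first, then combined with |
cols = ["diabetes", "obesity"]
mask = reduce(operator.or_, (preexisting_us[c] == 1 for c in cols))
print(preexisting_us[mask].index.tolist())         # [0, 2]
```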

If you are interested in more information about filtering with .loc[], see the pandas documentation on DataFrame.loc.

Please follow @mozway's suggestions for posting clearer questions next time.
