Home > Software engineering >  Pandas: add column name to a list, if the column contains a specific set of value
Pandas: add column name to a list, if the column contains a specific set of value

Time:04-02

I wish to create a new list which contains column names of those columns which have atleast one of the following values.

  1. Most of the time
  2. Quite Often
  3. Less than often
  4. Never

Sample :

df ={'A':['name1', 'name2', 'name3', 'name4'],

 'B':['Most of the Time', 'Never', 'Quite Often', 'Less than 
        Often'],

 'C':['abc', 'DEF', 'ghi', 'Jkl'],
 'D':['1', '2', '3', '4'],
 'E':['Most of the Time', 'Never', 'Quite Often', 'Less than 
        Often'],
 'F':['Most of the Time', 'Never', 'Quite Often', 'Less than 
        Often']       
           }

h_ls = ['B','E','F']

I tried the following code

df.columns

h_ls=[]

    

for x in df.columns:
    xi = str(x)
    for i in df[x]:
        if i.startswith("Most of the Time") or i.startswith("Quite 
            Often") or i.startswith("Less that Often") or 
            i.startswith("Never"):
            
            h_ls.append(xi)
            break
        else:
            continue

I get an error that says 'Timestamp' object has no attribute 'startswith'

It gets stuck on the first column where the condition is false.

Can anyone tell me where I am committing the mistake or if there exists a better solution? I have dropped the timestamp column from the data frame but still it's popping up.

CodePudding user response:

Your error is coming up because not all your data are strings, so the .startswith method fails. You can use .astype(str) to force a series to be strings. In your example that would be for i in df[x].astype(str):

It is, however, bad practice to loop over the series. Instead you could apply your check to the whole column at once.

accepted_strings = ["Most of the time", "Quite Often", "Less than often", "Never"]
h_ls = [col for col in df.columns if df[col].isin(accepted_strings).any()]

Here df[col].isin(accepted_strings) returns a boolean series of [True, False, False...] corresponding to whether the values in df[col] are in your accepted_string list. .any() then returns True if any of the values in this boolean series are True.

CodePudding user response:

Here is another way:

df.columns[df.isin(accepted_strings).any()].tolist()

Output:

['B', 'E', 'F']
  • Related