I wish to create a new list which contains column names of those columns which have atleast one of the following values.
- Most of the time
- Quite Often
- Less than often
- Never
Sample :
df ={'A':['name1', 'name2', 'name3', 'name4'],
'B':['Most of the Time', 'Never', 'Quite Often', 'Less than
Often'],
'C':['abc', 'DEF', 'ghi', 'Jkl'],
'D':['1', '2', '3', '4'],
'E':['Most of the Time', 'Never', 'Quite Often', 'Less than
Often'],
'F':['Most of the Time', 'Never', 'Quite Often', 'Less than
Often']
}
h_ls = ['B','E','F']
I tried the following code
df.columns
h_ls=[]
for x in df.columns:
xi = str(x)
for i in df[x]:
if i.startswith("Most of the Time") or i.startswith("Quite
Often") or i.startswith("Less that Often") or
i.startswith("Never"):
h_ls.append(xi)
break
else:
continue
I get an error that says 'Timestamp' object has no attribute 'startswith'
It gets stuck on the first column where the condition is false.
Can anyone tell me where I am committing the mistake or if there exists a better solution? I have dropped the timestamp column from the data frame but still it's popping up.
CodePudding user response:
Your error is coming up because not all your data are strings, so the .startswith
method fails. You can use .astype(str)
to force a series to be strings. In your example that would be for i in df[x].astype(str):
It is, however, bad practice to loop over the series. Instead you could apply your check to the whole column at once.
accepted_strings = ["Most of the time", "Quite Often", "Less than often", "Never"]
h_ls = [col for col in df.columns if df[col].isin(accepted_strings).any()]
Here df[col].isin(accepted_strings)
returns a boolean series of [True, False, False...]
corresponding to whether the values in df[col]
are in your accepted_string list. .any()
then returns True if any of the values in this boolean series are True.
CodePudding user response:
Here is another way:
df.columns[df.isin(accepted_strings).any()].tolist()
Output:
['B', 'E', 'F']