I have a dataframe that looks like the below (inclusive of the brackets and quotes):
ID | Interests |
---|---|
2131 | ['music','art','travel'] |
3213 | [] |
3132 | ['martial arts'] |
3232 | ['martial arts'] |
The desired output I am trying to get is:
ID | Interests |
---|---|
2131 | 3 |
3213 | 0 |
3132 | 1 |
3232 | 1 |
I've tried using
from collections import Counter
ravel = np.ravel(user.personal_interests.to_list())
But that just gives me the count of each distinct list, e.g. ['martial arts']: 2
I've also tried stripping the quotes and using a series to count, but to no avail.
CodePudding user response:
If you have lists (['music','art','travel']):
df['Interests'] = df['Interests'].str.len()
If you have strings ("['music','art','travel']"):
from ast import literal_eval
df['Interests'] = df['Interests'].apply(literal_eval).str.len()
Or, if the values are strings and you know that no item contains a comma, count the commas and add 1 for every non-empty list:
df['Interests'] = df['Interests'].str.count(',').add(df['Interests'].ne('[]'))
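To make the three variants above easy to compare, here is a minimal, self-contained sketch. The frames `df_lists` and `df_strings` are made-up names that rebuild the sample data from the question, once with real lists and once with their string representation; your actual column may differ.

```python
from ast import literal_eval

import pandas as pd

ids = [2131, 3213, 3132, 3232]

# Case 1: the column holds real Python lists
df_lists = pd.DataFrame({
    "ID": ids,
    "Interests": [["music", "art", "travel"], [], ["martial arts"], ["martial arts"]],
})
counts_from_lists = df_lists["Interests"].str.len()

# Case 2: the column holds the *string* representation of each list
df_strings = pd.DataFrame({
    "ID": ids,
    "Interests": ["['music','art','travel']", "[]", "['martial arts']", "['martial arts']"],
})

# Parse each string into a list first, then take its length
counts_parsed = df_strings["Interests"].apply(literal_eval).str.len()

# Comma counting: n commas means n + 1 items, except for the empty list "[]"
counts_commas = df_strings["Interests"].str.count(",").add(df_strings["Interests"].ne("[]"))

print(counts_from_lists.tolist())  # [3, 0, 1, 1]
print(counts_parsed.tolist())      # [3, 0, 1, 1]
print(counts_commas.tolist())      # [3, 0, 1, 1]
```

To find out which case applies to your data, checking `type(df['Interests'].iloc[0])` should tell you whether the cells are `list` or `str`.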
CodePudding user response:
You can use Python's built-in len() function with apply.
If df is your dataframe and the column holds actual lists,
df['new_interests'] = df['Interests'].apply(len)
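One thing to watch with apply: the function you pass has to return the value for each row. Appending to an outside list inside the lambda does not do that, because list.append returns None. A small sketch, assuming the column holds real lists; `temp` is just a hypothetical helper list used for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [2131, 3213],
    "Interests": [["music", "art", "travel"], []],
})

temp = []  # hypothetical external list, only here to show the pitfall

# list.append returns None, so the resulting column holds no counts;
# the lengths end up in temp as a side effect instead
side_effect = df["Interests"].apply(lambda x: temp.append(len(x)))

# Returning the length from the function gives the counts directly
returned = df["Interests"].apply(len)

print(temp)               # [3, 0]
print(returned.tolist())  # [3, 0]
```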