I have a column in a dataframe that looks something like this:
0 NaN
1 ["arts"]
2 ["sports", "tech"]
3 ["arts", "finance", "health", "sports", "science"...
4 ["finance", "sports", "tech"]
5 ["arts", "finance", "sports", "tech"]
6 ["arts", "finance", "health", "sports", "science"...
7 ["arts", "sports", "science"]
I would like to know how many times "arts" occurs across all of these lists. However, when I try column.explode().value_counts(sort=True)
I just get a count of each whole list, which is not what I want:
["tech"] 5
["arts", "finance", "sports", "tech"] 2
["arts", "sports"] 2
["finance", "sports"] 1
["arts"] 1
I also tried a counter: collections.Counter(itertools.chain.from_iterable(v.split(',') for v in column))
but I get the following error: 'float' object has no attribute 'split'
Any pointers?
CodePudding user response:
If column is not too long, a simple nested loop should work just fine:
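count = 0
for str_list in column:
    # Skip NaN rows; this assumes every other entry is a real Python list.
    if not isinstance(str_list, list):
        continue
    for name in str_list:
        if name == "arts":
            count += 1  # increment rather than reset to 1
print(count)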
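The same tally can be produced for every label at once with collections.Counter, which is close to what the question attempted. This is only a minimal sketch: it assumes each non-missing entry is a real Python list, and the sample column below is made up for illustration (a plain list is used here, but iterating over a pandas Series yields the same kind of values). Filtering with isinstance keeps the NaN float out of the chain, which is what triggered the 'float' object has no attribute 'split' error.

```python
import itertools
from collections import Counter

# Hypothetical sample standing in for the real column; float("nan")
# plays the role of the missing value in row 0.
column = [
    float("nan"),
    ["arts"],
    ["sports", "tech"],
    ["arts", "sports", "science"],
]

# Keep only real lists so the NaN float is never iterated or split.
lists_only = (v for v in column if isinstance(v, list))
counts = Counter(itertools.chain.from_iterable(lists_only))

print(counts["arts"])
```

Counter returns 0 for labels it has never seen, so counts["finance"] is safe to look up even when no row contains it.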
CodePudding user response:
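You can create a mask of rows that contain "arts" by doing something like the following (the isinstance check skips NaN rows, which would otherwise raise a TypeError):
mask = df['industry'].apply(lambda x: isinstance(x, (list, str)) and 'arts' in x)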
And then restrict your dataframe to your new mask
df = df[mask]
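From there, len(df) gives the number of rows that matched the mask. Calling mask.sum() gives the same number without filtering the dataframe. Note that this counts rows, not occurrences, so the two agree only if a label appears at most once per list.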
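The explode() output in the question, where whole lists are counted as single values, suggests the entries may be stored as strings rather than as real lists. If that is the case (an assumption, not something the question confirms), one possible approach is to drop the NaN rows, parse each string, and only then explode. The sample data below is invented for illustration, and json.loads is used on the assumption that the strings are valid JSON with double quotes; ast.literal_eval would be an alternative for Python-style literals.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical data: each entry is a JSON-style string, not a real list.
df = pd.DataFrame({
    "industry": [
        np.nan,
        '["arts"]',
        '["sports", "tech"]',
        '["arts", "finance", "sports", "tech"]',
    ]
})

# Drop missing rows, parse each string into a list, then flatten
# so that every label sits on its own row.
labels = df["industry"].dropna().apply(json.loads).explode()

counts = labels.value_counts()               # count of every label
arts_count = int((labels == "arts").sum())   # count of one label

print(counts)
print(arts_count)
```

Once the labels are flattened, value_counts() gives the per-label distribution the question was after, rather than a count of each whole list.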