I have a data set with two columns. For rows where columnA = 1, I want to count the occurrences of x in columnB, which holds comma-delimited values.
Sample data set, stored in an Excel file:
columnA columnB
1 x,a,b,c
2 d,e,g
3 a,r,x
4 y,x,o,a
What I've tried
if any('1' in str(x) for x in excel_file['columnA']):
    count = excel_file['columnB'].astype(str).str.contains('x').value_counts()[True]
else:
    count = 0
This does count the occurrences of x, but it counts every row that contains x, not only the rows where columnA equals 1.
I know that in Excel it could be written as =COUNTIFS(columnA, "1", columnB, "*x*"),
but I can't seem to find a similar way of doing this in Python.
Any help would be greatly appreciated!
CodePudding user response:
While waiting for more complete sample data and the expected outcome, here is a preliminary answer.
Assuming this is how your data looks:
import pandas as pd
data = [[1, ['x', 'a', 'b', 'c']], [2, ['d', 'e', 'g']], [3, ['a', 'r', 'x']], [4, ['y', 'x', 'o', 'a']]]
excel_file = pd.DataFrame(data, columns=['columnA', 'columnB'])
Usually, I'd do this with a for loop over the rows:
for idx, row in excel_file.iterrows():
    # Only look at rows where columnA equals 1
    if row['columnA'] == 1:
        count = 0
        for item in row['columnB']:
            if item == 'x':
                count += 1
        print('Row', idx, 'has', count, 'occurrence(s) of x')
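The loop above assumes columnB already holds Python lists. If the Excel file is read with pandas, columnB will more likely arrive as plain strings such as 'x,a,b,c'. A minimal sketch for that case, assuming the sample data from the question and a hypothetical helper named count_token:

```python
import pandas as pd

# Assumed data: columnB holds comma-delimited strings, as in the question.
excel_file = pd.DataFrame({
    'columnA': [1, 2, 3, 4],
    'columnB': ['x,a,b,c', 'd,e,g', 'a,r,x', 'y,x,o,a'],
})

def count_token(cell, token='x'):
    # Hypothetical helper: split the cell on commas, trim whitespace,
    # and count the items that are exactly equal to the token.
    return [part.strip() for part in str(cell).split(',')].count(token)

# Keep only the rows where columnA equals 1, then total the per-row counts.
rows = excel_file[excel_file['columnA'] == 1]
occurrences = int(rows['columnB'].apply(count_token).sum())
print(occurrences)
```

Splitting on commas counts exact items, so a value like 'ax' is not mistaken for 'x'.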
CodePudding user response:
Looks like this could actually be solved pretty easily using len.
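# Keep only the rows where columnA contains '1'
filtered = excel_file[excel_file['columnA'].astype(str).str.contains('1')]
# Count how many of those rows have an 'x' in columnB
count = len(filtered[filtered['columnB'].astype(str).str.contains('x')])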
I had tried something similar earlier, but I think separating it into multiple steps rather than one expression is what made it work.
Thanks Yee for trying to provide an answer!
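One caveat with str.contains('1') on columnA: it is a substring match, so values such as 10 or 21 would also pass the filter. A sketch closer to COUNTIFS, assuming columnA is read as integers (compare against '1' instead if it is read as text), combines two boolean masks and sums them. The sample data here is altered to include a 10 purely for illustration:

```python
import pandas as pd

# Hypothetical data: the last columnA value is 10 to show the substring issue.
excel_file = pd.DataFrame({
    'columnA': [1, 2, 3, 10],
    'columnB': ['x,a,b,c', 'd,e,g', 'a,r,x', 'y,x,o,a'],
})

# Exact comparison on columnA, literal substring match on columnB.
is_one = excel_file['columnA'] == 1
has_x = excel_file['columnB'].astype(str).str.contains('x', regex=False)

# True counts as 1, so summing the combined mask counts the matching rows.
countifs = int((is_one & has_x).sum())

# For comparison: the substring filter on columnA also matches the row with 10.
loose = excel_file['columnA'].astype(str).str.contains('1')
substring_count = int((loose & has_x).sum())

print(countifs, substring_count)
```

Like the "*x*" wildcard in Excel, this counts rows that contain x rather than individual x items within a row.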