I have a DataFrame containing only 0s and 1s:
a 1 1 1 1 0 0 0 1 0 0 0 0 0
b 1 1 1 1 0 0 0 1 1 0 0 0 0
c 1 1 1 1 0 0 0 1 1 1 1 0 0
d 1 1 1 1 0 0 0 1 1 1 1 0 0
e 1 1 1 1 0 0 0 0 0 0 0 1 1
f 1 1 1 1 1 1 1 0 0 0 0 0 0
(No header)
I want to write a function that takes a list of strings (row names)
and returns the number of columns that exactly match those rows.
For example,
def exact_match(ls1):
    ~~~~~
    return col_num
print(exact_match(['c', 'd']))
>>> 2
The output is 2 because only two columns match exactly:
they have a 1 in rows c and d and a 0 in every other row.
CodePudding user response:
The question is unclear, but if you want to count the columns that contain only 1s in the given rows and no 1s in any other row, you can use:
def exact_match(ls1):
    # 1s on the provided indices
    m1 = df.loc[ls1].eq(1).all()
    # no 1s in the other rows
    m2 = df.drop(ls1).ne(1).all()
    # slice and get shape
    return df.loc[:, m1&m2].shape[1]
    # or
    # return (m1&m2).sum()
print(exact_match(['c', 'd']))
# 2
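To try this without a pre-existing global `df`, here is a minimal self-contained sketch of the same two-mask idea. It rebuilds the frame from the question and takes it as an argument. The helper name `exact_match_columns` and the default integer column labels (0 to 12) are my own assumptions, since the original data has no header. It returns the matching column labels, so the count is just the length of the result.

```python
import pandas as pd

# Data from the question; dict keys become the row labels
rows = {
    "a": [1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "b": [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    "c": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    "d": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    "e": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "f": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
}
# No header in the source, so columns get default labels 0..12 (assumption)
df = pd.DataFrame.from_dict(rows, orient="index")


def exact_match_columns(df, names):
    """Return labels of columns that are 1 in `names` rows and not 1 elsewhere."""
    # True for columns where every requested row holds a 1
    in_rows = df.loc[names].eq(1).all()
    # True for columns where no remaining row holds a 1
    out_rows = df.drop(index=names).ne(1).all()
    mask = in_rows & out_rows
    # Keep only the labels whose mask value is True
    return mask[mask].index.tolist()


matched = exact_match_columns(df, ["c", "d"])
print(matched)       # [9, 10]
print(len(matched))  # 2
```

Passing the frame in explicitly makes the function easier to test on other data, and returning labels instead of a count lets you inspect which columns matched.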
CodePudding user response:
If I understood you correctly,
and your DataFrame looks something like this:
import pandas as pd

df = pd.DataFrame(data = [
    ["a", 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    ["b", 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    ["c", 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    ["d", 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    ["e", 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    ["f", 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
])
df = df.rename(columns = {0:"name"}).set_index("name")
then:
def exact_match(lst):
    s = df[df.columns[df.loc[lst].sum(axis = 0) == len(lst)]].sum(axis = 0) == len(lst)
    return len(s[s])
exact_match(["c","d"]) # output: 2
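The sum-based check above relies on every cell being 0 or 1: a column passes only if the selected rows sum to `len(lst)` and the whole column also sums to `len(lst)`, which leaves no room for a 1 in any other row. As a rough alternative that states the same condition more directly, the sketch below builds the expected 0/1 pattern for a column and compares each column against it. The names `exact_match_by_pattern` and `expected` are hypothetical, and the sketch assumes the row labels are unique.

```python
import pandas as pd

# Same data as above, with the row names set directly as the index
df = pd.DataFrame(
    [
        [1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
        [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
        [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    ],
    index=list("abcdef"),
)


def exact_match_by_pattern(df, names):
    """Count columns equal to the pattern: 1 in `names` rows, 0 elsewhere."""
    # Expected column: 1 for the requested rows, 0 for every other row
    expected = pd.Series(df.index.isin(names).astype(int), index=df.index)
    # Compare each column with the pattern row by row, then require all rows to agree
    same = df.eq(expected, axis=0).all(axis=0)
    return int(same.sum())


print(exact_match_by_pattern(df, ["c", "d"]))  # 2
```

Comparing against an explicit pattern does not depend on the sums happening to line up, so it may be easier to reason about if the frame could ever hold values other than 0 and 1.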