Count occurrences of a specific string within multi-valued elements in a set

I have generated a list of genes

genes = ['geneName1', 'geneName2', ...]

and a set of their interactions:

geneInt = {('geneName1', 'geneName2'), ('geneName1', 'geneName3'),...}

I want to find out how many interactions each gene has and put that in a vector (or dictionary), but I am struggling to count them. I tried the usual approach:

interactionList = []
for gene in genes:
    interactions = geneInt.count(gene)  # AttributeError: 'set' object has no attribute 'count'
    interactionList.append(interactions)

but the code fails: a set has no count() method, and in any case its elements are two-value tuples, while I need to count the individual gene names inside them.
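For comparison, here is a minimal sketch of a version of this loop that works, assuming geneInt is a set of 2-tuples. It swaps in collections.Counter (a standard-library technique, not something used in the question) to count every gene name inside the pairs, then reads the counts back in the order of genes to produce the vector. The gene names below, including 'geneName4', are made-up sample data.

```python
from collections import Counter

# Hypothetical sample data in the shape described in the question
genes = ['geneName1', 'geneName2', 'geneName3', 'geneName4']
geneInt = {('geneName1', 'geneName2'), ('geneName1', 'geneName3')}

# Flatten every (a, b) pair into its individual gene names and count them
counts = Counter(gene for pair in geneInt for gene in pair)

# Build the "vector": one count per gene, in the same order as `genes`.
# Counter returns 0 for missing keys, so genes without interactions get 0.
interactionList = [counts[gene] for gene in genes]

print(interactionList)  # [2, 1, 1, 0]
```

Note that a self-interaction such as ('geneName1', 'geneName1') would be counted twice by this approach, since the name occurs twice in the tuple.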

Answer:

I would argue that you are using the wrong data structure to hold interactions. You can represent interactions as a dictionary keyed by gene name, whose values are a set of all the genes it interacts with.

Let's say you currently have a process that does something like this at some point:

geneInt = set()
...
    geneInt.add((gene1, gene2))

Change it to

import collections

geneInt = collections.defaultdict(set)
...
    geneInt[gene1].add(gene2)

If the interactions are symmetric, also add the line

    geneInt[gene2].add(gene1)
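Put together, the construction might look like the following sketch. The list pairs is hypothetical sample data standing in for however your process actually produces interactions, and the interactions are assumed to be symmetric.

```python
import collections

# Hypothetical input: interactions as they arrive, one (gene1, gene2) pair at a time
pairs = [('geneName1', 'geneName2'), ('geneName1', 'geneName3'), ('geneName2', 'geneName1')]

geneInt = collections.defaultdict(set)
for gene1, gene2 in pairs:
    geneInt[gene1].add(gene2)
    # Symmetric interactions: record the reverse direction too.
    # Because the values are sets, a pair seen in both directions is stored only once.
    geneInt[gene2].add(gene1)
```

Using sets as values is what makes duplicate reports harmless: ('geneName1', 'geneName2') and ('geneName2', 'geneName1') both end up as the same single entry on each side.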

Now, to count the number of interactions, you can do something like

intCounts = {gene: len(ints) for gene, ints in geneInt.items()}
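The comprehension above only covers genes that appear as keys, so a gene with no interactions gets no entry at all. A sketch that reports every gene in genes, with made-up sample data (including a hypothetical 'geneName4' that has no interactions), could use .get() instead of indexing:

```python
import collections

genes = ['geneName1', 'geneName2', 'geneName3', 'geneName4']
geneInt = collections.defaultdict(set)
geneInt['geneName1'].update({'geneName2', 'geneName3'})
geneInt['geneName2'].add('geneName1')
geneInt['geneName3'].add('geneName1')

# Counts only for genes that have at least one interaction
intCounts = {gene: len(ints) for gene, ints in geneInt.items()}

# Counts for every gene in `genes`. On a defaultdict, .get() does not insert
# missing keys (unlike geneInt[gene]), so genes without interactions count as 0
# and the dictionary itself is left unchanged.
allCounts = {gene: len(geneInt.get(gene, ())) for gene in genes}

print(allCounts)  # {'geneName1': 2, 'geneName2': 1, 'geneName3': 1, 'geneName4': 0}
```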

Counting with your original set is also simple if the interactions are one-way:

intCounts = dict.fromkeys(genes, 0)
for gene, _ in geneInt:
    intCounts[gene] += 1
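As a runnable sketch with made-up sample data, assuming "one-way" means each pair (a, b) is counted only for its first gene a:

```python
genes = ['geneName1', 'geneName2', 'geneName3']
# One-way interactions: only the first gene of each pair is counted
geneInt = {('geneName1', 'geneName2'), ('geneName1', 'geneName3'), ('geneName2', 'geneName3')}

# Start every gene at 0 so genes that never appear first still get an entry
intCounts = dict.fromkeys(genes, 0)
for gene, _ in geneInt:
    intCounts[gene] += 1

print(intCounts)  # {'geneName1': 2, 'geneName2': 1, 'geneName3': 0}
```

Because the counts are seeded from genes, a pair whose first gene is missing from genes raises a KeyError, which can be a useful consistency check on the input.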

If each interaction is two-way, there are three possibilities:

Both directions of every pair are in the set (both ('a', 'b') and ('b', 'a')): the loop above works as is.

Only one direction of each pair is in the set: change the loop to

for gene1, gene2 in geneInt:
    intCounts[gene1] += 1
    if gene1 != gene2:  # count a self-interaction such as ('g', 'g') only once
        intCounts[gene2] += 1

Some reverse interactions are in the set and some are not: in this case, transform geneInt into a dictionary of sets as shown at the beginning.
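For the mixed case, a minimal sketch of that transformation, assuming symmetric interactions and made-up sample data, might look like this. The name partners is a hypothetical choice for the new dictionary.

```python
import collections

genes = ['geneName1', 'geneName2', 'geneName3']
# Mixed representation: the 1-2 pair appears in both directions, the 1-3 pair in only one
geneInt = {('geneName1', 'geneName2'), ('geneName2', 'geneName1'), ('geneName1', 'geneName3')}

# Rebuild as a dict of sets; adding both directions makes duplicate pairs collapse
partners = collections.defaultdict(set)
for gene1, gene2 in geneInt:
    partners[gene1].add(gene2)
    partners[gene2].add(gene1)

# len() of each partner set is the interaction count; .get() covers genes with none
intCounts = {gene: len(partners.get(gene, ())) for gene in genes}
print(intCounts)  # {'geneName1': 2, 'geneName2': 1, 'geneName3': 1}
```

Counting the raw tuples here would give geneName1 three and geneName2 two, because the 1-2 pair is stored twice; the sets remove that double count.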

Answer:

Try something like this:

interactions = {}

for gene in genes:
    interactions_count = 0
    for tup in geneInt:
        interactions_count += tup.count(gene)
    interactions[gene] = interactions_count
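A self-contained version with made-up sample data shows that this works because tuples, unlike sets, do have a count() method:

```python
genes = ['geneName1', 'geneName2', 'geneName3', 'geneName4']
geneInt = {('geneName1', 'geneName2'), ('geneName1', 'geneName3')}

interactions = {}
for gene in genes:
    interactions_count = 0
    for tup in geneInt:
        # tuple.count() returns how many times gene occurs in this pair (0, 1 or 2)
        interactions_count += tup.count(gene)
    interactions[gene] = interactions_count

print(interactions)  # {'geneName1': 2, 'geneName2': 1, 'geneName3': 1, 'geneName4': 0}
```

This scans the whole set once per gene, so the work grows with len(genes) × len(geneInt). That is fine for small inputs; for large interaction sets, a single pass over geneInt that increments a dictionary scales better.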

Answer:

Use a dictionary, and keep incrementing the value for every gene you see in each tuple in the set geneInt.

interactions_counter = dict()

for interaction in geneInt:
    for gene in interaction:
        interactions_counter[gene] = interactions_counter.get(gene, 0) + 1

The dict.get(key, default) method returns the value stored at key, or default if the key is not in the dictionary.

For the set geneInt = {('geneName1', 'geneName2'), ('geneName1', 'geneName3')}, we get:

interactions_counter = {'geneName1': 2, 'geneName2': 1, 'geneName3': 1}
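A self-contained sketch to check this result follows. Because set iteration order is arbitrary, the keys may come out in a different order than shown, so the check compares dictionaries (which ignores order). The genes list with 'geneName4' is hypothetical, added only to show that genes absent from geneInt get no entry unless you fill them in.

```python
geneInt = {('geneName1', 'geneName2'), ('geneName1', 'geneName3')}

interactions_counter = dict()
for interaction in geneInt:
    for gene in interaction:
        # get() returns 0 the first time a gene is seen, so no KeyError is raised
        interactions_counter[gene] = interactions_counter.get(gene, 0) + 1

# Dictionary equality ignores key order
print(interactions_counter == {'geneName1': 2, 'geneName2': 1, 'geneName3': 1})  # True

# Hypothetical gene list: genes that never occur in geneInt are absent, so report them as 0
genes = ['geneName1', 'geneName2', 'geneName3', 'geneName4']
all_counts = {gene: interactions_counter.get(gene, 0) for gene in genes}
print(all_counts)  # {'geneName1': 2, 'geneName2': 1, 'geneName3': 1, 'geneName4': 0}
```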