I have generated a list of genes
genes = ['geneName1', 'geneName2', ...]
and a set of their interactions:
geneInt = {('geneName1', 'geneName2'), ('geneName1', 'geneName3'),...}
I want to find out how many interactions each gene has and put that in a vector (or dictionary) but I struggle to count them. I tried the usual approach:
interactionList = []
for gene in genes:
interactions = geneInt.count(gene)
interactionList.append(ineractions)
but of course the code fails because my set contains elements that are made out of two values while I need to iterate over the single values within.
CodePudding user response:
I would argue that you are using the wrong data structure to hold interactions. You can represent interactions as a dictionary keyed by gene name, whose values are a set of all the genes it interacts with.
Let's say you currently have a process that does something like this at some point:
geneInt = set()
...
geneInt.add((gene1, gene2))
Change it to
geneInt = collections.defaultdict(set)
...
geneInt[gene1].add(gene2)
If the interactions are symmetrical, add a line
geneInt[gene2].add(gene1)
Now, to count the number of interactions, you can do something like
intCounts = {gene: len(ints) for gene, ints in geneInt.items()}
Counting your original list is simple if the interactions are one-way as well:
intCounts = dict.fromkeys(genes, 0)
for gene, _ in geneInt:
intCounts[gene] = 1
If each interaction is two-way, there are three possibilities:
Both interactions are represented in the set: the above loop will work.
Only one interaction of a pair is represented: change the loop to
for gene1, gene2 in geneInt: intCounts[gene1] = 1 if gene1 != gene2: intCounts[gene2] = 1
Some reverse interactions are represented, some are not. In this case, transform
geneInt
into a dictionary of sets as shown in the beginning.
CodePudding user response:
Try something like this,
interactions = {}
for gene in genes:
interactions_count = 0
for tup in geneInt:
interactions_count = tup.count(gene)
interactions[gene] = interactions_count
CodePudding user response:
Use a dictionary, and keep incrementing the value for every gene you see in each tuple in the set geneInt
.
interactions_counter = dict()
for interaction in geneInt:
for gene in interaction:
interactions_counter[gene] = interactions_counter.get(gene, 0) 1
The dict.get(key, default)
method returns the value at the given key, or the specified default if the key doesn't exist. (More info)
For the set geneInt={('geneName1', 'geneName2'), ('geneName1', 'geneName3')}
, we get:
interactions_counter = {'geneName1': 2, 'geneName2': 1, 'geneName3': 1}