I have a table of strings with their relative abundances, collected elsewhere. I also have a list of features (substrings) that are associated with the strings. What is the best way to check each string for each feature and sum the relative abundances of the strings that contain it?
Example Input Table:
──────────── ────────────
| String | Abundance |
──────────── ────────────
| abcdef | 12 |
| cdefgh | 15 |
| fghijk | 36 |
| jklmnoabc | 37 |
──────────── ────────────
Example String Features:
cdef, abc, jk
Example Output:
────────── ────────────────
| Feature | Abundance (%) |
────────── ────────────────
| cdef | 27 |
| abc | 49 |
| jk | 73 |
────────── ────────────────
Any help would be greatly appreciated!
CodePudding user response:
For each string, go through the list of features and test each one with Python's in operator.
This checks whether the feature occurs anywhere in the string.
Whenever it does, add that string's abundance to a running total kept for that feature.
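A minimal sketch of that approach, using the example data from the question. The function name `sum_abundance_by_feature` and the `(string, abundance)` tuple layout are assumptions made for illustration, not something given in the question.

```python
def sum_abundance_by_feature(rows, features):
    """Return {feature: total abundance of every string containing it}."""
    totals = {feature: 0 for feature in features}
    for string, abundance in rows:
        for feature in features:
            # `in` is a plain substring test on str
            if feature in string:
                totals[feature] += abundance
    return totals


# Example data copied from the question
rows = [("abcdef", 12), ("cdefgh", 15), ("fghijk", 36), ("jklmnoabc", 37)]
features = ["cdef", "abc", "jk"]

totals = sum_abundance_by_feature(rows, features)
print(totals)  # abc is found in "abcdef" (12) and "jklmnoabc" (37), so 49
```

A dict keyed by feature keeps each total attached to its name, which is handy if the list of features is later reordered.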
CodePudding user response:
You can use this code:
a = [['abcdef', 12], ['cdefgh', 15], ['fghijk', 36], ['jklmnoabc', 37]]
s = ['cdef', 'abc', 'jk']

def func(a, s):
    ans = []
    for i in s:
        ans.append(0)  # running total for feature i
        for j in a:
            if i in j[0]:  # string j[0] contains string i
                ans[-1] += j[1]  # add to the total instead of overwriting it
    return ans

print(func(a, s))  # [27, 49, 73]
CodePudding user response:
Have you tried a loop with a regex? You have to go through both lists (inputs and features) either way, and I don't think any particular algorithm will speed this up much.
Here is what I have in mind:
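import re

# Assumes `features` and `inputs` are lists of objects that each
# have a `.string` and an `.abundance` attribute.
for feature in features:
    # re.escape treats the feature as literal text, not as a pattern
    p = re.compile(re.escape(feature.string))
    feature.abundance = 0
    for item in inputs:
        # search() looks anywhere in the string; match() only checks the start
        if p.search(item.string):
            feature.abundance += item.abundance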
With that, each entry in your features list holds its summed abundance.
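One detail worth checking with the regex route: re.match only tests the beginning of the string, while re.search scans the whole string. A small sketch with one feature and one string from the question (the variable names are just for illustration):

```python
import re

# Feature "abc" sits at the end of this input string
pattern = re.compile(re.escape("abc"))
text = "jklmnoabc"

at_start = pattern.match(text)   # only checks position 0
anywhere = pattern.search(text)  # scans the whole string

print(at_start is None)      # True: "jklmnoabc" does not start with "abc"
print(anywhere is not None)  # True: "abc" is found further along
```

For plain substrings like these, the in operator gives the same answer as search() without needing a regex.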