I have a pandas column holding a string value, and I want to check whether a separate column (which holds a list) contains that string.
Category | top predicted |
---|---|
Category A. Molecular Pathogenesis and Physiology | list see below |
[("Category A. Molecular Pathogenesis and Physiology::HiClass::Separator::1. Amyloid beta::HiClass::Separator::f. Amyloid Structure",
0.054),
('Category B. Diagnosis and Assessment::HiClass::Separator::8. Methodologies::HiClass::Separator::None',
0.049),
('Category B. Diagnosis and Assessment::HiClass::Separator::1. Fluid Biomarkers::HiClass::Separator::b. Blood-based',
0.035)]
The generated list provides the Category and two further sub-categories.
What I want is a way to count how many times the Category column value appears in the list in the top predicted column. In the above case, "Category A. Molecular Pathogenesis and Physiology" would return 1. If the value were "Category B. Diagnosis and Assessment", 2 would be returned.
This would then be done for every row in the pandas DataFrame.
Any help in achieving this would be much appreciated :) Many thanks!
Answer:
Your second column contains a list of tuples, and the first element of each tuple is the string to check. The following line of code should do it:
df['count'] = df.apply(lambda row: sum(1 for x in row['top predicted'] if row['Category'] in x[0]), axis=1)
You should use apply() instead of iterating over the rows as you suggested.
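For reference, here is a minimal self-contained sketch of the same idea, built from the sample data in the question. It assumes the top predicted column really holds Python lists of (label, score) tuples. The names SEP, predictions and count_category are hypothetical helpers introduced only for this illustration. Unlike the substring check above, this variant compares only the top-level part of each label (the text before the first separator), which is one possible way to avoid accidental matches inside sub-category names.

```python
import pandas as pd

# Separator used inside the predicted labels (taken from the sample data).
SEP = "::HiClass::Separator::"

# Sample predictions copied from the question.
predictions = [
    ("Category A. Molecular Pathogenesis and Physiology" + SEP
     + "1. Amyloid beta" + SEP + "f. Amyloid Structure", 0.054),
    ("Category B. Diagnosis and Assessment" + SEP
     + "8. Methodologies" + SEP + "None", 0.049),
    ("Category B. Diagnosis and Assessment" + SEP
     + "1. Fluid Biomarkers" + SEP + "b. Blood-based", 0.035),
]

# Two rows sharing the same predictions but with different categories.
df = pd.DataFrame({
    "Category": [
        "Category A. Molecular Pathogenesis and Physiology",
        "Category B. Diagnosis and Assessment",
    ],
    "top predicted": [predictions, predictions],
})


def count_category(row):
    """Count predictions whose top-level category equals row['Category']."""
    return sum(
        1
        for label, _score in row["top predicted"]
        if label.split(SEP)[0] == row["Category"]
    )


df["count"] = df.apply(count_category, axis=1)
print(df["count"].tolist())  # [1, 2]
```

If the DataFrame was loaded from a CSV file, the top predicted column will likely contain strings rather than lists. In that case it would need to be parsed first, for example with ast.literal_eval, before either approach works.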