How would I go about iterating through each row in a column and keeping a running tally of every sub

Time:04-22

Essentially, what I am trying to do is go through the "External_Name" column row by row and count how often each unique substring (word) occurs across the strings, kind of like .value_counts().

External_Name                            Specialty
ABMC Hyperbaric Medicine and Wound Care  Hyperbaric/Wound Care
ABMC Kaukauna Laboratory Services        Laboratory
AHCM Sinai Bariatric Surgery Clinic      General Surgery
...                                      ...
n                                        n

For example, after running through the first three rows in "External_Name", the output would begin something like

Output Count
ABMC 2
Hyperbaric 1
Medicine 1
and 1
Wound 1
Care 1

So on and so forth. Any help would be really appreciated!

CodePudding user response:

You can split at whitespace with str.split(), then explode the resulting word lists into individual rows and count the values with value_counts.

>>> df.External_Name.str.split().explode().value_counts()
ABMC          2
Hyperbaric    1
Medicine      1
and           1
Wound         1
Care          1
Kaukauna      1
Laboratory    1
Services      1
AHCM          1
Sinai         1
Bariatric     1
Surgery       1
Clinic        1
Name: External_Name, dtype: int64
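
For comparison, the row-by-row running tally described in the question can be sketched in plain Python with collections.Counter. This is only an illustration, not part of the answer above: the list external_names is a hypothetical stand-in for the column values (with pandas, df["External_Name"].tolist() would give such a list), and it assumes that "substring" means a whitespace-separated word.

```python
from collections import Counter

# Hypothetical stand-in for the values of the "External_Name" column.
external_names = [
    "ABMC Hyperbaric Medicine and Wound Care",
    "ABMC Kaukauna Laboratory Services",
    "AHCM Sinai Bariatric Surgery Clinic",
]

# Running tally: one pass over the rows, updating the counts per word.
tally = Counter()
for name in external_names:
    # split() with no arguments splits on runs of whitespace.
    tally.update(name.split())

# most_common() lists the words from highest to lowest count.
for word, count in tally.most_common():
    print(word, count)
```

The one-liner with str.split(), explode() and value_counts() is usually the more convenient choice inside pandas; the loop only makes the "running tally" idea explicit.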