How would I go about iterating through each row in a column and keeping a running tally of every substring?

Essentially what I am trying to do is go through the "External_Name" column, row by row, and get a count of unique substrings within each string, kind of like .value_counts().

External_Name	Specialty
ABMC Hyperbaric Medicine and Wound Care	Hyperbaric/Wound Care
ABMC Kaukauna Laboratory Services	Laboratory
AHCM Sinai Bariatric Surgery Clinic	General Surgery
...........	...........
n	n

For example, after running through the first three rows in "External_Name" the output would be something like

Output	Count
ABMC	2
Hyperbaric	1
Medicine	1
and	1
Wound	1
Care	1

So on and so forth. Any help would be really appreciated!

CodePudding user response:

You can split on whitespace with str.split(), turn the resulting word lists into one row per word with explode(), and then count the values with value_counts().

>>> df.External_Name.str.split().explode().value_counts()
ABMC          2
Hyperbaric    1
Medicine      1
and           1
Wound         1
Care          1
Kaukauna      1
Laboratory    1
Services      1
AHCM          1
Sinai         1
Bariatric     1
Surgery       1
Clinic        1
Name: External_Name, dtype: int64
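
If you want to try this end to end, here is a minimal sketch that rebuilds the three sample rows from the question and runs the same chain. The DataFrame contents are copied from the question; the names `word_counts` and `tally` are just illustrative, and the last step (turning the result into a two-column "Output"/"Count" frame) is my assumption about the layout you want.

```python
import pandas as pd

# Sample data copied from the first three rows shown in the question.
df = pd.DataFrame({
    "External_Name": [
        "ABMC Hyperbaric Medicine and Wound Care",
        "ABMC Kaukauna Laboratory Services",
        "AHCM Sinai Bariatric Surgery Clinic",
    ],
    "Specialty": ["Hyperbaric/Wound Care", "Laboratory", "General Surgery"],
})

# Split each name on whitespace, put every word on its own row, then count.
word_counts = df["External_Name"].str.split().explode().value_counts()

# Optional: reshape into a DataFrame with the column names used in the question.
# "tally" is a hypothetical name chosen for this example.
tally = word_counts.rename_axis("Output").reset_index(name="Count")

print(tally)
```

Note that str.split() with no arguments splits on any run of whitespace, and value_counts() skips missing values by default, so rows where External_Name is NaN should not add to the tally. Depending on your pandas version, the name printed at the bottom of the Series may differ from the output shown above.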