Essentially, what I am trying to do is go through the "External_Name" column, row by row, and count how often each unique word (whitespace-separated substring) appears across all the strings, kind of like .value_counts().
External_Name | Specialty |
---|---|
ABMC Hyperbaric Medicine and Wound Care | Hyperbaric/Wound Care |
ABMC Kaukauna Laboratory Services | Laboratory |
AHCM Sinai Bariatric Surgery Clinic | General Surgery |
........... | ........... |
n | n |
For example, after running through the first three rows in "External_Name", the output would be something like:
Output | Count |
---|---|
ABMC | 2 |
Hyperbaric | 1 |
Medicine | 1 |
and | 1 |
Wound | 1 |
Care | 1 |
So on and so forth. Any help would be really appreciated!
Answer:
You can split at whitespace with str.split(), then explode the resulting word lists into individual rows and count the values with value_counts().
>>> df.External_Name.str.split().explode().value_counts()
ABMC 2
Hyperbaric 1
Medicine 1
and 1
Wound 1
Care 1
Kaukauna 1
Laboratory 1
Services 1
AHCM 1
Sinai 1
Bariatric 1
Surgery 1
Clinic 1
Name: External_Name, dtype: int64
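For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the three sample rows from the question and runs the same chain. The variable names word_counts and result are my own, and the final rename_axis/reset_index step is just one possible way to get the two-column "Output | Count" layout the question shows; it is not part of the one-liner above.

```python
import pandas as pd

# Sample data copied from the first three rows in the question.
df = pd.DataFrame({
    "External_Name": [
        "ABMC Hyperbaric Medicine and Wound Care",
        "ABMC Kaukauna Laboratory Services",
        "AHCM Sinai Bariatric Surgery Clinic",
    ],
    "Specialty": ["Hyperbaric/Wound Care", "Laboratory", "General Surgery"],
})

# Split each name on whitespace, put every word on its own row,
# then count how often each word occurs across the whole column.
word_counts = df["External_Name"].str.split().explode().value_counts()

# Optional: turn the Series into a DataFrame with the column names
# used in the question ("Output" and "Count" are illustrative labels).
result = word_counts.rename_axis("Output").reset_index(name="Count")
print(result)
```

Note that the count is case-sensitive and keeps punctuation attached to words, so "Care" and "care" would be counted separately unless you normalize the strings first.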