Given this sample dataframe I am trying to extract all rows in which the Code column has a value starting with 'HCC' and all rows that follow until the next 'HCC' into separate dataframes:
Code Description
HCC001 <empty string>
A1 Description of A1
B1 Description of B1
HCC002 <empty string>
A2 Description of A2
B2 Description of B2
C2 Description of C2
HCC003 <empty string>
....
I have tried itterrows, just looping through the columns as well.
The expected output is: Dataframe 1
Code Description
HCC001 <empty string>
A1 Description of A1
B1 Description of B1
Dataframe 2
Code Description
HCC002 <empty string>
A2 Description of A2
B2 Description of B2
C2 Description of C2
....
This did not work: hcc_lst = [] #check each row in the first column for 'HCC' and make a separate dataframe
for i in range(len(icd_data)):
if 'HCC' in icd_data.iloc[i,0]:
hcc_lst.append(icd_data.iloc[i,0])
I am sure that something with itterrows should work I just cannot find this precise answer.
Thanks for taking a look.
CodePudding user response:
Create groups and use locals()
to create new python variables dynamically (strongly discouraged, maybe your should use a dictionary to store sub dataframes):
for i, subdf in df.groupby(df['Code'].str.startswith('HCC').cumsum()):
locals()[f"df{i}"] = subdf
Output:
>>> df1
Code Description
0 HCC001 <empty string>
1 A1 Description of A1
2 B1 Description of B1
>>> df2
Code Description
3 HCC002 <empty string>
4 A2 Description of A2
5 B2 Description of B2
6 C2 Description of C2
>>> df3
Code Description
7 HCC003 <empty string>
CodePudding user response:
You can do that using eq for checking rows starting with "HCC" and cumsum() to segregate the DataFrames likewise:
for k, v in df.groupby(df['Code'].str[:3].eq('HCC').cumsum()):
print(f"DataFrame {k}")
print(v)
print('\n')