Screenshot of results from Updated code from suggestion
'dlist' is a list of provider id that is in a dataframe. I tried to use a while loop for 'dlist', but it only returns the value of the last provider id in the array. In this case it is 1005. I used the append function but it didn't do anything. The additional 74 rows from provider id 1000 are not showing. How do I combine everything so it shows the values of both numbers from dlist, to equal 684 rows?
dlist = ["1000", "1005"]
final_list = pd.DataFrame()
index = 0
while index < len(dlist):
provider = dlist[index]
# Filter dentist (CHANGEABLE)
final_list = report_df[(report_df["provider_id"] == provider)]
# Sort values of the codes
final_list = final_list.sort_values(['codes','report_month'], ascending=True)
# Drop 'report_year' column
final_list = final_list.drop(['report_year'], axis = 1)
# Change 'report_month' numbers into month name
final_list = final_list.replace({'report_month': {1: "January",
2: "February",
3: "March",
4: "April",
5: "May",
6: "June",
7: "July",
8: "August",
9: "September",
10: "October",
11: "November"}})
final_list.append(final_list)
index =1
CodePudding user response:
Could create a list with all the dataframes and then concatenate them. Like before the while loop have a list_of_dfs = []
, and prior to the index =1
add list_of_dfs.append(final_list)
. You probably dont want final_list.append(final_list)
. Eventually could do my_df_of_concern = pd.concat(list_of_dfs, index=0)
. See https://pandas.pydata.org/docs/reference/api/pandas.concat.html
CodePudding user response:
Your problem is you are modifying the same variable again and again. In your code:
Line 1: while index < len(dlist):
Line 2: provider = dlist[index]
Line 3: # Filter dentist (CHANGEABLE)
Line 4: final_list = report_df[(report_df["provider_id"] == provider)] # PROBLEM LINE
Line 5: # MORE CODE
Line 6: # MORE CODE
Line 7: final_list.append(final_list)
Line 8: index =1
Since your dlist
has ["1000", "1005"]
, during the first run, in line 4, final_list
has all the rows where provider_id == 1000
. Then you make some modifications to it and then in Line 7, you append it to the same object. So now, final_list is going to have 2 copies of everything because you are doing final_list.append(final_list)
Then you increment index and for the next iteration where provider is now 1005
, you again do Line 4 where by your final_list is going to be overwritten. This means that all your previous values stored in that variable is no longer present only the new values where provider_id == 1005
is present.
Try changing your code like this
while index < len(dlist):
provider = dlist[index]
# Filter dentist (CHANGEABLE)
report_list = report_df[(report_df["provider_id"] == provider)]
# Sort values of the codes
report_list = report_list.sort_values(['codes','report_month'], ascending=True)
# Drop 'report_year' column
report_list = report_list.drop(['report_year'], axis = 1)
# Change 'report_month' numbers into month name
report_list = report_list.replace({'report_month': {1: "January",
2: "February",
3: "March",
4: "April",
5: "May",
6: "June",
7: "July",
8: "August",
9: "September",
10: "October",
11: "November"}})
final_list.append(report_list)
index =1
report_list
acts as a temporary variable which holds all the data of a particular provider and then after all your modifications like dropping report_year column, sorting, etc. you append the values to final_list. Now you will have the data across multiple iterations.
Also, instead of doing
while index < len(dlist):
provider = dlist[index]
index =1
you can do this:
for provider in dlist:
# YOUR CODE where provider will be "1000" for 1st run and "1005" in second run