Beginner Python question here that I've struggled to get answered from related Stack questions.
I've got a list
dfList = df0,df1,df2,...,df7
I've got a function that I've defined that takes a dataframe as its argument. I'm not sure the function itself matters, but to be safe it is basically
# assuming proportion_confint is the one from statsmodels
from statsmodels.stats.proportion import proportion_confint

def rateCalc(outcomeDataFrame):
    rateList = list()
    upperRateList = list()
    lowerRateList = list()
    for i in range(len(outcomeDataFrame)):
        # column 4 holds the success count, column 3 the number of observations
        lowlevel, highlevel = proportion_confint(count=outcomeDataFrame.iloc[i,4], nobs=outcomeDataFrame.iloc[i,3])
        lowerRateList.append(lowlevel)
        rateList.append(outcomeDataFrame.iloc[i,4]/outcomeDataFrame.iloc[i,3])
        upperRateList.append(highlevel)
    outcomeDataFrame = outcomeDataFrame.assign(lowerRate=lowerRateList)
    outcomeDataFrame = outcomeDataFrame.assign(midrate=rateList)
    outcomeDataFrame = outcomeDataFrame.assign(upperRate=upperRateList)
    return outcomeDataFrame
What I'm trying to do is append the observed success ratio of two numbers, as well as its 95% confidence interval. It goes fine when working with any individual df.
What I want to accomplish is to turn each item of dfList into a version of itself with those lowerRate, midrate, and upperRate values appended as new columns.
When I try to apply across each dataframe with
for i in range(len(dfList)):
    rateCalc(dfList[i])
though, it seems to only execute for df0. I can't make any sense of that; if I got an outright error I'd assume I had some basic flaw in the code, but it seems to work for df0 and then not iterate to df1 and beyond.
I also thought there might be an issue of "df1 != dfList[1]" in some backend sense (that running the function on the list item dfList[1] would not have any effect on the original item df1) but, again, the fact that it seems to work with df0 would imply that's not the issue.
I also tried throwing some mud at the wall with the "map" function but am not sure I understand how to use that in this context (or any other for that matter ha)
Thanks all
CodePudding user response:
I think it is because the assign method returns a new DataFrame, which only exists inside the function scope unless you keep the returned value; the dataframe in your list is never modified. Here is an example:
import pandas as pd
df_0 = pd.DataFrame(data = [{'column':'a'}])
df_1 = pd.DataFrame(data = [{'column':'c'}])
df_2 = pd.DataFrame(data = [{'column':'d'}])
df_altos = df_0,df_1,df_2
def mod_df(df):
    test = list()
    test.append('d')
    #print('id before setting another column ' + str(id(df)))
    #df['b'] = test
    print('id before assinging ' + str(id(df)))
    # assign() returns a new DataFrame; the name df now points to that new object
    df = df.assign(lowerRate = test)
    print('id after assinging ' + str(id(df)))
    return df
for i in range(len(df_altos)):
    mod_df(df_altos[i])
The ids printed for each dataframe are the following:
id before assinging 1833832455136
id after assinging 1833832523568
id before assinging 1833832456144
id after assinging 1833832525776
id before assinging 1833832454416
id after assinging 1833832521888
As you can see, the id changes, so assign produced a new object each time. You could try another way of assigning the column, such as the following, which modifies the dataframe in place:
def mod_df(df):
    test = list()
    test.append('d')
    print('id before setting another column ' + str(id(df)))
    # item assignment adds the column to the existing object
    df['b'] = test
    print('id after assinging ' + str(id(df)))
    return df
which outputs
id before setting another column 1833831955520
id after assinging 1833831955520
id before setting another column 1833791973888
id after assinging 1833791973888
id before setting another column 1833791973264
id after assinging 1833791973264
Now the ids are the same and the new column exists on all the dataframes. How the first dataframe in your code was working, I don't know.
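If you would rather keep assign and not modify the original dataframes, another option is to store what the function returns. Here is a minimal sketch of that idea; add_rate and the column names trials and successes are made up for illustration and only stand in for your rateCalc and your real columns:

```python
import pandas as pd

# Hypothetical stand-in for rateCalc: like assign-based code, it returns a NEW frame.
def add_rate(df):
    return df.assign(midrate=df["successes"] / df["trials"])

df0 = pd.DataFrame({"trials": [10, 20], "successes": [5, 4]})
df1 = pd.DataFrame({"trials": [8], "successes": [2]})
df_list = [df0, df1]

# Calling the function without keeping the result throws the new frames away.
for df in df_list:
    add_rate(df)
print("midrate" in df_list[0].columns)  # False

# Rebuild the list from the returned frames instead.
df_list = [add_rate(df) for df in df_list]
print("midrate" in df_list[1].columns)  # True

# The same thing written with map, since you asked about it.
mapped = list(map(add_rate, [df0, df1]))
```

Note that df0 and df1 themselves still have no midrate column afterwards; only the frames stored in df_list (or mapped) do, so you would keep working with those.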