I am writing a function to classify ICD-10 codes into dummy variables for particular causes of death. The pandas.Series.between method works fine in a one-liner, but fails when placed in a user-defined function.
When I create a dummy variable outside of a function, it works fine. For example:
df["copd"] = df["icd10"].between("j40", "j4799").astype(int)
df["copd"].value_counts()
0 41071
1 1957
Name: copd, dtype: int64
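For reference, here is a small sketch of how between treats string codes. It uses made-up sample values, not the asker's data. The bounds are inclusive and the comparison is lexicographic. That is why an upper bound like "j4799" covers j47x codes but not "j48", and why an uppercase "J44" falls outside a lowercase range.

```python
import pandas as pd

# Hypothetical sample codes (not from the question's data)
codes = pd.Series(["j40", "j44", "j4799", "j48", "J44", "c34"])

# between() is inclusive on both ends by default and, for strings,
# compares lexicographically: "j40" <= code <= "j4799"
in_range = codes.between("j40", "j4799")

# The same check written out explicitly
manual = (codes >= "j40") & (codes <= "j4799")

print(in_range.tolist())  # [True, True, True, False, False, False]
```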
However, it raises an AttributeError when I put the same logic in a user-defined function and call it with df.apply:
def classify_death(row):
    copd = row["icd10"].between("c00", "c9799").astype(int)
    return copd

df["copd"] = df.apply(classify_death, axis=1)
...
~\AppData\Local\Temp\ipykernel_4684\1881079059.py in classify_death(row)
      1 def classify_death(row):
----> 2     copd = row["dmcaacme"].between("c00", "c9799").astype(int)
      3     return copd
4
5
AttributeError: 'str' object has no attribute 'between'
Any ideas? Many thanks in advance for any help!
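Below is a minimal reproduction of the error. It uses a hypothetical two-row DataFrame in place of the real data. With axis=1, apply passes each row to the function as a Series, so row["icd10"] is the single string stored in that cell. Plain strings have no between method.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the question's df
df = pd.DataFrame({"icd10": ["j44", "c34"]})


def show_type(row):
    # With axis=1, apply passes each row in as a Series, so indexing it
    # by column label returns the scalar stored in that cell
    return type(row["icd10"]).__name__


types = df.apply(show_type, axis=1).tolist()
print(types)  # ['str', 'str']

# Calling a Series method on that scalar reproduces the question's error
error_message = None
try:
    df.apply(lambda row: row["icd10"].between("c00", "c9799"), axis=1)
except AttributeError as err:
    error_message = str(err)
print(error_message)  # 'str' object has no attribute 'between'
```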
Answer:
No custom function is needed. Call .between directly on the column; it operates on the whole Series at once:
df["copd"] = df['icd10'].between("j40", "j4799").astype(int)
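The question's goal is several dummy variables, one per cause of death. The same vectorized call can simply be repeated in a loop. This sketch rests on a few assumptions. The CAUSE_RANGES mapping and the sample data are hypothetical. The malignant_neoplasm bounds are the c00 to c9799 range from the asker's function (ICD-10 C00-C97). Codes are lowercased first in case the data mixes cases.

```python
import pandas as pd

# Hypothetical sample data; in the question this is the existing df
df = pd.DataFrame({"icd10": ["J44", "j4799", "c34", "i21", None]})

# Hypothetical cause -> (lower, upper) bounds, both inclusive.
# "copd" uses the range from the question; "malignant_neoplasm" uses the
# c00-c9799 range from the asker's function (ICD-10 C00-C97).
CAUSE_RANGES = {
    "copd": ("j40", "j4799"),
    "malignant_neoplasm": ("c00", "c9799"),
}

# Normalise case once (assumption: codes may mix upper and lower case)
codes = df["icd10"].str.lower()

for cause, (low, high) in CAUSE_RANGES.items():
    # between() is vectorised over the whole column; missing codes give False
    df[cause] = codes.between(low, high).astype(int)

print(df)
```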
Answer:
The issue is that df.apply(classify_death, axis=1) passes each row to your function as a pandas Series. So row["icd10"] is the single value in that cell (here a str), not a Series. between is a pandas Series method, so calling it on a plain string raises the AttributeError. Inside a row-wise function, compare the string directly instead. Python compares strings lexicographically, which is the same inclusive check that between performs element-wise on a string column. Here's how you can modify your function:
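def classify_death(row):
    # row is one row of df as a Series, so row["icd10"] is a plain str
    code = row["icd10"]
    # Inclusive string comparison, equivalent to .between("c00", "c9799")
    return int("c00" <= code <= "c9799")

df["copd"] = df.apply(classify_death, axis=1)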
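With this change, the function compares each row's code as an ordinary string and returns 0 or 1 per row. For a whole column, though, the vectorized df["icd10"].between("c00", "c9799").astype(int) is simpler and avoids calling a Python function once per row.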
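Here is a quick check of the corrected row-wise function against the vectorized one-liner, using hypothetical sample codes. The isinstance guard is an added assumption so that a missing code yields 0, which matches what between returns for missing values. Without it, comparing a str with None or NaN inside the function would raise a TypeError.

```python
import pandas as pd

# Hypothetical sample data, including a missing code
df = pd.DataFrame({"icd10": ["c00", "c34", "c9799", "c98", "j44", None]})


def classify_death(row):
    code = row["icd10"]
    # Assumption: a missing code (None/NaN) counts as "not in range",
    # matching what Series.between returns for missing values
    if not isinstance(code, str):
        return 0
    # Plain string comparison, inclusive on both ends like between()
    return int("c00" <= code <= "c9799")


row_wise = df.apply(classify_death, axis=1)
vectorised = df["icd10"].between("c00", "c9799").astype(int)

print(row_wise.tolist())  # [1, 1, 1, 0, 0, 0]
print(row_wise.tolist() == vectorised.tolist())  # True
```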
I hope this helps! Let me know if you have any further questions or if you need more information.