In Python, why does pandas.Series.between function return error "'str' object has no

Time:12-17

I am writing a function to classify ICD-10 codes into dummy variables for particular causes of death. The pandas.Series.between function works fine in a one-liner, but fails when placed in a user-created function.

When I create a dummy variable outside of a function, it works fine. For example:

df["copd"] = df["icd10"].between("j40", "j4799").astype(int)
df["copd"].value_counts()

0    41071
1     1957
Name: copd, dtype: int64

However, it raises an AttributeError when I try to place this in a user-defined function:


def classify_death(row):
    copd = row["icd10"].between("c00", "c9799").astype(int)
    return copd

df["copd"] = df.apply(classify_death, axis=1)

...

~\AppData\Local\Temp\ipykernel_4684\1881079059.py in classify_death(row)
      1 def classify_death(row):
----> 2     copd = row["dmcaacme"].between("c00", "c9799").astype(int)
      3     return copd
      4
      5

AttributeError: 'str' object has no attribute 'between'

Any ideas? Many thanks in advance for any help!

CodePudding user response:

No function is needed. Just apply .between directly to the column:

df["copd"] = df['icd10'].between("j40", "j4799").astype(int)
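If several causes of death need dummy variables, the same column-wise call can be repeated in a loop. The following is only a sketch: the sample data is made up, the `ranges` mapping and the "cancer" column name are hypothetical, and the code ranges are the two that appear in the question. Note that the comparison is a plain lexicographic string comparison, so it is case-sensitive.

```python
import pandas as pd

# Made-up sample data; the real df comes from the question.
df = pd.DataFrame({"icd10": ["j449", "c349", "i219", "j40", "c00"]})

# Hypothetical mapping: dummy name -> (lowest code, highest code).
# The ranges are taken from the question; the names are illustrative.
ranges = {
    "copd": ("j40", "j4799"),
    "cancer": ("c00", "c9799"),
}

for name, (low, high) in ranges.items():
    # Series.between compares the strings lexicographically,
    # inclusive on both ends by default.
    df[name] = df["icd10"].between(low, high).astype(int)

print(df)
```

Each pass through the loop works on the whole icd10 column at once, so no row-wise apply is needed.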

CodePudding user response:

The problem is that between is a pandas Series method, but inside your function row["icd10"] is not a Series. With df.apply(classify_death, axis=1), the function is called once per row. Each row is passed in as a Series indexed by column name, so row["icd10"] is the single string stored in that cell, and calling between on a str raises the AttributeError. To fix this, have the function take the whole DataFrame and call between on the icd10 column, instead of going through apply row by row:

def classify_death(df):
    # df["icd10"] is a Series, so .between is available
    copd = df["icd10"].between("c00", "c9799").astype(int)
    return copd

df["copd"] = classify_death(df)

With this change, the between method is called on the icd10 column, which is a pandas Series, and the function returns the 0/1 column directly.
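To see where the 'str' in the error message comes from, it can help to inspect what apply with axis=1 actually hands to the function. The sketch below uses made-up data and also shows a row-wise variant that relies on an ordinary chained string comparison in place of between. That variant is an alternative for illustration, not something from the question.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({"icd10": ["j449", "c349", "i219"]})

# With axis=1, each call receives one row as a Series,
# so row["icd10"] is a single Python str, not a Series.
cell_types = df.apply(lambda row: type(row["icd10"]).__name__, axis=1)
print(cell_types.tolist())

def classify_death(row):
    # A chained comparison works on a scalar string.
    return int("c00" <= row["icd10"] <= "c9799")

# Row-wise result and column-wise result should agree.
row_wise = df.apply(classify_death, axis=1)
vectorized = df["icd10"].between("c00", "c9799").astype(int)
print(row_wise.tolist(), vectorized.tolist())
```

Both versions give the same dummy variable, but the column-wise call avoids a Python-level function call per row and is usually much faster on a frame of this size.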

I hope this helps! Let me know if you have any further questions or if you need more information.
