Pandas extract first N% of characters from string column in DataFrame

Time:12-08

I have a Name column and I'm trying to extract the first 75% of the characters of each string.

What I've tried:

import pandas as pd

data = [
    ["123", "NQMCare"],
    ["456", "CRAMER"],
    ["789", "Swinley Court"]
]
df = pd.DataFrame(data, columns=["ID", "Name"])
df["len"] = df["Name"].str.len()
df["len_75"] = (df.len * 0.75).fillna(0).astype(int)
df["Name 2"] = df["Name"].str[ : df.len_75 ]
df["Name 3"] = df["Name"].str.slice(0, df.len_75, 1)

df
#   ID  Name            len len_75  Name 2  Name 3
# 0 123 NQMCare         7   5       NaN     NaN
# 1 456 CRAMER          6   4       NaN     NaN
# 2 789 Swinley Court   13  9       NaN     NaN

I'm getting NaNs when I try to slice the string values. I'm not sure where I'm going wrong, since hardcoding an integer value, as in df["Name"].str[:5], works...

Answer:

Slicing through the .str accessor only works with scalar bounds, not with a Series of per-row values. If each row needs a different length, use DataFrame.apply or a list comprehension:

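# Row-wise: slice each name using that row's own len_75 value
df["Name 2"] = df.apply(lambda x: x["Name"][: x.len_75], axis=1)
# Same result with a list comprehension over the two columns
df["Name 3"] = [a[:b] for a, b in zip(df["Name"], df["len_75"])]
print(df)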

    ID           Name  len  len_75     Name 2     Name 3
0  123        NQMCare    7       5      NQMCa      NQMCa
1  456         CRAMER    6       4       CRAM       CRAM
2  789  Swinley Court   13       9  Swinley C  Swinley C
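
If the helper columns len and len_75 are not needed, the same idea can be folded into one function. This is only a sketch: the name first_fraction and its fraction parameter are made up for illustration, and it assumes that missing names should stay missing rather than raise an error.

```python
import pandas as pd


def first_fraction(names: pd.Series, fraction: float = 0.75) -> pd.Series:
    """Keep the leading `fraction` of each string (hypothetical helper)."""

    def cut(value):
        # Assumption: leave missing values (NaN/None) untouched.
        if pd.isna(value):
            return value
        # int() truncates, matching astype(int) in the question.
        return value[: int(len(value) * fraction)]

    return names.map(cut)


df = pd.DataFrame(
    {"ID": ["123", "456", "789"], "Name": ["NQMCare", "CRAMER", "Swinley Court"]}
)
df["Name 2"] = first_fraction(df["Name"])
print(df)
```

For the three sample names this gives NQMCa, CRAM and Swinley C, the same values as the two approaches above.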