I have a Name column and I'm trying to extract the first 75% of the characters of each string.
What I've tried:
import pandas as pd

data = [
["123", "NQMCare"],
["456", "CRAMER"],
["789", "Swinley Court"]
]
df = pd.DataFrame(data, columns=["ID", "Name"])
df["len"] = df["Name"].str.len()
df["len_75"] = (df.len * 0.75).fillna(0).astype(int)
df["Name 2"] = df["Name"].str[ : df.len_75 ]
df["Name 3"] = df["Name"].str.slice(0, df.len_75, 1)
df
#     ID           Name  len  len_75  Name 2  Name 3
# 0  123        NQMCare    7       5     NaN     NaN
# 1  456         CRAMER    6       4     NaN     NaN
# 2  789  Swinley Court   13       9     NaN     NaN
I'm getting NaNs when attempting to slice the string values. I'm not sure where I'm going wrong, since hardcoding an integer value, as in df["Name"].str[:5], works.
CodePudding user response:
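Slicing through the .str accessor in pandas works only with scalar bounds, so passing a Series such as df.len_75 as the stop value does not slice row by row. If you need a different stop value for each row, use DataFrame.apply or a list comprehension: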
df["Name 2"] = df.apply(lambda x: x["Name"][ : x.len_75 ], axis=1)
df["Name 3"] = [a[:b] for a, b in zip(df['Name'], df['len_75'])]
print(df)
    ID           Name  len  len_75     Name 2     Name 3
0  123        NQMCare    7       5      NQMCa      NQMCa
1  456         CRAMER    6       4       CRAM       CRAM
2  789  Swinley Court   13       9  Swinley C  Swinley C
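If the helper columns are not needed, the list-comprehension approach can be wrapped in a small function. This is only a sketch: the function name first_fraction, the column name "Name 75", and the extra row with a missing name are made up for illustration, and it assumes non-string values (such as missing names) should be passed through unchanged.

```python
import pandas as pd


def first_fraction(names: pd.Series, frac: float = 0.75) -> pd.Series:
    """Hypothetical helper: keep the leading `frac` share of each string."""
    out = [
        # int() truncates, matching .astype(int) in the question
        s[: int(len(s) * frac)] if isinstance(s, str) else s
        for s in names
    ]
    # Reuse the original index so the result aligns on assignment
    return pd.Series(out, index=names.index, dtype=object)


df = pd.DataFrame(
    [
        ["123", "NQMCare"],
        ["456", "CRAMER"],
        ["789", "Swinley Court"],
        ["000", None],  # illustrative row with a missing name
    ],
    columns=["ID", "Name"],
)
df["Name 75"] = first_fraction(df["Name"])
result = df["Name 75"].tolist()
print(df)
```

The per-row work still happens in plain Python, as in the answer above; the function just avoids storing len and len_75 in the frame.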