I am doing a Udemy tutorial on data analysis and machine learning and I have come across an issue that I am not able to understand fully.
The dataset being used is available on Kaggle and is called 911.csv.
I am supposed to complete this exercise: "Create a new feature. In the title column there are "Reasons/Departments" specified before the title code. These are EMS, Fire, and Traffic. Use .apply() with a custom lambda expression to create a new column called "Reason" that contains this string value."
For example, if the title column value is EMS: BACK PAINS/INJURY, the Reason column value would be EMS.
When I do this for the first row of the column, it works:
x=df['title'].iloc[0]
x.split(':')[0]
But when I try the same thing on the whole column (I am trying to remove the ':' part from every value in the 'title' column):
x = df['title']
x.split(':')
I get the following error:
AttributeError: 'Series' object has no attribute 'split'
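A minimal sketch of why the error happens, using a hypothetical two-row DataFrame in place of 911.csv (the second title is made up for illustration; the first is the example from the exercise). df['title'] is a pandas Series, while .iloc[0] pulls out a single Python str, and split() is a method of str, not of Series:

```python
import pandas as pd

# Hypothetical stand-in for 911.csv: only a 'title' column, illustrative rows
df = pd.DataFrame({"title": ["EMS: BACK PAINS/INJURY", "Fire: GAS-ODOR/LEAK"]})

first = df["title"].iloc[0]   # a single value from the column
column = df["title"]          # the whole column

print(isinstance(first, str))         # True: a plain Python string
print(isinstance(column, pd.Series))  # True: a pandas Series

# str has split(); Series does not, hence the AttributeError
print(hasattr(first, "split"))    # True
print(hasattr(column, "split"))   # False
print(first.split(":")[0])        # EMS
```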
Can you tell me what I am doing wrong here?
Answer:
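A Series has no split method of its own; pandas exposes vectorized string methods through the .str accessor, so you should do: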
x.str.split(':')
For example:
import pandas as pd
df = pd.DataFrame({"A": ["a:b", "b:c"]})
df["A"].str.split(":")
Output:
0 [a, b]
1 [b, c]
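Applied to the exercise, a sketch assuming hypothetical sample titles in place of the real 911.csv data. .str.split(':') gives a Series of lists, and .str[0] then takes the first item of each list. Note this vectorized form does not use .apply(), which the exercise explicitly asks for, so treat it as an alternative rather than the expected solution:

```python
import pandas as pd

# Hypothetical sample titles; the real data comes from 911.csv
df = pd.DataFrame({"title": ["EMS: BACK PAINS/INJURY",
                             "Fire: GAS-ODOR/LEAK",
                             "Traffic: VEHICLE ACCIDENT -"]})

# .str.split(':') -> Series of lists; .str[0] -> first element of each list
df["Reason"] = df["title"].str.split(":").str[0]
print(df["Reason"].tolist())  # ['EMS', 'Fire', 'Traffic']
```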
Answer:
I was able to get the code working with the following step-by-step approach:
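df['title']                # the whole column (a Series)
df['title'].iloc[0]        # the first title (a str)
x = df['title'].iloc[0]
x.split(':')               # a list: [reason, rest of the title]
x.split(':')[0]            # just the reason, e.g. 'EMS'
y = x.split(':')[0]
y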
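Then apply the same split to every row with a lambda. Note that the y inside the lambda is the lambda's own parameter, which receives each title in turn; the variable y created above is not used: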
df['Reason'] = df['title'].apply(lambda y: y.split(':')[0])
df['Reason']
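To illustrate that the lambda does not depend on the earlier y, here is a sketch with the same hypothetical sample titles. .apply() calls the lambda once per value of the column and binds that value to the parameter, so renaming the parameter changes nothing, and the result matches the .str accessor version:

```python
import pandas as pd

# Hypothetical sample titles standing in for 911.csv
df = pd.DataFrame({"title": ["EMS: BACK PAINS/INJURY",
                             "Fire: GAS-ODOR/LEAK",
                             "Traffic: VEHICLE ACCIDENT -"]})

y = "unrelated value"  # an outer y; the lambdas below never read it

# apply() binds each title to the lambda's parameter, whatever its name
with_y = df["title"].apply(lambda y: y.split(":")[0])
with_title = df["title"].apply(lambda title: title.split(":")[0])
vectorized = df["title"].str.split(":").str[0]

df["Reason"] = with_y
print(df["Reason"].tolist())                    # ['EMS', 'Fire', 'Traffic']
print(with_y.tolist() == with_title.tolist())   # True
print(with_y.tolist() == vectorized.tolist())   # True
print(y)                                        # unrelated value
```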