Using lambda function to split a column in a Pandas dataset

Time:12-19

I am doing a Udemy tutorial on data analysis and machine learning and I have come across an issue that I am not able to understand fully.

The dataset being used is available on Kaggle and is called 911.csv.

I am supposed to create a new feature: **In the titles column there are "Reasons/Departments" specified before the title code. These are EMS, Fire, and Traffic. Use .apply() with a custom lambda expression to create a new column called "Reason" that contains this string value.**

*For example, if the title column value is EMS: BACK PAINS/INJURY, the Reason column value would be EMS.*

When I do this for the first row of the column, it works:

x=df['title'].iloc[0]
x.split(':')[0]

The problem appears when I try the same thing on the whole 'title' column (I am trying to remove the ':' from every value in that column):

x = df['title']
x.split(':')

I get the following error:

AttributeError: 'Series' object has no attribute 'split'

Can you tell me what I am doing wrong here?

CodePudding user response:

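split is a method of Python strings, not of a pandas Series, which is why calling it on the whole column fails. Use the .str accessor to apply the string method to each element: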

x.str.split(':')

For example:

import pandas as pd

df = pd.DataFrame({"A": ["a:b", "b:c"]})

df["A"].str.split(":")

OUTPUT

0    [a, b]
1    [b, c]
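
To get from the split lists to the "Reason" column the exercise asks for, one option is to chain a second .str and take the first element of each list. This is only a minimal sketch: the DataFrame below is a stand-in for 911.csv, and the second and third title values are made up for illustration.

```python
import pandas as pd

# Stand-in for the real 911.csv data; only the first value comes from the question,
# the other two rows are hypothetical.
df = pd.DataFrame({
    "title": [
        "EMS: BACK PAINS/INJURY",
        "Fire: GAS-ODOR/LEAK",
        "Traffic: VEHICLE ACCIDENT",
    ]
})

# .str.split(":") splits each string into a list;
# .str[0] then picks the first item of each list.
df["Reason"] = df["title"].str.split(":").str[0]

reasons = df["Reason"].tolist()
print(reasons)  # ['EMS', 'Fire', 'Traffic']
```

This avoids a Python-level lambda, although the exercise itself asks for .apply().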

CodePudding user response:

I was able to get the code working with the following step-by-step approach:

df['title']
df['title'].iloc[0]
x = df['title'].iloc[0]
x.split(':')
x.split(':')[0]
y = x.split(':')[0]
y

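And then create the lambda function, which applies the same split to every value in the column (the y inside the lambda is the lambda's own parameter, so it does not depend on the variable y created above):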

df['reason'] = df['title'].apply(lambda y: y.split(':')[0])
df['reason']
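
A small self-contained check of that last step, as a sketch on made-up data (the sample titles other than the first are hypothetical, not taken from 911.csv). It shows that .apply() hands the lambda one string at a time, and that the parameter name inside the lambda is independent of any outer variable with the same name:

```python
import pandas as pd

# Hypothetical sample rows standing in for the real dataset.
df = pd.DataFrame({
    "title": [
        "EMS: BACK PAINS/INJURY",
        "Fire: GAS-ODOR/LEAK",
        "Traffic: VEHICLE ACCIDENT",
    ]
})

# An outer variable named y, as in the step-by-step exploration.
y = df["title"].iloc[0].split(":")[0]

# The lambda's y is a fresh parameter bound to each title string in turn;
# it shadows the outer y inside the lambda and leaves it unchanged.
df["reason"] = df["title"].apply(lambda y: y.split(":")[0])

reason_values = df["reason"].tolist()
print(reason_values)  # ['EMS', 'Fire', 'Traffic']
print(y)              # EMS
```

Renaming the lambda parameter (for example to title) gives the same result and may read more clearly.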