# Split a single column into two columns using apply()
df[['First Name', 'Last Name']] = df["Student_details"].apply(lambda x: pd.Series(str(x).split(",")))
print(df)
1- Why, when I change the code to .apply(lambda x: str(x).split(",", expand=True)), do I get an error which is "expand is invalid argument to split function"?
2- Why do I have to use pd.Series(), although the default return value of str.split() is a Series?
3- How does pd.Series() return a Series when what I get back here is a DataFrame?
I tried to pass expand and use it normally, but it didn't work.
Here is the DataFrame:
import pandas as pd
import numpy as np
technologies = {
'Student_details':["Pramodh_Roy", "Leena_Singh", "James_William", "Addem_Smith"],
'Courses':["Spark", "PySpark", "Pandas", "Hadoop"],
'Fee' :[25000, 20000, 22000, 25000]
}
df = pd.DataFrame(technologies)
print(df)
CodePudding user response:
df[['First Name', 'Last Name']] = df["Student_details"].str.split("_", expand=True)
I'm not sure what you're asking. Is it about the solution above, or do you really want to know why #1 throws an error?
EDIT 1: The expand parameter does not exist on the split method you are calling inside the lambda, because that is the split of Python's built-in str type. The expand parameter belongs to the split method that pandas provides for a Series (Series.str.split).
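To make the difference concrete, here is a small sketch that contrasts the two split methods. The variable names (name, s, parts, builtin_rejected_expand) are only for illustration and are not part of the question's code.

```python
import pandas as pd

name = "Pramodh_Roy"

# Python's built-in str.split returns a plain list.
print(name.split("_"))

# It has no `expand` parameter, so passing one raises a TypeError.
builtin_rejected_expand = False
try:
    name.split("_", expand=True)
except TypeError as exc:
    builtin_rejected_expand = True
    print("TypeError:", exc)

# The pandas string accessor, Series.str.split, does accept `expand`.
s = pd.Series(["Pramodh_Roy", "Leena_Singh"])
parts = s.str.split("_", expand=True)
print(type(parts).__name__)
print(parts)
```

Inside .apply(lambda x: ...), x is a single cell value (a plain Python string), so the built-in method is the one that gets called.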
EDIT 2: Re your third question: as you can see in my suggestion, I'm not even using the pd.Series function, although df["Student_details"] is of course a Series. The key in my answer is that expand=True makes the split return a DataFrame with as many columns as the split results require. So if one of the names were "a_b_c_d", I would get a DataFrame with four columns in total.
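A quick way to check the "a_b_c_d" case is the sketch below. The sample Series is made up for illustration, and the n=1 call is an optional extra (not something the question used) that caps the result at two columns.

```python
import pandas as pd

# Hypothetical data: one ordinary name and one with three underscores.
s = pd.Series(["Pramodh_Roy", "a_b_c_d"])

# expand=True creates one column per piece; shorter rows are padded
# with missing values.
wide = s.str.split("_", expand=True)
print(wide.shape)  # (2, 4)

# n=1 splits only at the first underscore, so there are at most two columns.
capped = s.str.split("_", n=1, expand=True)
print(capped.shape)  # (2, 2)
print(capped)
```

With four columns, assigning the result to df[['First Name', 'Last Name']] would fail because the column counts do not match, so capping with n=1 can be a reasonable safeguard if the data might contain extra underscores.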
CodePudding user response:
FYI, this does work:
technologies = {
'Student_details':["Pramodh_Roy", "Leena_Singh", "James_William", "Addem_Smith"],
'Courses':["Spark", "PySpark", "Pandas", "Hadoop"],
'Fee' :[25000, 20000, 22000, 25000]
}
df = pd.DataFrame(technologies)
df[['First Name', 'Last Name']] = df["Student_details"].apply(lambda x: pd.Series(str(x).split("_")))
print(df)
Output:
   Student_details  Courses    Fee  First Name  Last Name
0      Pramodh_Roy    Spark  25000     Pramodh        Roy
1      Leena_Singh  PySpark  20000       Leena      Singh
2    James_William   Pandas  22000       James    William
3      Addem_Smith   Hadoop  25000       Addem      Smith
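As for why the pd.Series() wrapper is needed with apply(), here is a rough sketch of what happens with and without it. The names as_lists and as_frame are only illustrative. The behaviour relied on is that Series.apply returns a DataFrame when the function returns a Series for each element.

```python
import pandas as pd

s = pd.Series(["Pramodh_Roy", "Leena_Singh"])

# Without pd.Series: each cell becomes a Python list, so the result is
# still a single Series (of lists), not two columns.
as_lists = s.apply(lambda x: x.split("_"))
print(type(as_lists).__name__)
print(as_lists)

# With pd.Series: each cell becomes a small Series, and apply() stacks
# those row by row into a DataFrame with columns 0 and 1.
as_frame = s.apply(lambda x: pd.Series(x.split("_")))
print(type(as_frame).__name__)
print(as_frame)
```

So pd.Series() itself does return a Series for each row; it is apply() that combines those per-row Series into the DataFrame that gets assigned to the two new columns.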