Why can't we use the expand argument of split() inside apply() in pandas?

Time:01-19

# Split a single column into two columns using apply()
df[['First Name', 'Last Name']] = df["Student_details"].apply(lambda x: pd.Series(str(x).split(",")))
print(df)

1- Why, when I change the code to .apply(lambda x: str(x).split(",", expand=True)), do I get an error saying "expand is invalid argument to split function"?

2- Why do I have to use pd.Series(), given that the default return value of str.split() is a Series?

3- How can pd.Series() return a Series when the result here is a DataFrame?

I tried passing expand the usual way, but it didn't work.

Here is the DataFrame:

import pandas as pd

technologies = {
    'Student_details': ["Pramodh_Roy", "Leena_Singh", "James_William", "Addem_Smith"],
    'Courses': ["Spark", "PySpark", "Pandas", "Hadoop"],
    'Fee': [25000, 20000, 22000, 25000],
}
df = pd.DataFrame(technologies)
print(df)

CodePudding user response:

df[['First Name', 'Last Name']] = df["Student_details"].str.split("_", expand=True)

I'm not sure what you are asking. Is it about the solution above, or do you really want to know why #1 throws an error?

EDIT 1: The expand parameter does not exist on the split method you are calling. Inside the lambda, str(x) is a plain Python str, and Python's built-in str.split() has no such parameter. The expand parameter belongs to the split method that pandas provides for a Series (Series.str.split).
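To make the difference concrete, here is a minimal sketch comparing the two split methods. The two-row Series `s` is a shortened stand-in for the Student_details column, and the exact wording of the TypeError depends on the Python version, so it is only printed, not relied upon.

```python
import pandas as pd

# Shortened stand-in for the Student_details column (illustrative data).
s = pd.Series(["Pramodh_Roy", "Leena_Singh"])

# Python's built-in str.split returns a plain list.
parts = "Pramodh_Roy".split("_")
print(type(parts).__name__, parts)

# It has no expand parameter, so passing one raises a TypeError.
try:
    "Pramodh_Roy".split("_", expand=True)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# The pandas accessor Series.str.split does accept expand.
expanded = s.str.split("_", expand=True)
print(type(expanded).__name__, expanded.shape)
```

This also touches on question 2: the built-in str.split() returns a list, not a Series, which is why the apply() version needs the extra pd.Series() wrapper.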

EDIT 2: Regarding your third question: as you can see, my suggestion does not use pd.Series() at all, although df["Student_details"] is of course a Series. The key point in my answer is that with expand=True the split returns a DataFrame with as many columns as the split results require. So if one of the names were "a_b_c_d", I would get a DataFrame with four columns in total.
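The column-count behaviour can be checked with a small sketch. The Series below is made up for illustration; the n=1 variant is an optional extra, assuming you always want exactly two columns regardless of how many underscores a name contains.

```python
import pandas as pd

# Illustrative data: one two-part name and one four-part name.
s = pd.Series(["Pramodh_Roy", "a_b_c_d"])

# expand=True creates as many columns as the longest split needs;
# shorter rows are padded with missing values.
wide = s.str.split("_", expand=True)
print(wide)

# n=1 limits the split to the first separator, giving two columns.
capped = s.str.split("_", n=1, expand=True)
print(capped)
```

If the column count differs from the number of target column names, the assignment to df[['First Name', 'Last Name']] fails, so capping the split with n is one way to keep the shapes aligned.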

CodePudding user response:

FYI, this does work:

import pandas as pd

technologies = {
    'Student_details': ["Pramodh_Roy", "Leena_Singh", "James_William", "Addem_Smith"],
    'Courses': ["Spark", "PySpark", "Pandas", "Hadoop"],
    'Fee': [25000, 20000, 22000, 25000],
}
df = pd.DataFrame(technologies)
# The names are separated by "_", so split on "_" rather than ","
df[['First Name', 'Last Name']] = df["Student_details"].apply(lambda x: pd.Series(str(x).split("_")))
print(df)

Output:

  Student_details  Courses    Fee First Name Last Name
0     Pramodh_Roy    Spark  25000    Pramodh       Roy
1     Leena_Singh  PySpark  20000      Leena     Singh
2   James_William   Pandas  22000      James   William
3     Addem_Smith   Hadoop  25000      Addem     Smith
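As for why the pd.Series() wrapper matters here, the following sketch compares apply() with and without it. It uses a shortened, illustrative copy of the column. When the function passed to Series.apply returns a Series for each element, pandas assembles those per-row Series into a DataFrame; when it returns a list, the result stays a Series of lists.

```python
import pandas as pd

# Shortened stand-in for the Student_details column (illustrative data).
s = pd.Series(["Pramodh_Roy", "Leena_Singh"])

# Without the wrapper: each element becomes a list, the result is a Series.
as_lists = s.apply(lambda x: x.split("_"))
print(type(as_lists).__name__)

# With the wrapper: each element becomes a Series, the result is a DataFrame
# with one column per position in the split.
as_frame = s.apply(lambda x: pd.Series(x.split("_")))
print(type(as_frame).__name__, as_frame.shape)
```

So pd.Series() itself does return a Series for each row; it is apply() that stacks those rows into a DataFrame, which can then be assigned to two columns at once.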