Assign multiple pandas columns from list output

Time:09-23

Situation: I'm trying to split one column of a pandas DataFrame into two separate columns without changing the original data, ideally using the .assign() method.

The code below produces the expected result, but each column needs its own assignment expression, which feels like the wrong way to do it.

import pandas as pd

pets = pd.DataFrame({'observation': ['black,cat', 'brown,dog']})

(
    pets
    .assign(colour = pets['observation'].str.split(',', expand=True)[0],
            animal = pets['observation'].str.split(',', expand=True)[1])
    .drop(columns='observation')
)
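A minimal variation on the code above, assuming the only goal is to avoid running the split twice: compute the expanded frame once and pass its columns to assign. The name `parts` is introduced here for illustration; this still uses one keyword per column.

```python
import pandas as pd

pets = pd.DataFrame({'observation': ['black,cat', 'brown,dog']})

# Split once; the result is a DataFrame whose columns are labelled 0 and 1
parts = pets['observation'].str.split(',', expand=True)

result = (
    pets
    .assign(colour=parts[0], animal=parts[1])
    .drop(columns='observation')
)
print(result)
```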

This feels more like the right way: .str.split(..., expand=True) returns several columns at once (a DataFrame with columns 0 and 1), so a list of new column names feels like what I should provide.

# throws error
(
    pets
    .assign(colour, animal = pets['observation'].str.split(',', expand=True))
    .drop(columns='observation')
)
NameError: name 'colour' is not defined

# throws error
(
    pets
    .assign([colour, animal] = pets['observation'].str.split(',', expand=True))
    .drop(columns='observation')
)
   .assign([colour, animal] = pets['observation'].str.split(',', expand=True))
            ^
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?

I'm still getting used to working in pandas, so any help is appreciated.

Answer:

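You can try a row-wise apply. Note that this overwrites the observation column and reassigns pets, so the original data is not preserved:

def app(s):
    # s is one row; work on a copy instead of mutating the row pandas passes in
    s = s.copy()
    # 'observation' already holds the split list at this point
    s['colour'] = s['observation'][0]
    s['animal'] = s['observation'][1]
    return s

# split each string into a list, overwriting the original column
pets['observation'] = pets.apply(lambda x: x['observation'].split(','), axis=1)
pets = pets.apply(app, axis=1)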
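A sketch of the same row-wise idea that leaves pets untouched: when the function passed to apply(axis=1) returns a Series, pandas turns its index labels into the columns of a new DataFrame. The helper name `split_row` is hypothetical, and this still calls a Python function once per row.

```python
import pandas as pd

pets = pd.DataFrame({'observation': ['black,cat', 'brown,dog']})

def split_row(row):
    # Unpack the two comma-separated pieces of this row's observation
    colour, animal = row['observation'].split(',')
    # Returning a Series makes 'colour' and 'animal' the output columns
    return pd.Series({'colour': colour, 'animal': animal})

# apply builds a new DataFrame; pets itself is not modified
new_df = pets.apply(split_row, axis=1)
print(new_df)
```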

Answer:

If you want to create a new dataframe, you can rename the columns using a simple dictionary:

cols = ['colour', 'animal']
new_df = (pets['observation']
              .str.split(',', expand=True)
              .rename(columns=dict(enumerate(cols)))
         )

output:

  colour animal
0  black    cat
1  brown    dog
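To make the rename step concrete: enumerate pairs each new name with its position, and those positions are exactly the integer column labels (0, 1) that expand=True produces. A small check, using the hypothetical variable name `mapping`:

```python
import pandas as pd

cols = ['colour', 'animal']
mapping = dict(enumerate(cols))
print(mapping)  # {0: 'colour', 1: 'animal'}

pets = pd.DataFrame({'observation': ['black,cat', 'brown,dog']})
# Columns 0 and 1 from the split are renamed via the position -> name mapping
new_df = pets['observation'].str.split(',', expand=True).rename(columns=mapping)
print(new_df)
```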

Assuming you want to use a pipeline, you can craft a custom function and use pipe:

def split(df):
    df = df.copy()
    cols = ['colour', 'animal']
    df[cols] = df['observation'].str.split(',', expand=True)
    return df

(
    pets
    .pipe(split)
    .drop(columns='observation')
)

NB. This is only a simple pipeline example; of course, you can write a more flexible function that takes parameters:

def split(df, col_to_split, cols):
    df = df.copy()
    df[cols] = df[col_to_split].str.split(',', expand=True)
    return df

(
    pets
    .pipe(split, col_to_split='observation', cols=['colour', 'animal'])
    .drop(columns='observation')
)
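Extending that idea one step further, the separator could also be a parameter. The `sep` argument below is a hypothetical addition to the answer's function, shown with semicolon-separated sample data that is not from the question:

```python
import pandas as pd

def split(df, col_to_split, cols, sep=','):
    # Work on a copy so the caller's DataFrame is left untouched
    df = df.copy()
    # Assign the expanded columns positionally to the requested names
    df[cols] = df[col_to_split].str.split(sep, expand=True)
    return df

pets = pd.DataFrame({'observation': ['black;cat', 'brown;dog']})

result = (
    pets
    .pipe(split, col_to_split='observation', cols=['colour', 'animal'], sep=';')
    .drop(columns='observation')
)
print(result)
```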

Answer:

You can build a dictionary of Series with DataFrame.set_axis and DataFrame.to_dict, then unpack it into assign:

cols = ['colour', 'animal']
d = pets['observation'].str.split(',', expand=True).set_axis(cols, axis=1).to_dict('series')
df1 = pets.assign(**d)
print(df1)
  observation colour animal
0   black,cat  black    cat
1   brown,dog  brown    dog
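This is also why the attempts in the question fail while this one works: assign only accepts keyword arguments, and `**` turns a dict into exactly those keyword arguments. A small check of that equivalence (the names `parts` and `df2` are illustrative):

```python
import pandas as pd

pets = pd.DataFrame({'observation': ['black,cat', 'brown,dog']})
cols = ['colour', 'animal']

parts = pets['observation'].str.split(',', expand=True).set_axis(cols, axis=1)
d = parts.to_dict('series')  # maps each column name to its Series

# These two calls are equivalent: ** expands the dict into keyword arguments
df1 = pets.assign(**d)
df2 = pets.assign(colour=d['colour'], animal=d['animal'])
print(df1.equals(df2))  # True
```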

If it is acceptable to modify the original DataFrame, assign to it directly:

cols = ['colour', 'animal']
pets[cols] = pets['observation'].str.split(',', expand=True)
print(pets)
  observation colour animal
0   black,cat  black    cat
1   brown,dog  brown    dog

If you need a new DataFrame:

cols = ['colour', 'animal']
df = pets['observation'].str.split(',', expand=True).set_axis(cols, axis=1)
print(df)
  colour animal
0  black    cat
1  brown    dog
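One caveat, under the assumption that some observations might contain more than one comma (a hypothetical 'black,cat,tabby' row, not in the question's data): expand=True would then produce three columns and set_axis with two names raises a length-mismatch error. Passing n=1 to str.split limits it to the first comma:

```python
import pandas as pd

pets = pd.DataFrame({'observation': ['black,cat', 'brown,dog', 'black,cat,tabby']})
cols = ['colour', 'animal']

# n=1: split on the first comma only, so the result has at most two columns
df = pets['observation'].str.split(',', n=1, expand=True).set_axis(cols, axis=1)
print(df)
```

Everything after the first comma stays in the animal column.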