How to assign to column with non-string "name" or index?

Time:06-08

Pandas DataFrames have an assign method that assigns values to a column. It differs from loc or iloc in that it returns a new DataFrame with the newly assigned column(s), without modifying any shallow copies of, or references to, the same data.

The assign method uses argument names to denote column names (or "index" in pandas parlance). That works fine when the column names are strings, but pandas supports arbitrary Python objects as column names.

Suppose I have a DataFrame with integers as column "names":

import pandas as pd
df = pd.DataFrame({
    0 : [1,2,3],
    1 : [4,5,6]
})

How can I assign, say, to the column 0?

This doesn't work:

df.assign(0 = df[[0]] + 1)
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?

Nor does this:

df.assign(**{0: df[[0]] + 1})
TypeError: keywords must be strings
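
As far as I can tell, this second failure comes from Python's calling convention rather than from pandas: keys unpacked with ** must be strings for any function. A minimal sketch, where show_kwargs is a made-up function and not part of pandas:

```python
def show_kwargs(**kwargs):
    """Hypothetical helper: return the keyword arguments it received."""
    return kwargs

# String keys unpack fine, even when they are not valid identifiers.
string_result = show_kwargs(**{"0": "ok"})

# A non-string key is rejected when the call is made,
# before the function body ever runs.
try:
    show_kwargs(**{0: "not ok"})
    int_key_error = None
except TypeError as exc:
    int_key_error = exc
```

So any workaround that goes through assign has to give the column a string name, at least temporarily.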

Now, I could use direct assign or loc, but it would modify the underlying data - for example:

df_shallow_copy = df
df[[0]] = df[[0]] + 1

Now df_shallow_copy would have values [2,3,4] for column 0 instead of [1,2,3].
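
To make the side effect concrete, here is a small self-contained check. Note that df_shallow_copy = df does not copy anything; it only binds a second name to the same DataFrame object, which is why the change is visible through both names:

```python
import pandas as pd

df = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6]})

# Plain assignment creates a second reference, not a copy.
df_shallow_copy = df

# In-place column assignment on df ...
df[[0]] = df[[0]] + 1

# ... is therefore visible through the other name as well.
same_object = df_shallow_copy is df
alias_values = df_shallow_copy[0].tolist()
```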

I could also do a full deep copy of all the columns, but that involves duplicating the data in memory and performing redundant operations:

df_shallow_copy = df
df = df.copy()
df[[0]] = df[[0]] + 1

How can I assign to the column without generating a redundant deep copy and without potentially modifying other objects?

CodePudding user response:

You could side-step the integer keyword problem with a rename like this:

df.rename(columns={0: 'tmp'}).assign(tmp=lambda x: x['tmp'] + 1).rename(columns={'tmp': 0})

   0  1
0  2  4
1  3  5
2  4  6

Would that work for your use-case?
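
If you need this more than once, the rename trick could be wrapped in a small function. This is only a sketch: assign_nonstring and the placeholder name "__tmp__" are made up for illustration, and it assumes the placeholder is not already a column. I have not checked how much data the intermediate steps copy; that may depend on the pandas version.

```python
import pandas as pd

def assign_nonstring(df, col, func, tmp="__tmp__"):
    """Hypothetical helper: assign to a column whose label is not a string.

    `tmp` is an assumed placeholder name that must not already be a column.
    """
    if tmp in df.columns:
        raise ValueError(f"temporary name {tmp!r} is already a column")
    return (
        df.rename(columns={col: tmp})                 # give the column a string name
          .assign(**{tmp: lambda x: func(x[tmp])})    # string keyword is accepted
          .rename(columns={tmp: col})                 # restore the original label
    )

df = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6]})
result = assign_nonstring(df, 0, lambda s: s + 1)
```

Here result holds the incremented column under the label 0, while df itself keeps its original values.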

CodePudding user response:

Not the most straightforward way, but you could use:

df.pipe(lambda d: d[[0]].add(1).combine_first(d))

output:

   0  1
0  2  4
1  3  5
2  4  6