Home > Back-end >  Cannot update multiple rows and columns when new values include a list
Cannot update multiple rows and columns when new values include a list

Time:09-21

I am selecting multiple rows based on a condition, and updating values in multiple columns. This works unless one of the values is a list.

First, a dataframe:

>>> dummy_data = {'A': ['abc', 'def', 'def'],
          'B': ['red', 'purple', 'blue'],
          'C': [25, 94, 57],
          'D': [False, False, False],
          'E': [[9,8,12], [36,72,4], [18,3,5]]}

>>> df = pd.DataFrame(dummy_data)

     A       B   C      D            E
0  abc     red  25  False   [9, 8, 12]
1  def  purple  94  False  [36, 72, 4]
2  def    blue  57  False   [18, 3, 5]

Things that work:

This works to select multiple rows and update multiple columns:

>>> df.loc[df['A'] == 'def', ['B', 'C', 'D']] = ['orange', 42, True]

     A       B   C      D            E
0  abc     red  25  False   [9, 8, 12]
1  def  orange  42   True  [36, 72, 4]
2  def  orange  42   True   [18, 3, 5]

This works to update column E with a new list:

>>> new_list = [1,2,3]

>>> df.loc[df['A'] == 'def', ['E']] = pd.Series([new_list] * len(df))

     A       B   C      D           E
0  abc     red  25  False  [9, 8, 12]
1  def  purple  94  False   [1, 2, 3]
2  def    blue  57  False   [1, 2, 3]

But how to do both?

I can't figure out an elegant way to combine these approaches.

Attempt 1 This works, but I get the ndarray from ragged nested sequences warning:

>>> new_list = [1,2,3]
>>> updates = ['orange', 42, new_list]
>>> num_rows = df.A.eq('def').sum()

>>> df.loc[df['A'] == 'def', ['B', 'C', 'E']] = [updates] * num_rows

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences ...

     A       B   C      D           E
0  abc     red  25  False  [9, 8, 12]
1  def  orange  42  False   [1, 2, 3]
2  def  orange  42  False   [1, 2, 3]

Attempt 2 This works, but seems overly complicated:

>>> new_list = [1,2,3]
>>> updates = ['orange', 42, new_list]
>>> num_rows = df.A.eq('def').sum()

>>> df2 = pd.DataFrame([updates] * num_rows)

>>> df.loc[df['A'] == 'def', ['B', 'C', 'E']] = df2[[0, 1, 2]].values

     A       B   C      D           E
0  abc     red  25  False  [9, 8, 12]
1  def  orange  42  False   [1, 2, 3]
2  def  orange  42  False   [1, 2, 3]

CodePudding user response:

You can use pandas.DataFrame to assign/align the new values with the selected columns with the help of a boolean mask.

mask = df['A'] == 'def'
cols = ['B', 'C', 'D', 'E']
new_list = [1,2,3]
updates = ['orange', 42, True, [new_list]]

df.loc[mask, cols] = pd.DataFrame(dict(zip(cols, updates)), index=df.index)
    

>>> print(df)

     A       B   C      D           E
0  abc     red  25  False  [9, 8, 12]
1  def  orange  42   True   [1, 2, 3]
2  def  orange  42   True   [1, 2, 3]
[Finished in 589ms]

CodePudding user response:

Create a numpy array with object dtype:

df.loc[df['A'] == 'def', ['B', 'C', 'E']] = np.array([updates] * num_rows, dtype='object')

Output:

     A       B   C      D           E
0  abc     red  25  False  [9, 8, 12]
1  def  orange  42  False   [1, 2, 3]
2  def  orange  42  False   [1, 2, 3]

However, as commented, [updates] * num_rows is a dangerous operation. For example, later you want to modify one of the array value:

df.iloc[-1,-1].append(4)

Then your data becomes (notice the change in row 1 as well):

     A       B   C      D             E
0  abc     red  25  False    [9, 8, 12]
1  def  orange  42  False  [1, 2, 3, 4]
2  def  orange  42  False  [1, 2, 3, 4]
  • Related