Insert value from iterator to a pandas column based on condition

Time:11-22

Suppose I have a dataframe like this:

   A
0  -1
1  -1
2  1
3  0
4  -2
5  3
6  0

I want to add a new column with an incremental value in every row where the value of the column A is negative.

   A   B
0  -1  1
1  -1  2
2  1   nan
3  0   nan
4  -2  3
5  3   nan
6  0   nan

Here is my code

ids = iter(range(1, np.sum((df['A'] < 0).values.ravel()) + 1))
df['B'] = np.where(df['A'] < 0, next(ids), np.nan)

Unfortunately what I get is

   A   B
0  -1  1
1  -1  1
2  1   nan
3  0   nan
4  -2  1
5  3   nan
6  0   nan

I also tried with a generator

def id_generator(max_id):
    curr = 1
    while curr <= max_id:
        yield curr
        curr += 1

df['B'] = np.where(df['A'] < 0, next(id_generator(np.sum((df['A'] < 0).values.ravel()))), np.nan)

As with the previous solution, I get this dataframe

   A   B
0  -1  1
1  -1  1
2  1   nan
3  0   nan
4  -2  1
5  3   nan
6  0   nan

It seems that it creates a new iterator/generator for every row it processes, and for that reason the id is always 1. The only solution I found is to use an intermediate dataframe

index = df[df['A'] < 0].index
new_df = pd.DataFrame(data=[x + 1 for x in range(len(index))], columns=['B'], index=index)
df = df.join(new_df)

So my question is: is there a way to stick with the np.where solution instead of creating a new temporary dataframe? Or are there any pandas built-in functions to do so?

CodePudding user response:

Use DataFrame.loc with a boolean mask to assign a range to the matching rows only:

m = df['A'] < 0
df.loc[m, 'B'] = range(1, m.sum() + 1)
print(df)
   A    B
0 -1  1.0
1 -1  2.0
2  1  NaN
3  0  NaN
4 -2  3.0
5  3  NaN
6  0  NaN
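
As for why the np.where attempt puts 1 in every row: Python evaluates the arguments of a call before the call itself runs, so next(ids) is executed exactly once and np.where only ever receives the scalar 1, which it broadcasts to every row where the condition holds. No iterator is created per row. A minimal sketch using the question's sample data that makes this visible (the names first, broadcast and remaining are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [-1, -1, 1, 0, -2, 3, 0]})
m = df['A'] < 0

ids = iter(range(1, m.sum() + 1))

# next(ids) runs once, here, before np.where is ever called.
first = next(ids)

# np.where sees the scalar 1 and broadcasts it to every matching row.
broadcast = np.where(m, first, np.nan)

# The iterator was advanced only once, so the other ids are still unused.
remaining = list(ids)

print(first)
print(broadcast)
print(remaining)
```

Since np.where works on whole arrays, it needs an array-like of per-row values rather than an iterator to get a different id in each row.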

CodePudding user response:

Most tasks like this can be done with NumPy and pandas functions directly, so try to avoid iterators and generators.

Here is a solution that takes a couple of lines.

First, create a dataframe for your example:

import numpy as np
import pandas as pd
df = pd.DataFrame(data={'A': [-1,2,-1,0,-1,-1,2,3,5,-1]})

You can use cumsum() (cumulative sum) to count how many negative numbers you have encountered so far and assign that count to a new column

df['B'] = (df['A'] < 0).cumsum()
    A   B
0   -1  1
1   2   1
2   -1  2
3   0   2
4   -1  3
5   -1  4
6   2   4
7   3   4
8   5   4
9   -1  5

This still leaves a value in the rows where column A is not negative, so you can replace those with NaN

df.loc[df['A'] >= 0, 'B'] = np.nan

You end up with:

    A   B
0   -1  1.0
1   2   NaN
2   -1  2.0
3   0   NaN
4   -1  3.0
5   -1  4.0
6   2   NaN
7   3   NaN
8   5   NaN
9   -1  5.0
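
The two steps can also be folded into a single expression, which touches on the original question about sticking with np.where: the running count from cumsum() supplies a different value for each row, and the mask decides where it is kept. A minimal sketch using the question's sample data; the column names B_where and B_int are only illustrative, and the nullable Int64 conversion is an optional extra if integer ids are preferred over floats:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [-1, -1, 1, 0, -2, 3, 0]})
m = df['A'] < 0

# Series.where keeps the running count where m is True and puts NaN elsewhere.
df['B'] = m.cumsum().where(m)

# The same idea with np.where: both branches are array/scalar values,
# so no iterator is involved.
df['B_where'] = np.where(m, m.cumsum(), np.nan)

# Optional: a nullable integer dtype stores 1, 2, <NA> instead of 1.0, 2.0, NaN.
df['B_int'] = df['B'].astype('Int64')

print(df)
```

Either form avoids the temporary dataframe and the separate NaN-replacement step.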