Home > Blockchain >  Is there an easy way to select across rows in a panda frame to create a new column?
Is there an easy way to select across rows in a panda frame to create a new column?

Time:11-12

Question updated, see below

I have a large dataframe similar in structure to e.g.

df=pd.DataFrame({'A': [0, 0, 0, 11, 22,33], 'B': [10, 20,30, 110, 220, 330], 'C':['x', 'y', 'z', 'x', 'y', 'z']})
df
    A   B   C
0   0   10  x
1   0   20  y
2   0   30  z
3   11  110 x
4   22  220 y
5   33  330 z

I want to create a new column by selecting the column value of B from a different row based on the value of C being equal to the current row and the value of A being 0, so the expected result is

    A   B   C   new_B_based_on_A_and_C
0   0   10  x   10
1   0   20  y   20
2   0   30  z   30
3   11  110 x   10
4   22  220 y   20
5   33  330 z   30

I want to have a simple solution without needing to have a for loop over the rows. Something like

df.apply(lambda row: df[df[(df['C']==row.C) & (df['A']==0)]]['B'].iloc[0], axis=1)

The dataframe is guaranteed to have those values and the values are unique


Update for a more general case
I am looking for a general solution that would also work for multiple columns to match on e.g.

df=pd.DataFrame({'A': [0, 0, 0,0, 11, 22,33, 44], 'B': [10, 20,30, 40, 110, 220, 330, 440], 'C':['x', 'y', 'x', 'y', 'x', 'y', 'x', 'y'], 'D': [1, 1, 5, 5, 1,1 ,5, 5]})

A   B   C   D
0   0   10  x   1
1   0   20  y   1
2   0   30  x   5
3   0   40  y   5
4   11  110 x   1
5   22  220 y   1
6   33  330 x   5
7   44  440 y   5

and the result would be then

A   B   C   D   new_B_based_on_A_C_D
0   0   10  x   1   10
1   0   20  y   1   20
2   0   30  x   5   30
3   0   40  y   5   40
4   11  110 x   1   10
5   22  220 y   1   20
6   33  330 x   5   30
7   44  440 y   5   40

CodePudding user response:

You can do a map:

# you **must** make sure that for each unique `C` value, 
# there is only one row with `A==0`.
df['new'] = df['C'].map(df.loc[df['A']==0].set_index('C')['B'])

Output:

    A    B  C  new
0   0   10  x   10
1   0   20  y   20
2   0   30  z   30
3  11  110  x   10
4  22  220  y   20
5  33  330  z   30

Explanation: Imagine you have a series s indicating the mapping:

idx 
idx1  value1
idx2  value2
idx3  value3

then that's what map does: df['C'].map(s).

Now, for a dataframe d:

C  B
c1 b1
c2 b2
c3 b3

we do s=d.set_index('C')['B'] to get the above form.

Finally, as mentioned, you mapping happens where A==0, so d = df[df['A']==0].

Composing the forward path:

mapping_data = df[df['A']==0]
mapping_series = mapping_data.set_index('C')['B']
new_values = df['C'].map(mapping_series)

and the first piece of code is just all these lines combined.

CodePudding user response:

If I understood the question, for the general case you could use a merge like this:

df.merge(df.loc[df['A'] == 0, ['B', 'C', 'D']], on=['C', 'D'], how='left', suffixes=('', '_new'))

Output:

A   B   C   D   B_new
0   10  x   1   10
0   20  y   1   20
0   30  x   5   30
0   40  y   5   40
11  110 x   1   10
22  220 y   1   20
33  330 x   5   30
44  440 y   5   40
  • Related