Home > OS >  New column based on a rule AND equalling an changing value (python pandas dataframe)
New column based on a rule AND equalling an changing value (python pandas dataframe)

Time:09-30

I have a dataframe, called target, that I would like to fill, based on some rules, from an old dataframe, called source.

import pandas as pd
import numpy as np

target_d = {'ColA': ['a1', 'a1', 'a2', 'a2', 'a3', 'a3', 'a4'],
            'ColB': ['X', 'Y', 'X', 'Y', 'X', 'Y', 'X'],
            'ColC': ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7']
    }
target = pd.DataFrame(data=target_d)

source_d = {'Col1': ['a1', 'a2', 'a3', 'a4'],
            'Col2': ['e1', 'e2', 'e3', 'e4'],
            'Col3': ['f1', 'f2', 'f3', 'f4'],
            'Col4': ['g1', 'g2', 'g3', 'g4'],
    }
source = pd.DataFrame(data=source_d)
source = source.set_index('Col1')

target.loc[target['ColB'] == 'X', 'ColC'] = source.loc['a1', 'Col4']

The above starts with our two input dataframes:

  ColA ColB ColC
0   a1    X   c1
1   a1    Y   c2
2   a2    X   c3
3   a2    Y   c4
4   a3    X   c5
5   a3    Y   c6
6   a4    X   c7

     Col2 Col3 Col4
Col1
a1     e1   f1   g1
a2     e2   f2   g2
a3     e3   f3   g3
a4     e4   f4   g4

to then update target to provide:

  ColA ColB ColC
0   a1    X   g1
1   a1    Y   c2
2   a2    X   g1
3   a2    Y   c4
4   a3    X   g1
5   a3    Y   c6
6   a4    X   g1

however, what I would like is:

  ColA ColB ColC
0   a1    X   g1
1   a1    Y   c2
2   a2    X   g2
3   a2    Y   c4
4   a3    X   g3
5   a3    Y   c6
6   a4    X   g4

I want to condition on the ColB value being X and to then use the ColA value to find the Col4 value in the source, by using the index.

I somehow want the right hand side of the equality to read

source.loc[target.loc['ContractID'], 'Col4']

to use my indexing to find which value I want. I don't know if I'm completely barking up the wrong tree since the right hand side is normally a fixed value in everything that I've seen.

Any ideas of how I can produce the desired output?

CodePudding user response:

You can use create a dictionary with keys being the index of source and respective values being Col4 of source.

Then, map map with loc (to locate where ColB equals 'X') to update your target DF:

d = dict(zip(source.index,source.Col4))
target.loc[target['ColB'].eq('X'), 'ColC'] = target['ColA'].map(d)

Prints:

>>> target

  ColA ColB ColC
0   a1    X   g1
1   a1    Y   c2
2   a2    X   g2
3   a2    Y   c4
4   a3    X   g3
5   a3    Y   c6
6   a4    X   g4

Setup:

>>> target.to_dict()

{'ColA': {0: 'a1', 1: 'a1', 2: 'a2', 3: 'a2', 4: 'a3', 5: 'a3', 6: 'a4'},
 'ColB': {0: 'X', 1: 'Y', 2: 'X', 3: 'Y', 4: 'X', 5: 'Y', 6: 'X'},
 'ColC': {0: 'c1', 1: 'c2', 2: 'c3', 3: 'c4', 4: 'c5', 5: 'c6', 6: 'c7'}}

>>> source.to_dict()

{'Col2': {'a1': 'e1', 'a2': 'e2', 'a3': 'e3', 'a4': 'e4'},
 'Col3': {'a1': 'f1', 'a2': 'f2', 'a3': 'f3', 'a4': 'f4'},
 'Col4': {'a1': 'g1', 'a2': 'g2', 'a3': 'g3', 'a4': 'g4'}}
  • Related