vectorizing a function to use entire dataframe column instead of single value

Time:10-13

I have a function to set colors. Currently, I loop through a dataframe, pass a single value to the function, cross-reference that value against its corresponding color, and return the color value. I now want to pass the entire column from the dataframe (instead of looping through it) and get back an array of color values.

Here is a simplified version of the function that currently works when given a single value (I just set the single value here instead of showing the entire loop through the dataframe):

    import pandas as pd

    def set_LineQualityColor(LineQ):
        data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
                ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
                ['lightgray', 9]]
        df = pd.DataFrame(data, columns=['CR', 'LineQuality'])
        # Take the CR value of the first row whose LineQuality equals LineQ
        c = df[df['LineQuality'] == LineQ]['CR'].values[0]
        return c

    LQ = 4
    c = set_LineQualityColor(LQ)

How can I get this to work correctly when LineQ is a column from a dataframe? i.e.

    c = set_LineQualityColor(df.LQ)

Or is there a more efficient way to go about this? I'm new to Python. Thanks.

Answer:

You can join the lookup table with a new dataframe (or with a dataframe column) to get the result.

    >>> def set_LineQualityColor_df(LineQ):
    ...     data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2],['seagreen', 3],
    ...             ['mistyrose', 4], ['lightcoral', 4.1],['rosybrown', 5], ['indianred', 5.1],
    ...             ['lightgray', 9]]
    ...     df = pd.DataFrame(data, columns = ['CR', 'LineQuality'])
    ...     #c=df[df['LineQuality']==LineQ]['CR'].values[0]
    ...     c = df.set_index('LineQuality').join(LineQ)
    ...     return c
    ...
    >>> df_lineQ = pd.DataFrame({ 'LineQuality': [4,5]})
    >>> set_LineQualityColor_df(df_lineQ).head(5)
                             CR  LineQuality
    LineQuality
    0.0                    grey          4.0
    1.0          cornflowerblue          5.0
    2.0              lightgreen          NaN
    3.0                seagreen          NaN
    4.0               mistyrose          NaN

You can also pass a specific dataframe column:

    >>> set_LineQualityColor_df(df_lineQ.LineQuality).head(5)
                             CR  LineQuality
    LineQuality
    0.0                    grey          4.0
    1.0          cornflowerblue          5.0
    2.0              lightgreen          NaN
    3.0                seagreen          NaN
    4.0               mistyrose          NaN
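
Note that join aligns rows on the index, not on column values. The lookup table is indexed by LineQuality, while df_lineQ (and its LineQuality column) has the default index 0, 1, so the rows are matched by position: 4.0 and 5.0 end up next to grey and cornflowerblue, the colors for 0 and 1, rather than next to mistyrose and rosybrown.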
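
A minimal sketch of matching on the LineQuality values themselves, assuming the goal is one color per input row. It swaps join for DataFrame.merge with on='LineQuality' and how='left', which keeps the input rows in their original order and leaves NaN where a value has no color. The helper name set_LineQualityColor_merge and the lookup variable are hypothetical, and the input is cast to float so both merge keys share the float64 dtype of the lookup column.

```python
import pandas as pd

data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
        ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
        ['lightgray', 9]]
lookup = pd.DataFrame(data, columns=['CR', 'LineQuality'])  # LineQuality is float64


def set_LineQualityColor_merge(line_q):
    """Hypothetical helper: return one row per value in line_q with its color.

    Rows are matched on the LineQuality values (not on the index); values
    with no entry in the lookup get NaN in the CR column.
    """
    # Build a one-column frame from the input values, cast to float so the
    # merge key has the same dtype as lookup['LineQuality'].
    left = pd.DataFrame({'LineQuality': line_q.astype(float).to_numpy()})
    # how='left' keeps the rows of `left` in their original order
    return left.merge(lookup, on='LineQuality', how='left')


df_lineQ = pd.DataFrame({'LineQuality': [4, 5]})
result = set_LineQualityColor_merge(df_lineQ['LineQuality'])
print(result)
```

Passing a column such as df.LQ works the same way, because only the values of the Series are used, not its name or index.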

Answer:

Set LineQuality as the index.

    import pandas as pd

    data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
            ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
            ['lightgray', 9]]

    df = pd.DataFrame(data, columns=['CR', 'LineQuality'])
    df.set_index(['LineQuality'], drop=True, inplace=True)

Which gives this dataframe:

                             CR
    LineQuality
    0.0                    grey
    1.0          cornflowerblue
    2.0              lightgreen
    3.0                seagreen
    4.0               mistyrose
    4.1              lightcoral
    5.0               rosybrown
    5.1               indianred
    9.0               lightgray

Then look up the colors using loc:

    LQ_df = pd.DataFrame([1, 5, 4, 9, 4.1, 0, 4.0], columns=['LQ'])

    LQ = LQ_df['LQ']

    df.loc[LQ, 'CR']

Which gives this series:

    LineQuality
    1.0    cornflowerblue
    5.0         rosybrown
    4.0         mistyrose
    9.0         lightgray
    4.1        lightcoral
    0.0              grey
    4.0         mistyrose
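
One detail matters when the colors should go back into the original dataframe: the Series returned by df.loc[LQ, 'CR'] is indexed by the LineQuality values, as the output above shows, not by LQ_df's row labels. A sketch under that setup, where the column names CR and CR_map are only illustrative: .to_numpy() drops the lookup index so the values land in row order, and Series.map, swapped in here as an alternative to loc, returns a result that already carries LQ_df's own index.

```python
import pandas as pd

data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
        ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
        ['lightgray', 9]]
df = pd.DataFrame(data, columns=['CR', 'LineQuality'])
df.set_index(['LineQuality'], drop=True, inplace=True)

LQ_df = pd.DataFrame([1, 5, 4, 9, 4.1, 0, 4.0], columns=['LQ'])

# df.loc[...] is indexed by the LineQuality values (1.0, 5.0, ...), not by
# LQ_df's row labels (0..6). Assigning it directly would align on that index
# (wrong colors, or an error when labels repeat, as 4.0 does here), so keep
# only the values, which are already in LQ_df's row order.
LQ_df['CR'] = df.loc[LQ_df['LQ'], 'CR'].to_numpy()

# Alternative: map each LQ value through the CR column of the lookup.
# The result keeps LQ_df's index, and unknown values become NaN.
LQ_df['CR_map'] = LQ_df['LQ'].map(df['CR'])
print(LQ_df)
```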

It doesn't make sense to create the df dataframe every time you call the function, so it's better to create it once, before the function is ever called. Then you can define the function to use df.loc as before:

    import pandas as pd

    data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
            ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
            ['lightgray', 9]]

    # Build the lookup table once, indexed by LineQuality
    lineq_color_lookup = pd.DataFrame(data, columns=['CR', 'LineQuality'])
    lineq_color_lookup.set_index(['LineQuality'], drop=True, inplace=True)

    def get_LineQualityColor(LineQ):
        return lineq_color_lookup.loc[LineQ, 'CR']  # .tolist() if you want it as a list

I also renamed the function to get_LineQualityColor because it doesn't set anything; it only returns the color corresponding to the given LineQuality.
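
A short usage sketch of get_LineQualityColor, assuming pandas 1.0 or later (where .loc raises KeyError if any requested label is missing from the index). The lines dataframe is hypothetical example data. It shows that the same function handles a single value and a whole column, and what happens with a value that is not in the table.

```python
import pandas as pd

data = [['grey', 0], ['cornflowerblue', 1], ['lightgreen', 2], ['seagreen', 3],
        ['mistyrose', 4], ['lightcoral', 4.1], ['rosybrown', 5], ['indianred', 5.1],
        ['lightgray', 9]]
lineq_color_lookup = pd.DataFrame(data, columns=['CR', 'LineQuality'])
lineq_color_lookup.set_index(['LineQuality'], drop=True, inplace=True)


def get_LineQualityColor(LineQ):
    return lineq_color_lookup.loc[LineQ, 'CR']


# A single value returns a single color string
one = get_LineQualityColor(4)

# A whole column returns a Series of colors, in the column's order
lines = pd.DataFrame({'LQ': [0, 5.1, 3]})  # hypothetical example data
many = get_LineQualityColor(lines['LQ']).tolist()

# A value that is not in the lookup table makes .loc raise KeyError
try:
    get_LineQualityColor(pd.Series([4.0, 7.0]))
except KeyError:
    missing_raised = True
else:
    missing_raised = False

print(one, many, missing_raised)
```

If missing values should become NaN instead of raising, lineq_color_lookup['CR'].reindex(LineQ) is one option, since reindex fills labels it cannot find with NaN.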
