Applying conditional statements on lists stored in Dataframe cell

Time:10-18

I would like to create a column that holds the result of applying boolean logic to a list stored in another column.

import pandas as pd
import numpy as np
d = {'202201': [7180516.0, 4868058.0], '202202': [433433740.0, 452632806.0], '202203': [5444119.0, 10000000.0]}
df = pd.DataFrame(data=d)

#Storing Values in List
df['seq'] = df.agg(list, axis=1)
#Or
#df['seq'] = df.agg(np.array, axis=1)
df

The desired output is a new column (df['seqToFs']) holding a list of True/False values, one per value in the df['seq'] list, marking whether that value is > 8000000.

import pandas as pd
import numpy as np
d = {'202201': [7180516.0, 4868058.0], '202202': [433433740.0, 452632806.0], '202203': [5444119.0, 10000000.0], 
     'seq':[[7180516.0,433433740.0,5444119.0],[4868058.0,452632806.0,10000000.0]], 'seqToFs':[[False,True,False],[False,True,True]]}
df = pd.DataFrame(data=d)
df

Is it better to make df['seq'] a list or an np.array for performance?

My end goal is to analyze the sequential order of values that meet conditions. Is there a better way to perform such analysis than storing lists in a dataframe?

Example framework of what I was trying to apply to each row (not my code):

original_prices = [1.25, -9.45, 10.22, 3.78, -5.92, 1.16]
prices = [True if i > 0 else False for i in original_prices]
prices

Here the original_prices list would be replaced with the row list from df['seq'], and prices would become the new column df['seqToFs']. I am getting errors because of the list format.
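
For reference, a minimal sketch of how that comprehension could be applied per row, assuming df['seq'] holds plain Python lists as in the desired-output frame above. The THRESHOLD name is only illustrative and is not part of the original code.

```python
import pandas as pd

# Same 'seq' lists as in the desired-output frame above
df = pd.DataFrame({
    'seq': [[7180516.0, 433433740.0, 5444119.0],
            [4868058.0, 452632806.0, 10000000.0]],
})

THRESHOLD = 8_000_000  # illustrative name for the cutoff used in the question

# Series.apply calls the lambda once per cell, so each cell's list is
# turned into a list of booleans of the same length
df['seqToFs'] = df['seq'].apply(lambda row: [value > THRESHOLD for value in row])
```

This runs a Python-level loop per row, so it is likely to be slower on large frames than comparing the numeric columns directly.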

Help would be much appreciated.

CodePudding user response:

You can use the normal > operator and then use agg or apply to get the desired output:

(df > 8000000).apply(list, axis=1)

0    [False, True, False]
1     [False, True, True]

Example (the comparison works on numeric columns only, so run it before adding the list column df['seq']):

df = pd.DataFrame({'202201': [7180516.0, 4868058.0], '202202': [433433740.0, 452632806.0], '202203': [5444119.0, 10000000.0]})
df['seqToFs'] = (df > 8000000).apply(list, axis=1)
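
On the question of whether there is a better way than storing lists: one possible approach is to keep the comparison result as a boolean frame and analyze that directly. The sketch below is only an illustration under assumptions: the names month_cols, n_above and has_run_of_two are made up here, and "two consecutive months above the threshold" is just one example of a sequential condition.

```python
import pandas as pd

df = pd.DataFrame({'202201': [7180516.0, 4868058.0],
                   '202202': [433433740.0, 452632806.0],
                   '202203': [5444119.0, 10000000.0]})

# Select the numeric columns explicitly so this also works after
# list columns such as 'seq' have been added to the frame
month_cols = ['202201', '202202', '202203']
mask = df[month_cols] > 8_000_000      # boolean DataFrame, one column per month

# How many months exceed the threshold in each row
n_above = mask.sum(axis=1)

# True where some month and the month right before it both exceed the threshold
arr = mask.to_numpy()
has_run_of_two = pd.Series((arr[:, 1:] & arr[:, :-1]).any(axis=1), index=df.index)
```

Because the mask stays a regular two-dimensional boolean array, these operations are vectorized and avoid per-row Python lists entirely.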