How to append series to a dataframe with Pandas?

I would like to iterate through rows of a dataframe df_mask (4368 rows x 232 columns), generate a Pandas Series series and recreate a dataframe container from the Series. My problem with the code below is that it takes several minutes to complete.

How can I speed up the code execution?

df_prices = get_prices_df()
container = pd.DataFrame()

for idx, row in df_mask.iterrows():
    cols = row[row == True].index
    series = df_prices.loc[idx, cols].rank(axis=0, ascending=False, na_option='bottom').le(10)
    df = pd.DataFrame([series])
    container = pd.concat([container, df], axis=0).fillna(False)

CodePudding user response：

Assume your input data looks similar to this:

import numpy as np
import pandas as pd

np.random.seed(10)
df_prices = pd.DataFrame(np.random.choice(list(range(10)), size=100).reshape(10,-1))
df_mask = pd.DataFrame(np.random.choice([True, False], size=100).reshape(10,-1))

Then you can create container without a for loop by calling where directly on the full dataframe df_prices, with the dataframe df_mask as the condition. Rank along the columns (axis=1), because each row is now ranked in a single call instead of one row per iteration. Then compare to 10 (5 here, for the example) and fillna with False (I don't think the fillna is necessary, but I have not checked).

container_fast = (
    df_prices.where(df_mask)  # values where the mask is False become NaN
      .rank(axis=1, ascending=False, na_option='bottom')
      .le(5)  # use 10 for the real data; with this 10-column example, 10 makes everything True
      .fillna(False)
)
print(container_fast)
       0      1      2      3      4      5      6      7      8      9
0   True   True  False  False  False  False   True   True   True  False
1   True   True   True  False  False   True  False  False  False   True
2   True   True  False   True  False   True  False  False  False  False
3  False  False   True  False  False  False  False  False  False   True
4   True  False  False   True  False  False  False   True  False   True
5  False   True   True  False  False   True  False  False  False   True
6  False  False   True  False  False  False   True  False   True   True
7  False  False  False  False  False   True   True  False  False   True
8  False   True  False  False   True   True  False  False   True  False
9   True  False   True  False   True  False   True  False  False   True

If I create container the way you do, then (container == container_fast).all().all() returns True.
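The comparison above can be reproduced with a short script. This is a minimal sketch, assuming the example data from this answer and a threshold of 5. The name container_loop is chosen here for illustration, and the loop collects the rows in a list instead of calling pd.concat on every iteration, so it is a stand-in for the asker's loop rather than an exact copy of it.

```python
import numpy as np
import pandas as pd

# Same example data as in the answer above.
np.random.seed(10)
df_prices = pd.DataFrame(np.random.choice(list(range(10)), size=100).reshape(10, -1))
df_mask = pd.DataFrame(np.random.choice([True, False], size=100).reshape(10, -1))

# Row-by-row version: rank only the columns selected by the mask.
rows = []
for idx, row in df_mask.iterrows():
    cols = row[row].index  # columns where the mask is True
    top = df_prices.loc[idx, cols].rank(ascending=False, na_option="bottom").le(5)
    # Columns that were masked out count as False.
    rows.append(top.reindex(df_prices.columns, fill_value=False))
container_loop = pd.DataFrame(rows).astype(bool)

# Vectorized version from the answer.
container_fast = (
    df_prices.where(df_mask)
    .rank(axis=1, ascending=False, na_option="bottom")
    .le(5)
    .fillna(False)
)

# Element-wise comparison of the two results.
print((container_loop == container_fast).all().all())
```

Building the list first and constructing the dataframe once avoids copying the growing container on every iteration, which is likely a large part of why the original loop is slow.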

CodePudding user response：

Instead of iterating through rows (which is inefficient), try using the apply function. See the pandas documentation for DataFrame.apply.

You can do something like this:

def helper_function(row):
    row.name  # index label of the current row
    row.index  # column labels available in that row
    row['column_name']  # value of a specific column in that row
    # do some logic in here
    return something  # placeholder: return the value computed for this row

new_series = your_df.apply(helper_function, axis=1)

With axis=1 the function receives one row at a time; with axis=0 (the default) it receives one column at a time.
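Applied to the question, the apply approach might look like the sketch below. It assumes the example data from the first answer and a threshold of 5; top_n_in_row, its parameters prices and n, and container_apply are hypothetical names introduced here, not part of the original code.

```python
import numpy as np
import pandas as pd

# Example data, assumed to resemble the real df_prices and df_mask.
np.random.seed(10)
df_prices = pd.DataFrame(np.random.choice(list(range(10)), size=100).reshape(10, -1))
df_mask = pd.DataFrame(np.random.choice([True, False], size=100).reshape(10, -1))


def top_n_in_row(mask_row, prices, n=10):
    """Flag the n highest prices among the columns selected by mask_row."""
    cols = mask_row[mask_row].index  # columns where the mask is True
    selected = prices.loc[mask_row.name, cols]  # mask_row.name is the row label
    top = selected.rank(ascending=False, na_option="bottom").le(n)
    # Columns that were masked out count as False.
    return top.reindex(prices.columns, fill_value=False)


# Extra keyword arguments are forwarded by apply to the helper.
container_apply = df_mask.apply(top_n_in_row, axis=1, prices=df_prices, n=5)
print(container_apply)
```

Note that apply with axis=1 still calls the helper once per row in Python, so it will probably remain slower than the vectorized where/rank approach from the first answer.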