Iterating Over Pandas Dataframe Without Overwriting Previous-CodePudding

First post here seeking solution that I have not been able to solve. I'm essentially trying to iterate over a pandas dataframe without overwriting the previous. Currently, I've got code that gets the first instance but fails to iterate over the next set of "values". See below. Any help is much appreciated!!

import pandas as pd
import numpy as np

# load csv data
data = pd.read_csv(r"C:\Users\Admin\Desktop\data.csv")

# convert csv data to pandas dataframe
df = pd.DataFrame(data, columns = ["Date", "B1", "B2", "B3", "B4", "B5"]

Main Code:

for n in range(1,10):
    df[n] = pd.Series(["X" if df.iat[x,y] == n else "" for y in range(1,6) for x in range(0,4)])

Incorrect Output Table:

Date	B1	B2	B3	B4	B5	1	2	3	5
8/22/19	2	3	38	39	40		X
8/25/19	5	8	16	20	25				X
8/27/19	3	4	19	35	39			X
8/29/19	1	2	6	15	19	X

CORRECT Output Table:

Date	B1	B2	B3	B4	B5	1	2	3	4	5	6	8
8/22/19	2	3	38	39	40		X	X
8/25/19	5	8	16	20	25					X		X
8/27/19	3	4	19	35	39			X	X
8/29/19	1	2	6	15	19	X	X				X

I hope this makes sense. Thank you!! :)

CodePudding user response：

Let us try not use for loop and apply

s = df.filter(like='B').stack()
s = pd.crosstab(s.index.get_level_values(level=0),s).replace({0:'',1:'X'})
s.columns = s.columns.astype(str)
out = s.reindex(columns = df.columns).combine_first(df)
          Date   B1   B2    B3    B4    B5  1  2  3  4  5  6 7  8
row_0                                                            
0      8/22/19  2.0  3.0  38.0  39.0  40.0     X  X              
1      8/25/19  5.0  8.0  16.0  20.0  25.0              X       X
2      8/27/19  3.0  4.0  19.0  35.0  39.0        X  X           
3      8/29/19  1.0  2.0   6.0  15.0  19.0  X  X           X

CodePudding user response：

I think using the apply function will make things easier: whenever you make a new column, you iterate through each row of the DataFrame, and each row of your new column is set to "X" if the column value is contained in the row being considered. To avoid overwriting anything, you can index the DataFrame by "Date", and only apply this function to the first 5 columns of the original DataFrame.

for n in range(1,10):
    df[n] = df.iloc[:,:5].apply(lambda x: "X" if n in x.values else "", axis=1)

Result:

df
         B1  B2  B3  B4  B5  1  2  3  4  5  6 7  8 9
Date                                                
8/22/19   2   3  38  39  40     X  X                
8/25/19   5   8  16  20  25              X       X  
8/27/19   3   4  19  35  39        X  X             
8/29/19   1   2   6  15  19  X  X           X