First post here seeking solution that I have not been able to solve. I'm essentially trying to iterate over a pandas dataframe without overwriting the previous. Currently, I've got code that gets the first instance but fails to iterate over the next set of "values". See below. Any help is much appreciated!!
import pandas as pd
import numpy as np
# load csv data
data = pd.read_csv(r"C:\Users\Admin\Desktop\data.csv")
# convert csv data to pandas dataframe
df = pd.DataFrame(data, columns = ["Date", "B1", "B2", "B3", "B4", "B5"]
Main Code:
for n in range(1,10):
df[n] = pd.Series(["X" if df.iat[x,y] == n else "" for y in range(1,6) for x in range(0,4)])
Incorrect Output Table:
Date | B1 | B2 | B3 | B4 | B5 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8/22/19 | 2 | 3 | 38 | 39 | 40 | X | |||||||
8/25/19 | 5 | 8 | 16 | 20 | 25 | X | |||||||
8/27/19 | 3 | 4 | 19 | 35 | 39 | X | |||||||
8/29/19 | 1 | 2 | 6 | 15 | 19 | X |
CORRECT Output Table:
Date | B1 | B2 | B3 | B4 | B5 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8/22/19 | 2 | 3 | 38 | 39 | 40 | X | X | ||||||
8/25/19 | 5 | 8 | 16 | 20 | 25 | X | X | ||||||
8/27/19 | 3 | 4 | 19 | 35 | 39 | X | X | ||||||
8/29/19 | 1 | 2 | 6 | 15 | 19 | X | X | X |
I hope this makes sense. Thank you!! :)
CodePudding user response:
Let us try not use for loop and apply
s = df.filter(like='B').stack()
s = pd.crosstab(s.index.get_level_values(level=0),s).replace({0:'',1:'X'})
s.columns = s.columns.astype(str)
out = s.reindex(columns = df.columns).combine_first(df)
Date B1 B2 B3 B4 B5 1 2 3 4 5 6 7 8
row_0
0 8/22/19 2.0 3.0 38.0 39.0 40.0 X X
1 8/25/19 5.0 8.0 16.0 20.0 25.0 X X
2 8/27/19 3.0 4.0 19.0 35.0 39.0 X X
3 8/29/19 1.0 2.0 6.0 15.0 19.0 X X X
CodePudding user response:
I think using the apply function will make things easier: whenever you make a new column, you iterate through each row of the DataFrame, and each row of your new column is set to "X" if the column value is contained in the row being considered. To avoid overwriting anything, you can index the DataFrame by "Date", and only apply this function to the first 5 columns of the original DataFrame.
for n in range(1,10):
df[n] = df.iloc[:,:5].apply(lambda x: "X" if n in x.values else "", axis=1)
Result:
df
B1 B2 B3 B4 B5 1 2 3 4 5 6 7 8 9
Date
8/22/19 2 3 38 39 40 X X
8/25/19 5 8 16 20 25 X X
8/27/19 3 4 19 35 39 X X
8/29/19 1 2 6 15 19 X X X