I am newer to programming and am running into an issue I am trying to solve. I am working with a dataframe using panadas in python. I have sorted the table and created some extra columns: https://i.stack.imgur.com/Y6lkN.png
{'Part Number': ['K4SD',
'K4SD',
'K4SD',
'K4SD',
'K4SD',
'K4SD',
'K4SD',
'K4SD',
'K4SD',
'K4SD',
'QOL2',
'QOL2',
'QOL2',
'QOL2',
'QOL2',
'QOL2',
'QOL2',
'QOL2',
'QOL2',
'QOL2'],
'Date': [Timestamp('2021-05-17 00:00:00'),
Timestamp('2021-05-23 00:00:00'),
Timestamp('2021-07-08 00:00:00'),
Timestamp('2021-08-17 00:00:00'),
Timestamp('2021-08-17 00:00:00'),
Timestamp('2021-10-18 00:00:00'),
Timestamp('2021-12-18 00:00:00'),
Timestamp('2021-12-20 00:00:00'),
Timestamp('2022-02-10 00:00:00'),
Timestamp('2022-03-31 00:00:00'),
Timestamp('2021-10-04 00:00:00'),
Timestamp('2021-10-18 00:00:00'),
Timestamp('2021-11-03 00:00:00'),
Timestamp('2021-11-03 00:00:00'),
Timestamp('2021-11-17 00:00:00'),
Timestamp('2021-11-24 00:00:00'),
Timestamp('2021-11-27 00:00:00'),
Timestamp('2021-12-22 00:00:00'),
Timestamp('2021-12-24 00:00:00'),
Timestamp('2022-03-21 00:00:00')],
'Code': ['SF22',
'KFS3',
'3FFS',
'Replacement needed',
'LA52',
'K2KA',
'Belt Broke',
'QET6',
'QET6',
'P0SF',
'Testing Broken',
'DP2L',
'SR2F',
'JKO2',
'DP2L',
'A2BF',
'KLL2',
'Light Off',
'A3SA',
'LA52'],
'Fix': ['na',
'na',
'na',
'Custom Status',
'na',
'na',
'Remade',
'na',
'na',
'na',
'Testing Procedure Fixed',
'na',
'na',
'na',
'na',
'na',
'na',
'Light Repair',
'na',
'na'],
'Fixed': ['No',
'No',
'No',
'Yes',
'No',
'No',
'Yes',
'No',
'No',
'No',
'Yes',
'No',
'No',
'No',
'No',
'No',
'No',
'Yes',
'No',
'No'],
'Combined': ['SF22',
'KFS3',
'3FFS',
'Replacement needed',
'LA52',
'K2KA',
'Belt Broke',
'QET6',
'QET6',
'P0SF',
'Testing Broken',
'DP2L',
'SR2F',
'JKO2',
'DP2L',
'A2BF',
'KLL2',
'Light Off',
'A3SA',
'LA52']}
I sorted the dataframe by date and now I would like to create a loop that goes down the table in order row by row. In the loop, if the row in the "Fixed" column is "No", I want to append the value in that row in the "Code" column to a list (which I called list_test). Then, when the row in the "Fixed" column becomes a "Yes", I want to create a new list variable that is a copy of the list_test.
Then, I want to clear the list_test to be an empty list so that it can repeat the process down the column (clearing itself every time there is a "Fix").
In my example table above, I would want the output to be something along the lines of:
- Fixed_Before_3 = ["SF22", "KFS3", "3FFS"]
- Fixed_Before_6 = ["LA52", "K2KA"]
- Fixed_Before_10 = ["QET6", "QET6", "P0SF"]
- Fixed_Before_17 = ["DP2L", "SR2F", "JKO2", "DP2L", "A2BF", "KLL2"]
This is one way I tried to approach the problem:
list_test = []
var_test = {}
for index in df.index:
var_test[index] = "Fixed_Before_" str(index)
if df['Fixed'][index] == 'No':
list_test.append(df['Code'])
if df['Fixed'][index] == 'Yes':
var_test[index] = list_test
list_test = []
list_test
Although, when I run the code, the output (https://i.stack.imgur.com/RM0Pj.png) is a very large column and it looks like it includes everything in my column more than a few times rather than the output I included above. I think my problems might be with:
- The way I iterate throughout the dataframe
- The conditional statements about dataframes
- Maybe list_test.append(df['Code']) gives me the whole column instead of the value of the column in the row in my conditional statement?
- Using a dictionary to create new variables in my loop
If anyone has any helpful information about what I am doing wrong or another approach to my problem that would be greatly appreciated!
CodePudding user response:
The exact expected output is unclear, but here is a suggestion:
mask = df['Fixed'].eq('Yes')
out = (df
.assign(index=pd.Series(df.index.where(mask), index=df.index).bfill())
.loc[~mask]
.groupby(['Part Number', mask.shift(fill_value=True).cumsum()])
.agg(fix_before=('index', 'first'),
list=('Code', list))
.reset_index(drop=True)
)
Output:
fix_before list
0 3.0 [SF22, KFS3, 3FFS]
1 6.0 [LA52, K2KA]
2 10.0 [QET6, QET6, P0SF]
3 17.0 [DP2L, SR2F, JKO2, DP2L, A2BF, KLL2]
4 NaN [A3SA, LA52]