Conditionally insert rows in the middle of dataframe using pandas

Time:07-29

I have a dataset to which I need to add rows based on conditions. Rows can be added anywhere within the dataset, i.e., at the middle, top, or bottom.

I have 26 columns in the data but will only use a few to set conditions. I want my code to go through each row and check whether a column named "potveg" has the value 4, 8, or 9. If it does, add a row below it, set the 'col' and 'lat' column values to those of the last row, and set the values of columns 'icohort' and 'isrccohort' to those of the last row + 1. Then export the new data frame to CSV. I have tried several implementations based on the logic in this question: "Pandas: Conditionally insert rows into DataFrame while iterating through rows in the middle". P.S. I am new to Python and pandas.

Here is the code I have so far:

for index, row in df.iterrows():
    last_row = df.iloc[index-1]
    next_row = df.iloc[index]

    new_row = {
        'col': last_row.col,
        'row': last_row.row,
        'tmpvarname': last_row.tmpvarname,
        'year': last_row.year,
        'icohort': next_row.icohort,
        'isrccohort': next_row.icohort,
        'standage': 3000,
        'chrtarea': 0,
        'potveg': 13,
        'currentveg': 13,
        'subtype': 13,
        'agstate': 0,
        'agprevstate': 0,
        'tillflag': 0,
        'fertflag': 0,
        'irrgflag': 0,
        'disturbflag': 0,
        'disturbmonth': 0,
        'FRI': 2000,
        'slashpar': 0,
        'vconvert': 0,
        'prod10par': 0,
        'prod100par': 0,
        'vrespar': 0,
        'sconvert': 0,
        'tmpregion': last_row.tmpregion
    }
    new_row = {k: v for k, v in new_row.items()}
    if (df.iloc[index]['potveg'] == 4):
        newdata = df.append(new_row, ignore_index=True)

CodePudding user response:

Following the steps you suggested, you could write something like:

import pandas as pd

df = pd.DataFrame({'id':[1,2,4,5], 'before': [1,2,4,5], 'after': [1,2,4,5]})
new_df = pd.DataFrame()

for i, row in df.iterrows():
    # copy the current row into the result
    new_df = pd.concat([new_df, row.to_frame().transpose()])
    if row['id'] == 2:
        # add our new row: `before` comes from the row above the new one (the current row), and `after` from the row below it
        temp = pd.DataFrame({'id': [3], 'before': [df.loc[i, 'before']], 'after': [df.loc[i + 1, 'after']]})
        new_df = pd.concat([new_df, temp])

Consider whether you can approach the problem without iterating over the dataframe, as iteration can be quite slow on a large dataset. I'd suggest looking into the apply function.
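As a rough illustration of a loop-free variant, here is a minimal sketch on the same toy frame. It uses a boolean mask plus a fractional index rather than apply, which is a swapped-in technique and not something the answer above prescribes. The names `mask` and `new_rows` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 4, 5], 'before': [1, 2, 4, 5], 'after': [1, 2, 4, 5]})

# Rows that should get a new row inserted directly below them.
mask = df['id'] == 2

# Start from copies of the matching rows and overwrite what differs.
new_rows = df[mask].copy()
new_rows['id'] = 3
# `after` comes from the following row; shift(-1) lines each row up with its successor.
new_rows['after'] = df['after'].shift(-1)[mask]

# A fractional index (1 -> 1.5) sorts each new row between its source row and the next one.
new_rows.index = new_rows.index + 0.5

new_df = pd.concat([df, new_rows]).sort_index().reset_index(drop=True)
print(new_df)
```

This assumes the frame has a default 0..n-1 integer index. If it does not, calling `reset_index(drop=True)` first would be needed. Note that `shift(-1)` introduces a NaN for the last row, so the `after` column ends up as a float column.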

CodePudding user response:

Inserting rows at a specific position can be done this way:

import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 4, 5], 'col2': ['A', 'B', 'D', 'E']})

new_row = pd.DataFrame({'col1': [3], 'col2': ['C']})
idx_pos = 2

pd.concat([df.iloc[:idx_pos], new_row, df.iloc[idx_pos:]]).reset_index(drop=True)

Output:

   col1 col2
0     1    A
1     2    B
2     3    C
3     4    D
4     5    E
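
The slicing idea above handles one known position. A possible way to extend it to the condition in the question is sketched below, with several assumptions: the sample values are invented, only a few of the 26 columns are shown, and the new row is assumed to copy the matching row and then set `icohort` and `isrccohort` to that row's `icohort` plus one and `potveg` to 13. Adjust those rules to whatever the real data requires.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a few of the question's columns; the values are made up.
df = pd.DataFrame({
    'col': [-100.0, -100.0, -99.5],
    'row': [45.0, 45.0, 45.5],
    'icohort': [1, 2, 1],
    'isrccohort': [1, 2, 1],
    'potveg': [4, 10, 9],
})

# Integer positions of the rows whose potveg is 4, 8 or 9.
positions = np.flatnonzero(df['potveg'].isin([4, 8, 9]).to_numpy())

pieces = []
start = 0
for pos in positions:
    # Everything up to and including the matching row.
    pieces.append(df.iloc[start:pos + 1])
    # New row: copy the matching row, then overwrite the columns that change (assumed rules).
    new_row = df.iloc[[pos]].copy()
    new_row['icohort'] = new_row['icohort'] + 1
    new_row['isrccohort'] = new_row['icohort']
    new_row['potveg'] = 13
    pieces.append(new_row)
    start = pos + 1

# Remaining rows after the last match, if any.
if start < len(df):
    pieces.append(df.iloc[start:])

new_df = pd.concat(pieces, ignore_index=True)

# With no path, to_csv returns the CSV text; pass a file name to write a file instead.
csv_text = new_df.to_csv(index=False)
print(new_df)
```

Collecting the slices in a list and calling `pd.concat` once avoids rebuilding the frame on every iteration, which matters when many rows match.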