Add selected rows from an existing Pandas DataFrame to a new Pandas DataFrame in for loop in Python

Time:12-06

I want to select some rows from an existing Pandas DataFrame based on a condition and then insert them into a new DataFrame.
At first, I tried it this way:

second_df = pd.DataFrame()
for specific_idx in specific_idx_set:
    second_df = existing_df.iloc[specific_idx]
len(specific_idx_set), second_df.shape => (1000), (15,)

As you can see, I'm iterating over a set that has 1000 indexes. However, after adding these 1000 rows to a new Pandas DataFrame (second_df), only one of the rows was stored in it, while I expected to see 1000 rows with 15 columns.
So I tried a new way:

specific_rows = list() 
for specific_val in specific_idx_set:
    specific_rows.append( existing_df[existing_df[col] == specific_val])

new_df = pd.DataFrame(specific_rows)

And I got this error:

ValueError: Must pass 2-d input. shape=(1000, 1, 15)

Then, I wrote this code:

specific_rows = list() 
new_df = pd.DataFrame()
for specific_val in specific_idx_set:
    specific_rows.append(existing_df[existing_df[col] == specific_val])
pd.concat([new_df, specific_rows])

But I got this error:

TypeError: cannot concatenate object of type '<class 'list'>'; only Series and DataFrame objs are valid

CodePudding user response:

You need to modify your last solution: remove the empty DataFrame and pass only the list of DataFrames to concat:

specific_rows = list() 
for specific_val in specific_idx_set:
    specific_rows.append(existing_df[existing_df[col] == specific_val])
    
out = pd.concat(specific_rows)
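
For reference, here is a minimal, self-contained sketch of the same pattern. The sample data, the column name "key" and the two values in the set are made up for illustration; they only stand in for existing_df, col and specific_idx_set from the question.

```python
import pandas as pd

# Hypothetical stand-ins for existing_df, col and specific_idx_set
existing_df = pd.DataFrame({
    "key": [1, 2, 3, 4, 5],
    "value": ["a", "b", "c", "d", "e"],
})
col = "key"
specific_idx_set = {2, 4}

# Collect one filtered DataFrame per value...
specific_rows = []
for specific_val in specific_idx_set:
    specific_rows.append(existing_df[existing_df[col] == specific_val])

# ...and concatenate them once, after the loop
out = pd.concat(specific_rows)
print(out.shape)
```

Concatenating once after the loop also avoids rebuilding the result DataFrame on every iteration.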

The problem with your solution is that the list passed to concat contains a DataFrame and a list, and a list is not a valid element, so the error is raised:

pd.concat([new_df, specific_rows])
# specific_rows is a list
# new_df is a DataFrame

If you need to include the DataFrame as well, join the two lists: add the one-element list [new_df] to the list specific_rows, so the result is a single list of DataFrames:

pd.concat([new_df] + specific_rows)
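
As a side note, and not part of the approach above: if the goal is only to pick rows, the loop can probably be skipped altogether. The sketch below uses made-up sample data and assumes the set holds either values of col (first variant, using Series.isin) or integer row positions (second variant, using iloc with a list, as in the first attempt from the question).

```python
import pandas as pd

# Hypothetical stand-ins for existing_df, col and specific_idx_set
existing_df = pd.DataFrame({
    "key": [1, 2, 3, 4, 5],
    "value": ["a", "b", "c", "d", "e"],
})
col = "key"
specific_idx_set = {2, 4}

# Variant 1: the set holds values of col -> filter with a boolean mask
by_value = existing_df[existing_df[col].isin(specific_idx_set)]

# Variant 2: the set holds integer row positions -> pass them to iloc as a list
by_position = existing_df.iloc[sorted(specific_idx_set)]

print(by_value)
print(by_position)
```

Note that the isin variant keeps the rows in their original DataFrame order, whereas the loop follows the iteration order of the set.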