I want to select rows that meet a condition from an existing Pandas DataFrame and insert them into a new DataFrame.
At first, I tried this:
second_df = pd.DataFrame()
for specific_idx in specific_idx_set:
    second_df = existing_df.iloc[specific_idx]

len(specific_idx_set)  # 1000
second_df.shape        # (15,)
As you can see, I'm iterating over a set of 1000 indices. However, after adding these 1000 rows to the new DataFrame (second_df), only one row ended up in it, whereas I expected 1000 rows with 15 columns.
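The single row has two causes. First, existing_df.iloc[specific_idx] with one integer position returns that row as a Series, which is why the shape is (15,) rather than (1, 15). Second, `second_df = ...` rebinds the name on every iteration, so only the last row survives. The following is a minimal sketch on a small made-up existing_df, assuming specific_idx_set holds integer positions. It reproduces the overwrite and, as an alternative not taken from the question, passes all positions to iloc at once:

```python
import pandas as pd

# Hypothetical stand-ins for the question's data: 6 rows, 3 columns.
existing_df = pd.DataFrame({"a": range(6), "b": range(10, 16), "c": range(20, 26)})
specific_idx_set = {1, 3, 4}  # integer positions, as in the first attempt

# The original loop: each iteration rebinds second_df to a single row (a Series).
second_df = pd.DataFrame()
for specific_idx in specific_idx_set:
    second_df = existing_df.iloc[specific_idx]
print(type(second_df).__name__, second_df.shape)  # Series (3,)

# Passing a list of positions returns a DataFrame with one row per position.
# sorted() gives a deterministic order, because sets are unordered.
selected = existing_df.iloc[sorted(specific_idx_set)]
print(selected.shape)  # (3, 3)
```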
So I tried a different approach:
specific_rows = list()
for specific_val in specific_idx_set:
    specific_rows.append(existing_df[existing_df[col] == specific_val])
new_df = pd.DataFrame(specific_rows)
And I got this error:
ValueError: Must pass 2-d input. shape=(1000, 1, 15)
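This error happens because boolean filtering always returns a DataFrame, even when only one row matches. specific_rows is therefore a list of 1 × 15 frames, and the DataFrame constructor sees it as 3-D input of shape (1000, 1, 15) instead of the 2-D data it expects. A minimal sketch on hypothetical three-column data, where each value matches exactly one row, shows the per-element shapes:

```python
import pandas as pd

# Hypothetical data: 3 columns, and each value below matches exactly one row.
existing_df = pd.DataFrame({"key": [7, 8, 9], "x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})
col = "key"
specific_idx_set = {7, 9}

specific_rows = []
for specific_val in specific_idx_set:
    # Boolean filtering returns a DataFrame, not a single row/Series.
    specific_rows.append(existing_df[existing_df[col] == specific_val])

shapes = [r.shape for r in specific_rows]
print(shapes)  # [(1, 3), (1, 3)]
# A list of n such frames is effectively (n, 1, n_columns): 3-D, not 2-D.
print((len(specific_rows),) + specific_rows[0].shape)  # (2, 1, 3)
```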
Then, I wrote this code:
specific_rows = list()
new_df = pd.DataFrame()
for specific_val in specific_idx_set:
    specific_rows.append(existing_df[existing_df[col] == specific_val])
pd.concat([new_df, specific_rows])
But I got this error:
TypeError: cannot concatenate object of type '<class 'list'>'; only Series and DataFrame objs are valid
Answer:
You need to modify your last solution: remove the empty DataFrame and pass only the list of DataFrames to concat:
specific_rows = list()
for specific_val in specific_idx_set:
    specific_rows.append(existing_df[existing_df[col] == specific_val])
out = pd.concat(specific_rows)
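Here is a self-contained sketch of this fix on hypothetical data (the column names, values, and existing_df contents are assumptions). One value matches two rows, which the loop handles naturally. For comparison, it also builds a single isin mask. This alternative is not part of the answer, but it should select the same rows without a Python loop:

```python
import pandas as pd

# Hypothetical data standing in for existing_df; col names the filter column.
existing_df = pd.DataFrame({"key": [5, 7, 8, 9, 7], "val": list("abcde")})
col = "key"
specific_idx_set = {7, 9}

# The answer's approach: collect one filtered DataFrame per value, then concat once.
specific_rows = []
for specific_val in specific_idx_set:
    specific_rows.append(existing_df[existing_df[col] == specific_val])
out = pd.concat(specific_rows)

# Vectorized alternative (not from the answer): one boolean mask built with isin.
# It keeps existing_df's row order; the loop follows the set's iteration order.
out_isin = existing_df[existing_df[col].isin(specific_idx_set)]

print(out.sort_index().equals(out_isin))  # True
```

A set has no guaranteed order, so the rows in out follow whatever order the loop visits the values. Sorting the index before comparing is what makes the two results line up.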
The problem with your solution is that you pass concat a list containing a DataFrame and another list, which raises the error:
pd.concat([new_df, specific_rows])
# specific_rows is a list
# new_df is a DataFrame
If you do need to keep new_df, join the two lists first: prepend the one-element list [new_df] to the list specific_rows, so the result is a single list of DataFrames:
pd.concat([new_df] + specific_rows)
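Below is a minimal sketch of joining the lists, assuming a hypothetical non-empty new_df that you actually want to keep. With the empty pd.DataFrame() from the question, it can simply be dropped, as the answer suggests. The example also adds ignore_index=True, an optional concat parameter that renumbers rows 0..n-1 instead of keeping the original labels:

```python
import pandas as pd

# Hypothetical inputs: a frame to keep in front, plus a list of filtered frames.
new_df = pd.DataFrame({"key": [1], "val": ["z"]})
existing_df = pd.DataFrame({"key": [7, 9], "val": ["a", "b"]})
specific_rows = [existing_df[existing_df["key"] == v] for v in (7, 9)]

# [new_df] + specific_rows is one flat list of DataFrames, which concat accepts.
combined = pd.concat([new_df] + specific_rows, ignore_index=True)
print(combined)
```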