Home > Mobile >  Create new dataframes in python pandas based on the value of a column
Create new dataframes in python pandas based on the value of a column

Time:11-16

I have a dataset that looks like that:

enter image description here

There are 15 unique values in the column 'query id', so I am trying to create new dataframes for each unique value. I thought of having a loop for every unique value in column 'query id' with a code like this:

df_list = []
i = 0

for x in df['query id'].unique():
    df{i} = pd.DataFrame(columns=df.columns) 
    df_list.append()
    i =1

But I am definitely doing something wrong there and got stuck. Do you have any ideas of how to do that?

Sample dataset:

relevance   query id   1   2   3
        1   WT04-170  10  40  80
        1   WT04-170  20  60  70
        1   WT04-176  30  70  50     
        1   WT04-176  40  90  20      
        1   WT04-173  50 100  10

CodePudding user response:

Pandas has a built-in function for iterating unique values in a column and selecting the matching rows. The function is groupby

In your case, you can create the dictionary as a one-liner using:

dfs = {query_id: grp.copy() for query_id, grp in df.groupby("query id")}

Once you have your dictionary of dataframes, you can access each one using the query id as your key:

my_df = dfs["WT04-170"]  # Access each dataframe using the appropriate key
my_df.describe()  # Do your work with the dataframe here

CodePudding user response:

Does something like this help?


df_list = []

for x in set(df['query id'].to_list()):
    df = df[df['query id'] == x].copy() 
    df_list.append(df)



CodePudding user response:

It sounds like what you want is a filtered dataframe for each unique query id. So you would end up with 15 dataframes, each containing only the rows for that specific query id from the combined df. Is that right?

In that case, your approach is close, but you could just filter the df in your loop. I also used a dict to store the resulting dataframes too, but you could do it with the list as well.

If my understanding of what you're looking for is correct, I think this should work for you:

df_dict = {}
for (i,x) in enumerate(df['query id'].unique()):
    df_dict[i] = df[df['query id']==x].copy()

You could also just use the query_ids as the dict keys too, like this:

df_dict = {}
for x in df['query id'].unique():
    df_dict[x] = df[df['query id']==x].copy()
  • Related