I have a dataset that looks like that:
There are 15 unique values in the column 'query id', so I am trying to create new dataframes for each unique value. I thought of having a loop for every unique value in column 'query id' with a code like this:
df_list = []
i = 0
for x in df['query id'].unique():
df{i} = pd.DataFrame(columns=df.columns)
df_list.append()
i =1
But I am definitely doing something wrong there and got stuck. Do you have any ideas of how to do that?
Sample dataset:
relevance query id 1 2 3
1 WT04-170 10 40 80
1 WT04-170 20 60 70
1 WT04-176 30 70 50
1 WT04-176 40 90 20
1 WT04-173 50 100 10
CodePudding user response:
Pandas has a built-in function for iterating unique values in a column and selecting the matching rows. The function is groupby
In your case, you can create the dictionary as a one-liner using:
dfs = {query_id: grp.copy() for query_id, grp in df.groupby("query id")}
Once you have your dictionary of dataframes, you can access each one using the query id as your key:
my_df = dfs["WT04-170"] # Access each dataframe using the appropriate key
my_df.describe() # Do your work with the dataframe here
CodePudding user response:
Does something like this help?
df_list = []
for x in set(df['query id'].to_list()):
df = df[df['query id'] == x].copy()
df_list.append(df)
CodePudding user response:
It sounds like what you want is a filtered dataframe for each unique query id. So you would end up with 15 dataframes, each containing only the rows for that specific query id from the combined df. Is that right?
In that case, your approach is close, but you could just filter the df in your loop. I also used a dict to store the resulting dataframes too, but you could do it with the list as well.
If my understanding of what you're looking for is correct, I think this should work for you:
df_dict = {}
for (i,x) in enumerate(df['query id'].unique()):
df_dict[i] = df[df['query id']==x].copy()
You could also just use the query_ids as the dict keys too, like this:
df_dict = {}
for x in df['query id'].unique():
df_dict[x] = df[df['query id']==x].copy()