New dataframe in Pandas based on specific values(a lot of them) from existing df


Good evening! I'm using pandas in a Jupyter Notebook. I have a huge dataframe representing the full history of posts from 26 channels in a messenger. It has a column "dialog_id" which indicates in which dialog each message was sent (so there can be only 26 unique values in the column, but there are more than 700k rows, and the df is sorted by time, not by id, so it is kinda chaotic). I have to split this dataframe into 2 different ones (one will contain the full history of 13 channels, and the other will contain the history of the other 13 channels). I know the ids by which I have to split; they are random as well. For example, one is -1001232032465 and the other is -1001153765346.

The question is, how do I do this most elegantly and adequately? I know I can do it somehow with df.loc[], but I don't want to write something like 13 lines of df.loc[]. I've tried to use logical operators for this, like: df1.loc[(df["dialog_id"] == '-1001708255880') & (df["dialog_id"] == '-1001645788710')], but it doesn't work. I suppose I'm using them wrong. I expect a solution with any method creating a new df, using logical operators. In verbal expression, I think it should sound like "put the row in the new df if the dialog_id is x, or the dialog_id is y, or the dialog_id is z, etc.". Please help me!

CodePudding user response:

The easiest way seems to be to just set up a query with DataFrame.query.

import pandas as pd

# Small example frame: col_id plays the role of dialog_id
df = pd.DataFrame(dict(col_id=[1, 2, 3, 4], other=[5, 6, 7, 8]))

channel_groupA = [1, 2]
channel_groupB = [3, 4]

# In query(), comparing a column to a list with == behaves like membership ("in")
df_groupA = df.query(f'col_id == {channel_groupA}')
df_groupB = df.query(f'col_id == {channel_groupB}')
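Applied to the original dataframe, an equivalent boolean mask with Series.isin avoids the string interpolation into the query. This is only a sketch under assumptions: the column is named dialog_id as in the question, only the two ids quoted there are real (the other 11 are up to you to fill in), and the ids are stored as integers, so adjust to strings if that is how your column is typed.

import pandas as pd

# ids of the 13 channels that should go into the first dataframe
# (only these two ids come from the question; append the remaining 11)
group_a_ids = [-1001232032465, -1001153765346]

df_a = df[df["dialog_id"].isin(group_a_ids)]    # rows whose dialog_id is in the list
df_b = df[~df["dialog_id"].isin(group_a_ids)]   # everything else, i.e. the other 13 channels

Listing only one group of ids and taking the complement with ~ guarantees the two resulting dataframes cover all 700k rows without overlap.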