Connecting rows in Data Frame in python

Time:02-12

I have a DataFrame with 2 columns, and I'm trying to combine every 3 rows into one row, with "or" between them. I can't find a smart and easy solution.

DataFrame:

                                            dorks list  ofek rank
0                         allintext:"nike" acquisition          2
1                            allintext:"nike" acquired          2
2                                 allintext:"nike" buy          2
3                                allintext:"nike" sell          2
4                                allintext:"nike" sold          2
..                                                 ...        ...
481    insubject:"nike" divested source:prnewswire.com          4
482       insubject:"nike" divested source:reuters.com          4
483  insubject:"nike" divested source:seekingalpha.com          4
484     insubject:"nike" divested source:pitchbook.com          4
485     insubject:"nike" divested source:bloombarg.com          4

The desired result is one row per group of 3 rows, for example:

allintext:"nike" acquisition or allintext:"nike" acquired or allintext:"nike" buy

Answer:

Assuming you have a range index:

df.groupby(df.index//3).agg(**{'result': ('dorks list', ' or '.join),
                               'mean_rank': ('ofek rank', 'mean')
                              })

NB. I don't know what you want to do with "ofek rank", so I took the mean, but you can do whatever you want (first, min, max… or join it like the other column).

Output:

                                                result  mean_rank
0    allintext:"nike" acquisition or allintext:"nik...        2.0
1       allintext:"nike" sell or allintext:"nike" sold        2.0
160  insubject:"nike" divested source:prnewswire.co...        4.0
161  insubject:"nike" divested source:seekingalpha....        4.0
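To reproduce the output above, here is a minimal self-contained sketch. It rebuilds only the 10 rows visible in the question (index labels 0 to 4 and 481 to 485; the elided middle rows are not reconstructed) and runs the same named aggregation, written with keyword arguments instead of the **{...} dict, since result and mean_rank are valid Python identifiers. Because only these 10 rows exist, groups 1 and 160 contain just two rows each, which is why those output rows join fewer than three dorks.

```python
import pandas as pd

# Only the 10 rows visible in the question's printout, with their original index labels.
df = pd.DataFrame(
    {
        'dorks list': [
            'allintext:"nike" acquisition',
            'allintext:"nike" acquired',
            'allintext:"nike" buy',
            'allintext:"nike" sell',
            'allintext:"nike" sold',
            'insubject:"nike" divested source:prnewswire.com',
            'insubject:"nike" divested source:reuters.com',
            'insubject:"nike" divested source:seekingalpha.com',
            'insubject:"nike" divested source:pitchbook.com',
            'insubject:"nike" divested source:bloombarg.com',
        ],
        'ofek rank': [2, 2, 2, 2, 2, 4, 4, 4, 4, 4],
    },
    index=[0, 1, 2, 3, 4, 481, 482, 483, 484, 485],
)

# Integer-divide the index labels by 3: labels 0-2 -> group 0, 3-5 -> group 1, and so on.
result = df.groupby(df.index // 3).agg(
    result=('dorks list', ' or '.join),   # join each group's dorks with " or "
    mean_rank=('ofek rank', 'mean'),      # any other aggregation (first, min, max) works too
)
print(result)
```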

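If your index is not a default range index (0 to n-1), replace df.index//3 with np.arange(len(df))//3, which groups by row position instead of by index label (this requires import numpy as np).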
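A small sketch of why this matters, on hypothetical placeholder data (dork0 to dork11, not from the question): after keeping every other row, the index labels are 0, 2, 4, 6, 8, 10, so dividing the labels by 3 gives uneven groups, while dividing row positions from np.arange gives clean groups of 3.

```python
import numpy as np
import pandas as pd

# Hypothetical 12-row frame; keeping every other row leaves labels 0, 2, 4, 6, 8, 10.
df = pd.DataFrame({'dorks list': [f'dork{i}' for i in range(12)]})
sub = df.iloc[::2]

label_groups = (sub.index // 3).tolist()                # [0, 0, 1, 2, 2, 3]: uneven groups
position_groups = (np.arange(len(sub)) // 3).tolist()   # [0, 0, 0, 1, 1, 1]: groups of 3

# Group by position, then join each group's dorks with " or ".
result = sub.groupby(np.arange(len(sub)) // 3)['dorks list'].agg(' or '.join)
print(result)
```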

Answer:

Before I look for the smart way, I always start with the dumb way, that is, iteration:

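import pandas as pd

new_df = []
dorks = []
# enumerate from 1 so we count row positions, not index labels
for pos, (_, row) in enumerate(df.iterrows(), start=1):
    dorks.append(row['dorks list'])
    # every 3rd row, or the last row, closes the current group
    if pos % 3 == 0 or pos == len(df):
        new_df.append({'dorks list': ' or '.join(dorks)})
        dorks = []
df_new = pd.DataFrame(new_df)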
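As a middle ground between the loop and groupby (a swapped-in plain-Python technique, not from either answer), you can pull the column into a list and slice it into chunks of 3. This avoids iterrows, which builds a Series for every row. The 7-row frame below is hypothetical placeholder data, chosen so the last group is incomplete.

```python
import pandas as pd

# Hypothetical 7-row sample, so the last chunk has only one dork.
df = pd.DataFrame({'dorks list': [f'dork{i}' for i in range(7)]})

dorks = df['dorks list'].tolist()
# Slice the list into consecutive chunks of 3 and join each chunk with " or ".
chunks = [dorks[i:i + 3] for i in range(0, len(dorks), 3)]
df_new = pd.DataFrame({'dorks list': [' or '.join(chunk) for chunk in chunks]})
print(df_new)
```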