I have a DataFrame with 2 columns, and I'm trying to combine every 3 rows into one row, with the values joined by " or ". I can't find a smart and easy solution.
DataFrame:
dorks list ofek rank
0 allintext:"nike" acquisition 2
1 allintext:"nike" acquired 2
2 allintext:"nike" buy 2
3 allintext:"nike" sell 2
4 allintext:"nike" sold 2
.. ... ...
481 insubject:"nike" divested source:prnewswire.com 4
482 insubject:"nike" divested source:reuters.com 4
483 insubject:"nike" divested source:seekingalpha.com 4
484 insubject:"nike" divested source:pitchbook.com 4
485 insubject:"nike" divested source:bloombarg.com 4
The desired result, for every group of 3 rows:
allintext:"nike" acquisition or allintext:"nike" acquired or allintext:"nike" buy
CodePudding user response:
Assuming you have a range index:
df.groupby(df.index//3).agg(**{'result': ('dorks list', ' or '.join),
                               'mean_rank': ('ofek rank', 'mean')
                               })
NB. I don't know what you want to do with "ofek rank", so I took the mean, but you can use any other aggregation instead (first, min, max… or join, as for the other column).
Output:
result mean_rank
0 allintext:"nike" acquisition or allintext:"nik... 2.0
1 allintext:"nike" sell or allintext:"nike" sold 2.0
160 insubject:"nike" divested source:prnewswire.co... 4.0
161 insubject:"nike" divested source:seekingalpha.... 4.0
If you don't have a range index, replace df.index//3 with np.arange(len(df))//3
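To make the positional variant concrete, here is a minimal sketch on made-up sample data (the five rows and the index values 10 to 50 are assumptions for illustration, not the asker's real frame). It builds the group labels from row positions, so it works whatever the index looks like:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the index is deliberately not a RangeIndex.
df = pd.DataFrame(
    {
        "dorks list": [
            'allintext:"nike" acquisition',
            'allintext:"nike" acquired',
            'allintext:"nike" buy',
            'allintext:"nike" sell',
            'allintext:"nike" sold',
        ],
        "ofek rank": [2, 2, 2, 2, 2],
    },
    index=[10, 20, 30, 40, 50],
)

# Positional group labels, one per row: 0, 0, 0, 1, 1
groups = np.arange(len(df)) // 3

out = df.groupby(groups).agg(
    result=("dorks list", " or ".join),   # join each block of up to 3 strings
    mean_rank=("ofek rank", "mean"),      # any other aggregation would do here
)
print(out)
```

Note that the last group simply holds the leftover rows (2 here) when the number of rows is not a multiple of 3.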
CodePudding user response:
Before I look for the smart way, I always try the dumb way first. Here, that means iterating over the rows:
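import pandas as pd

new_rows = []
chunk = []  # 'dorks list' values collected for the current group
for _, row in df.iterrows():
    chunk.append(row['dorks list'])
    if len(chunk) == 3:  # a full group of 3 rows: join it and start over
        new_rows.append({'dorks list': ' or '.join(chunk)})
        chunk = []
if chunk:  # leftover rows when len(df) is not a multiple of 3
    new_rows.append({'dorks list': ' or '.join(chunk)})
df_new = pd.DataFrame(new_rows)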