I'm working on this dataset.
How can I group this dataset by timestamp and merge all strings that share a timestamp into a single string containing only unique tokens? For example, I'd like to get:
date | string |
---|---|
2011-02-01 15:00:00 | Richmond Service Index S&P/CS HPI Composite - 20 s.a. n.s.a Texas Services Sector Outlook TIC Net Long-Term Transactions including Swaps |
I have no idea which method to use. Does anyone know how to solve this?
Answer:
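Could this help you? It assumes the text column is named `event` (the table above calls it `string`). It turns the HTML-escaped `&amp;` back into `&`, joins every string that shares a timestamp, and then drops repeated tokens while keeping their first-seen order.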
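```python
import pandas as pd

# Assumes df has a 'date' column and a text column named 'event'.
# Undo the HTML-escaped ampersand ("S&amp;P" -> "S&P").
df['event'] = df['event'].str.replace('&amp;', '&', regex=False)
# Concatenate all strings that share a timestamp: one row per date.
df = df.groupby('date')['event'].apply(' '.join).reset_index()
# Split into tokens and drop repeats; dicts keep insertion order (Python 3.7+).
df['event'] = df['event'].str.split().apply(lambda toks: ' '.join(dict.fromkeys(toks)))
```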
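The snippet above works on a `df` that is never shown, so here is a self-contained sketch of the same idea that runs as-is. It folds the join, split, and de-duplication into a single `groupby(...).agg(...)` call. The function name `merge_unique_tokens`, the column names `date` and `event`, and the sample rows are all hypothetical: the rows were made up so that the first timestamp reproduces the example output from the question.

```python
import pandas as pd


def merge_unique_tokens(df: pd.DataFrame, date_col: str = "date",
                        text_col: str = "event") -> pd.DataFrame:
    """Return one row per timestamp whose text holds each token once.

    Tokens are whitespace-separated; first-seen order is kept.
    """
    # Undo HTML-escaped ampersands such as "S&amp;P".
    text = df[text_col].str.replace("&amp;", "&", regex=False)
    # Per group: join all strings, split into tokens, keep first occurrences.
    merged = text.groupby(df[date_col]).agg(
        lambda s: " ".join(dict.fromkeys(" ".join(s).split()))
    )
    return merged.reset_index()


# Hypothetical sample data; the first group mirrors the question's example.
events = pd.DataFrame({
    "date": ["2011-02-01 15:00:00"] * 6 + ["2011-02-02 10:00:00"] * 2,
    "event": [
        "Richmond Service Index",
        "S&amp;P/CS HPI Composite - 20 s.a.",
        "S&amp;P/CS HPI Composite - 20 n.s.a",
        "Texas Services Sector Outlook",
        "TIC Net Long-Term Transactions",
        "TIC Net Long-Term Transactions including Swaps",
        "Crude Oil Inventories",
        "Crude Oil Imports",
    ],
})

result = merge_unique_tokens(events)
print(result)
```

Tokens are compared exactly after splitting on whitespace, so matching is case-sensitive and `Service` and `Services` both survive. Lower-case the text first if near-duplicates like that should collapse for your data.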