Pandas append string tokens into list with corresponding column where those column in those string r

Time:11-28

I'm working on this dataset.

My question is: how do I group this dataset by timestamp and merge the strings in each group into a single string of unique tokens? For example, I would like to get:

date string
2011-02-01 15:00:00 Richmond Service Index S&P/CS HPI Composite - 20 s.a. n.s.a Texas Services Sector Outlook TIC Net Long-Term Transactions including Swaps

I have no idea which method I should use to solve this problem. Does anyone know how to solve it?

CodePudding user response:

Could this help you?

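import pandas as pd
from collections import OrderedDict

# df is your DataFrame, assumed to have 'date' and 'event' columns.
# Strip the leftover HTML entity text, so "S&amp;P" becomes "S&P".
df['event'] = df['event'].str.replace('amp;', '', regex=False)
# Join all event strings that share a timestamp into one string.
df = df.groupby('date')['event'].apply(' '.join).reset_index()
# Split into tokens and drop duplicates while keeping first-seen order.
df['event'] = df['event'].str.split().apply(lambda x: ' '.join(OrderedDict.fromkeys(x)))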
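
If you want to try it without the real data, here is a minimal self-contained sketch of the same idea. The sample rows are made up for illustration, the helper name merge_unique_tokens is hypothetical, and it assumes the columns are called 'date' and 'event' as above. It uses a plain dict instead of OrderedDict, which keeps insertion order on Python 3.7+.

```python
import pandas as pd

# Made-up sample rows; only the column names follow the snippet above.
df = pd.DataFrame({
    "date": [
        "2011-02-01 15:00:00",
        "2011-02-01 15:00:00",
        "2011-02-01 16:00:00",
    ],
    "event": [
        "Richmond Service Index",
        "Texas Services Sector Index",
        "S&amp;P/CS HPI Composite",
    ],
})


def merge_unique_tokens(events):
    """Join the strings of one group and keep each token's first occurrence."""
    tokens = " ".join(events).split()
    # dict.fromkeys drops duplicates while preserving insertion order.
    return " ".join(dict.fromkeys(tokens))


# Decode the escaped ampersand literally (no regex needed).
df["event"] = df["event"].str.replace("&amp;", "&", regex=False)

# One row per timestamp, with the de-duplicated tokens in 'event'.
merged = df.groupby("date")["event"].apply(merge_unique_tokens).reset_index()
print(merged)
```

Note that this removes duplicates word by word, so a repeated word such as "Index" is kept only once per timestamp, even if it came from different event names.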