I have a DataFrame in which I would like to rearrange the data of a given column.
What I have:
    text                                                KEYWORD
0   Fetch.ai will transform economies, healthcare,...   supplies chain issues
1                                                       self
2                                                       secured key partnership
3                                                       real world challenge
4                                                       autonomous economic agent
5                                                       learning traffic signal
6                                                       autonomous machine learning
7                                                       disruptive ai tech
8                                                       parking issues
9                                                       traffic reduction
10
11
12  The two most popular cryptocurrencies on the p...   bitcoin
13                                                      limited supplies
14                                                      ethereum
What I would like:
text KEYWORD
0 Fetch.ai will transform economies, healthcare,... supplies chain issues, self, secured key partnership, real world challenge, autonomous economic agent, learning traffic signal, autonomous machine learning, disruptive ai tech, parking issues, traffic reduction
1 The two most popular cryptocurrencies on the p... bitcoin, limited supplies, ethereum
Each text appears once in the "text" column. The text has been analyzed, and the keywords extracted from it are shown in the "KEYWORD" column. The annoying part is that if 10 keywords are extracted from a text, the output has 10 rows with 1 keyword per row, and only the first of those rows carries the text. I would like to join all of these keywords into a single row (the one with the corresponding text).
Unfortunately, I do not have access to the keyword extraction process, which was done by separate software.
CodePudding user response:
Try with groupby:
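import numpy as np

# replace blank (empty or whitespace-only) cells with NaN
df = df.replace(r"^\s*$", np.nan, regex=True)
# drop rows that are all NaN, then forward fill so every keyword row carries its text
df = df.dropna(how="all").ffill()
# group by text and join the keywords of each group into one string
output = df.groupby("text", as_index=False)["KEYWORD"].agg(", ".join)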
>>> output
text KEYWORD
0 Fetch.ai will transform economies, healthcare,... supplies chain issues, self, secured key partn...
1 The two most popular cryptocurrencies on the p... bitcoin, limited supplies, ethereum
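If you want to try this without the original data, here is a minimal self-contained sketch. The sample DataFrame is made up (short placeholder texts and keywords, with blank cells as empty strings, which is an assumption about how the real blanks are stored). It differs from the snippet above in two small ways: it forward fills only the "text" column, and it passes sort=False so the groups keep the order in which the texts first appear instead of being sorted alphabetically.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data laid out like the question:
# one row with the text, then rows holding only a keyword, plus a fully blank row.
df = pd.DataFrame({
    "text": ["First document", "", "", "", "Second document", ""],
    "KEYWORD": ["alpha", "beta", "gamma", "", "delta", "epsilon"],
})

# Turn empty or whitespace-only cells into NaN.
cleaned = df.replace(r"^\s*$", np.nan, regex=True)
# Drop rows where both columns are missing.
cleaned = cleaned.dropna(how="all")
# Copy each text down onto the keyword-only rows that follow it.
cleaned["text"] = cleaned["text"].ffill()

# Join the keywords per text; sort=False keeps first-appearance order.
output = (
    cleaned.groupby("text", as_index=False, sort=False)["KEYWORD"]
    .agg(", ".join)
)
print(output)
```

Note that ", ".join raises a TypeError if a group still contains NaN in "KEYWORD" (for example, a text with no keyword at all), so such rows would need to be dropped or filled first.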