I want to extract news by keyword and hashtags. For each row, I want to combine the keyword and its hashtags into a single string using Python.
Here's the table I have:
The desired output looks like "Gempa AND #gempa cianjur AND #gempa bali" or "Lukas Enembe AND #lukas enembe tersangka AND #gubernur papua".
Answer:
You can use apply on the entire dataframe with axis=1 and join the column values of each row. If you want each row's output wrapped in quotes as a string, use:
df1.apply(lambda row: f"'{' AND '.join(row.values.astype(str))}'", axis=1)
Out[145]:
0 'Gempa AND #gempa cianjur AND #gempa bali'
1 'Fredy Sambo AND #fredy sambo AND brigadir j'
2 'Lukas Enembe AND #lukas enembe tersangka AND ...
Without the quotes:
df1.apply(lambda row: ' AND '.join(row.values.astype(str)), axis=1)
Out[146]:
0 Gempa AND #gempa cianjur AND #gempa bali
1 Fredy Sambo AND #fredy sambo AND brigadir j
2 Lukas Enembe AND #lukas enembe tersangka AND #...
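A shorter variant of the same idea, offered as a sketch: pass the bound method ' AND '.join straight to agg with axis=1, so each row (a Series) is joined without a lambda. The sample values are an assumption, reconstructed from the outputs shown above because the original table is not included.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs above (original table not shown)
df1 = pd.DataFrame({
    'Keyword': ['Gempa', 'Fredy Sambo', 'Lukas Enembe'],
    'Hastag1': ['#gempa cianjur', '#fredy sambo', '#lukas enembe tersangka'],
    'Hastag2': ['#gempa bali', 'brigadir j', '#gubernur papua'],
})

# astype(str) guards against non-string cells; agg with axis=1 hands each
# row Series to str.join, which concatenates its values with ' AND '.
chains = df1.astype(str).agg(' AND '.join, axis=1)
print(chains.tolist())
```

Wrapping the result in quotes, as in the first variant, is then just a matter of "'" + chains + "'".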
Answer:
Let's suppose we have the following dataframe:
import pandas as pd
df1 = pd.DataFrame({'Keyword': ['Gempa', 'Lukas Enembe'],
                    'Hastag1': ['#gempa cianjur', '#lukas enembe tersangka'],
                    'Hastag2': ['#gempa bali', '#gubernur papua']
                    })
Visualization:
Keyword Hastag1 Hastag2
0 Gempa #gempa cianjur #gempa bali
1 Lukas Enembe #lukas enembe tersangka #gubernur papua
Proposed script
df1['Chain'] = df1.agg(' AND '.join(['{0[%s]}'%c for c in df1.columns]).format, axis=1)
Result
Keyword ... Chain
0 Gempa ... Gempa AND #gempa cianjur AND #gempa bali AND G...
1 Lukas Enembe ... Lukas Enembe AND #lukas enembe tersangka AND #...
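To see how the proposed script works, here is a sketch that splits it into its two steps: first a format template is built from the column names, then str.format fills it from each row. The names template and first_row are hypothetical, introduced only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Keyword': ['Gempa', 'Lukas Enembe'],
                    'Hastag1': ['#gempa cianjur', '#lukas enembe tersangka'],
                    'Hastag2': ['#gempa bali', '#gubernur papua']})

# Step 1: '{0[Keyword]}' means "positional argument 0, indexed by 'Keyword'".
# Joining one such field per column gives a single template for every row.
template = ' AND '.join('{0[%s]}' % c for c in df1.columns)
print(template)

# Step 2: template.format(row) looks up each column label in the row Series.
first_row = df1.iloc[0]
print(template.format(first_row))
```

Because the template is built from df1.columns at the moment the line runs, executing that line a second time after 'Chain' already exists would also append the previous Chain value, which would explain the extra "AND G..." in the result shown above. Selecting the source columns explicitly, e.g. df1[['Keyword', 'Hastag1', 'Hastag2']], avoids that.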
Alternative
df1['Result'] = df1['Keyword'] + ' AND ' + df1['Hastag1'] + ' AND ' + df1['Hastag2']
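The same element-wise concatenation can also be written with pandas' Series.str.cat, which accepts a DataFrame of other columns and a separator. This is a sketch using the example dataframe above.

```python
import pandas as pd

df1 = pd.DataFrame({'Keyword': ['Gempa', 'Lukas Enembe'],
                    'Hastag1': ['#gempa cianjur', '#lukas enembe tersangka'],
                    'Hastag2': ['#gempa bali', '#gubernur papua']})

# str.cat joins Keyword with the listed columns row by row, inserting sep
# between values. With the default na_rep=None, a row that has a missing
# value in any of these columns yields NaN in the result.
df1['Result'] = df1['Keyword'].str.cat(df1[['Hastag1', 'Hastag2']], sep=' AND ')
print(df1['Result'].tolist())
```

Compared with chaining +, this keeps the separator in one place, so adding a third hashtag column only means extending the column list.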