I managed to group the rows of a DataFrame by one column (id). The problem is that another column contains sentence fragments, and when I concatenate them, the spaces between the fragments are missing.
An example probably makes it easier to understand...
My dataframe looks something like this:
import pandas as pd

# create DataFrame
df = pd.DataFrame({
    'id': [101, 101, 102, 102, 102],
    'text': ['The government changed', 'the legislation on import control.', 'Politics cannot solve all problems', 'but it should try to do its part.', 'That is the reason why these elections are important.'],
    'date': [1990, 1990, 2005, 2005, 2005],
})
id text date
0 101 The government changed 1990
1 101 the legislation on import control. 1990
2 102 Politics cannot solve all problems 2005
3 102 but it should try to do its part. 2005
4 102 That is the reason why these elections are imp... 2005
Then I used the aggregation function:
aggregation_functions = {'id': 'first','text': 'sum', 'date': 'first'}
df_new = df.groupby(df['id']).aggregate(aggregation_functions)
which returns:
id text date
0 101 The government changedthe legislation on import control. 1990
2 102 Politics cannot solve all problemsbut it should try to... 2005
So, for example, I need a space between 'The government changed' and 'the legislation...'. Is that possible?
CodePudding user response:
If you need to put a space between the two phrases/rows, use str.join:
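# join the fragments of each group with a space; dict.fromkeys drops duplicate fragments while keeping their order
ujoin = lambda s: " ".join(dict.fromkeys(s.astype(str)))
# group by id and date, aggregate text, then restore the original column order
out = df.groupby(["id", "date"], as_index=False).agg(text=("text", ujoin))[df.columns]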
# Output :
print(out.to_string())
id text date
0 101 The government changed the legislation on import control. 1990
1 102 Politics cannot solve all problems but it should try to do its part. That is the reason why these elections are important. 2005
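If you do not need to drop repeated fragments, a shorter variant might be enough. This is only a sketch, assuming the text column contains nothing but strings (no NaN) and that taking the first date per id is acceptable, as in the question's own aggregation dictionary. It passes the bound method ' '.join directly as the aggregation function for text:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [101, 101, 102, 102, 102],
    'text': ['The government changed',
             'the legislation on import control.',
             'Politics cannot solve all problems',
             'but it should try to do its part.',
             'That is the reason why these elections are important.'],
    'date': [1990, 1990, 2005, 2005, 2005],
})

# Join the text fragments of each id with a single space and keep the first date.
# as_index=False keeps id as a regular column instead of moving it into the index.
out = df.groupby('id', as_index=False).agg({'text': ' '.join, 'date': 'first'})

print(out.to_string())
```

Unlike the ujoin version, this keeps every fragment, including duplicates, so pick whichever behaviour matches your data.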