I have a dataframe:
Text
Background
Clinical
Method
Direct
Background
Direct
Now I want to add a new column that assigns each row a group number based on its first word: Background belongs to group 1, Clinical belongs to group 2, and so on.
The expected output is this dataframe:

Text        Group
Background  1
Clinical    2
Method      3
Direct      4
Background  1
Direct      4
Answer 1:
Try this:

import pandas as pd

text = ['Background', 'Clinical', 'Method', 'Direct', 'Background', 'Direct']
df = pd.DataFrame(text, columns=['Text'])

def create_idx_map():
    # Number each distinct value in order of first appearance, starting at 1
    idx = 1
    values = {}
    for item in df['Text']:
        if item not in values:
            values[item] = idx
            idx += 1  # only advance the counter when a new value is seen
    return values

values = create_idx_map()
df['Group'] = [values[x] for x in df['Text']]
print(df)
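The same first-appearance numbering can also be done without a hand-written loop. The sketch below swaps in `pandas.factorize`, which is not used in the answers here: it returns 0-based integer codes in order of first appearance plus the unique values, so adding 1 gives the requested group numbers.

```python
import pandas as pd

text = ['Background', 'Clinical', 'Method', 'Direct', 'Background', 'Direct']
df = pd.DataFrame(text, columns=['Text'])

# pd.factorize returns 0-based integer codes (in order of first appearance)
# together with the unique values in that same order.
codes, uniques = pd.factorize(df['Text'])

# Shift to 1-based group numbers as in the expected output
df['Group'] = codes + 1
print(df)
```

`uniques[i]` is the text for group `i + 1`, which is handy if you later need to map group numbers back to labels.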
Answer 2:
Idea: make a list of the unique values of the column Text, and for the column Group assign the index of each value in this unique list. Code example:
import pandas as pd

df = pd.DataFrame({"Text": ["Background", "Clinical", "Clinical", "Method", "Background"]})
# List of unique values of column `Text` (in order of first appearance)
groups = list(df["Text"].unique())
# Assign each value in `Text` its index in that list
# (write `groups.index(text) + 1` if the first group should be 1)
df["Group"] = df["Text"].map(lambda text: groups.index(text))
# Output for df
print(df)
### Result:
Text Group
0 Background 0
1 Clinical 1
2 Clinical 1
3 Method 2
4 Background 0
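A minimal variant of this idea, assuming you want the 1-based numbers from the question: build a dictionary from value to group number once, then pass it to `Series.map`. This avoids the `list.index` scan for every row; the name `group_of` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Text": ["Background", "Clinical", "Clinical", "Method", "Background"]})

# Series.unique keeps the order of first appearance; enumerate(start=1)
# turns that order into 1-based group numbers.
group_of = {text: i for i, text in enumerate(df["Text"].unique(), start=1)}

# Mapping with a dict is one hash lookup per row instead of a list.index scan
df["Group"] = df["Text"].map(group_of)
print(df)
```

For this input the Group column becomes 1, 2, 2, 3, 1.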
Answer 3:
A solution could be the following:
import pandas as pd
data = pd.DataFrame([["A B", 1], ["A C", 2], ["B A", 3], ["B C", 5]], columns=("name", "value"))
# Group the rows by the first word of each `name`
data.groupby(by=[x.split(" ")[0] for x in data.loc[:,"name"]])
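You can select the first few words with " ".join(x.split(" ")[:NUMBER_OF_WORDS]); the join turns the list slice back into a string, since a list cannot be used as a group key. You then apply the aggregation you want to the resulting GroupBy object.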
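To connect this back to the question, a grouped object can either number its groups or aggregate them. The sketch below assumes a hypothetical `n_words` setting for how many leading words form the key, and uses `GroupBy.ngroup` with `sort=False` so groups are numbered in order of first appearance rather than alphabetically; `sum` stands in for whatever aggregation you need.

```python
import pandas as pd

data = pd.DataFrame([["A B", 1], ["A C", 2], ["B A", 3], ["B C", 5]], columns=("name", "value"))

# Hypothetical setting: number of leading words that form the grouping key
n_words = 1
key = data["name"].map(lambda s: " ".join(s.split(" ")[:n_words]))

# sort=False keeps groups in order of first appearance
grouped = data.groupby(key, sort=False)

# Option 1: aggregate a column per group
totals = grouped["value"].sum()

# Option 2: label each row with its 1-based group number
data["group"] = grouped.ngroup() + 1

print(totals)
print(data)
```

With `n_words = 1` the keys are "A" and "B", so rows get groups 1, 1, 2, 2 and the per-group sums are 3 and 8.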