How to group by the words and create an equivalent column consisting of float values? (Pandas)

Time:10-17

I have a dataframe:

   Text                 
   Background  
   Clinical      
   Method
   Direct
   Background
   Direct

Now I want to add a new column that assigns each row a group number based on its first word: Background belongs to group 1, Clinical to group 2, and so on.

The expected output:

a dataframe:

   Text            Group      
   Background       1
   Clinical         2
   Method           3
   Direct           4
   Background       1
   Direct           4

CodePudding user response:

Try this:

import pandas as pd

text = ['Background', 'Clinical', 'Method', 'Direct', 'Background', 'Direct']
df = pd.DataFrame(text, columns=['Text'])


def create_idx_map():
    # Map each distinct value to a 1-based group number,
    # in order of first appearance.
    idx = 1
    values = {}
    for item in df['Text']:
        if item not in values:
            values[item] = idx
            idx += 1
    return values

values = create_idx_map()
df['Group'] = [values[x] for x in list(df['Text'])]

print(df)

CodePudding user response:

Idea: Make a list of unique values of the column Text and for the column Group you can assign the index of the value in this unique list. Code example:

import pandas as pd

df = pd.DataFrame({"Text": ["Background", "Clinical", "Clinical", "Method", "Background"]})

# List of unique values of column `Text`
groups = list(df["Text"].unique())

# Assign each value in `Text` its index
# (use `groups.index(text) + 1` if the first group should be 1)
df["Group"] = df["Text"].map(lambda text: groups.index(text))

# Output for df
print(df)

### Result:
         Text  Group
0  Background      0
1    Clinical      1
2    Clinical      1
3      Method      2
4  Background      0
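
The same numbering can also be sketched with pandas' built-in `pd.factorize`, which is a different technique from the list lookup above: it returns integer codes in order of first appearance, so no manual index search is needed. The sample data is copied from this answer, and the `+ 1` is only there on the assumption that groups should start at 1.

```python
import pandas as pd

df = pd.DataFrame({"Text": ["Background", "Clinical", "Clinical", "Method", "Background"]})

# factorize returns (codes, uniques); codes are 0-based and follow
# the order in which each value first appears.
codes, uniques = pd.factorize(df["Text"])

# Shift by one so the first group is 1 (assumption: 1-based groups wanted).
df["Group"] = codes + 1

print(df)
```

If float values are wanted, as the title suggests, `df["Group"].astype(float)` converts the column afterwards.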

CodePudding user response:

A solution could be the following:

import pandas as pd
data = pd.DataFrame([["A B", 1], ["A C", 2], ["B A", 3], ["B C", 5]], columns=("name", "value"))
# Group the rows by the first word of `name`
grouped = data.groupby(by=[x.split(" ")[0] for x in data.loc[:, "name"]])

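You can select the first few words using x.split(" ")[:NUMBER_OF_WORDS]; since that slice is a list, join the words back into a single string before using it as a group key. You then apply the aggregation you want to the resulting groupby object.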
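
As a minimal sketch of that last step, the snippet below builds the group key from the first `NUMBER_OF_WORDS` words and then aggregates. The choice of `sum` is only an assumption for illustration; any other aggregation could be used in its place. The data is the same as in this answer.

```python
import pandas as pd

data = pd.DataFrame(
    [["A B", 1], ["A C", 2], ["B A", 3], ["B C", 5]],
    columns=["name", "value"],
)

NUMBER_OF_WORDS = 1  # how many leading words define a group

# Split each name into words, keep the leading words, and join them
# back into one string so every key is a plain hashable value.
keys = data["name"].str.split(" ").str[:NUMBER_OF_WORDS].str.join(" ")

# Aggregate `value` per key (sum chosen only as an example).
totals = data.groupby(keys)["value"].sum()

print(totals)
```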
