TypeError: cannot concatenate object of type '<class 'str'>'; only Series-CodePudding

giving a unique code by seeing the first column string and the second columns string and whenever the first column string change it starts from 1

Example

i use this code

dfs = dict(tuple(df.groupby('colummen1')))
for _, df in dfs.items():

 df['id'] = df.groupby(['colummen1','colummun2']).ngroup()  

dfs = [df[1] for df in dfs]
df = pd.concat(dfs)

but I got the flowing error

CodePudding user response：

Your code can be updated in the following way:

import pandas as pd

# set data
data = {"colummen1": ["Kenbele1", "Kenbele1", "Kenbele1", "Kenbele1", "Kenbele2", "Kenbele2", "Kenbele2", "Kenbele2"], 
        "colummun2": ["Commutity1", "Commutity2", "Commutity3", "Commutity4", "Commutity1", "Commutity2", "Commutity3", "Commutity4"]}

# create dataframe
df = pd.DataFrame(data)

dfs = df.groupby('colummen1')

dfs_updated = []

for _, df in dfs:
    df['id'] = df.groupby(['colummen1','colummun2']).ngroup() 1
    dfs_updated.append(df)
    
df_new = pd.concat(dfs_updated)

df_new

Returns

CodePudding user response：

It is not very clear what you expect, but when you write df[1] for df in dfs, then your df is a key (for example Kebele 1) and df[1] is a character (for example e - second character of the string).

That is why you get this error, because your array dfs is constructed out of 2 characters ["e", "e"]. Therefore you can not concatenate it.

I think with df[1] you meant the data frame, that is associated with the key, if so, then the code should look like this:

dfs = dict(tuple(df.groupby('colummen1')))

for _, df in dfs.items():
    df['id'] = df.groupby(['colummen1','colummun2']).ngroup()  

dfs = [df for _, df in dfs.items()]
df = pd.concat(dfs)