giving a unique code by seeing the first column string and the second columns string and whenever the first column string change it starts from 1
i use this code
dfs = dict(tuple(df.groupby('colummen1')))
for _, df in dfs.items():
df['id'] = df.groupby(['colummen1','colummun2']).ngroup()
dfs = [df[1] for df in dfs]
df = pd.concat(dfs)
CodePudding user response:
Your code can be updated in the following way:
import pandas as pd
# set data
data = {"colummen1": ["Kenbele1", "Kenbele1", "Kenbele1", "Kenbele1", "Kenbele2", "Kenbele2", "Kenbele2", "Kenbele2"],
"colummun2": ["Commutity1", "Commutity2", "Commutity3", "Commutity4", "Commutity1", "Commutity2", "Commutity3", "Commutity4"]}
# create dataframe
df = pd.DataFrame(data)
dfs = df.groupby('colummen1')
dfs_updated = []
for _, df in dfs:
df['id'] = df.groupby(['colummen1','colummun2']).ngroup() 1
dfs_updated.append(df)
df_new = pd.concat(dfs_updated)
df_new
Returns
CodePudding user response:
It is not very clear what you expect, but when you write df[1] for df in dfs
, then your df
is a key (for example Kebele 1) and df[1]
is a character (for example e - second character of the string).
That is why you get this error, because your array dfs
is constructed out of 2 characters ["e", "e"]
. Therefore you can not concatenate it.
I think with df[1]
you meant the data frame, that is associated with the key, if so, then the code should look like this:
dfs = dict(tuple(df.groupby('colummen1')))
for _, df in dfs.items():
df['id'] = df.groupby(['colummen1','colummun2']).ngroup()
dfs = [df for _, df in dfs.items()]
df = pd.concat(dfs)