For example, given this DataFrame:
import pandas as pd
df = pd.DataFrame.from_dict({
    'art1': ['n1', 'n2'],
    'sizes': ['35 36 37', '36 38']
})
print(df)
# desired result: one row per size
df_result = pd.DataFrame.from_dict({
    'art1': ['n1', 'n1', 'n1', 'n2', 'n2'],
    'sizes': [35, 36, 37, 36, 38]
})
print(df_result)
The solution below is correct but not efficient:
lst_art = []
lst_sizes = [x.split() for x in df['sizes']]
# repeat each art1 value once per size in its row
for i in range(len(lst_sizes)):
    for j in range(len(lst_sizes[i])):
        lst_art.append(df['art1'][i])
# flatten the list of lists
lst_sizes = sum(lst_sizes, [])
df = pd.DataFrame({'art1': lst_art, 'sizes': lst_sizes})
print(df)
Is there an efficient pandas way to get df_result from df?
CodePudding user response:
You can first split the string column into lists, then explode each list item into its own row:
df = pd.DataFrame.from_dict({
    'art1': ['n1', 'n2'],
    'sizes': ['35 36 37', '36 38']
})
# convert str to list
df['sizes'] = df['sizes'].str.split()
# create one new row per item in list of `sizes`
df_result = df.explode('sizes')
Or you can do it as a one-liner (run it on the original df, where sizes still holds strings):
df.assign(sizes=df['sizes'].str.split()).explode('sizes')
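Note that after explode the sizes are still strings and the original index values are repeated, while the df_result in the question holds integers. A minimal sketch of one way to finish the job, assuming every size parses as an integer and that a fresh 0..n-1 index is wanted (DataFrame.explode needs pandas 0.25 or newer):

```python
import pandas as pd

df = pd.DataFrame({
    'art1': ['n1', 'n2'],
    'sizes': ['35 36 37', '36 38'],
})

df_result = (
    df.assign(sizes=df['sizes'].str.split())  # str -> list of str
      .explode('sizes')                       # one row per list item
      .astype({'sizes': int})                 # '35' -> 35 (assumes all sizes are integers)
      .reset_index(drop=True)                 # drop the repeated original index
)
print(df_result)
```

Because assign returns a new DataFrame, df itself is left untouched here.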