Split One Column to Multiple Columns in Pandas

Time:05-24

I want to split one existing column into three columns. In the screenshot, the builder column needs to be split into three columns: builder name, city, and country. I used the str.split() method in pandas, which gives a good result for two columns:

ownerName = df['owner_name']
df[["ownername", "owner_country"]] = df["owner_name"].str.split("-", expand=True)

But when it comes to three columns, where I use two delimiters (',' and '-'):

ownerName = df['owner_name']
df[["ownername", "city", "owner_country"]] = df["owner_name"].str.split("," ,"-", expand=True)

it gives me this error:

File "C:\Users....\lib\site-packages\pandas\core\frame.py", line 3160, in __setitem__
    self._setitem_array(key, value)
File "C:\Users....\lib\site-packages\pandas\core\frame.py", line 3189, in _setitem_array
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

What is the best solution for the two delimiters ',' and '-'? There are also some empty rows.

CodePudding user response:

Your exact input is unclear, but assuming the sample input kindly provided by @ArchAngelPwn, you could use str.split with a regex:

names = ['Builder_Name', 'City_Name', 'Country']
out = (df['Column1']
 .str.split(r'\s*[,-]\s*', expand=True)  # split on "," or "-" with optional spaces
 .rename(columns=dict(enumerate(names))) # rename 0/1/2 with names in order
)

output:

   Builder_Name City_Name  Country
0  Builder Name      City  Country
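
The question also mentions empty rows and assigns the result back to three named columns. Here is a minimal sketch of how the regex split could handle that, assuming empty cells are missing values and using made-up sample rows (the second and third rows are hypothetical, not from the question). Limiting the split with n=2 and reindexing to three columns is one way to make sure the result always matches the three column names, which is the mismatch the "Columns must be same length as key" error complains about.

```python
import pandas as pd

# Hypothetical sample data: a full row, an empty row, and a row without a country
df = pd.DataFrame({
    'owner_name': ['Builder Name - City, Country', None, 'Other Builder - Town']
})

names = ['ownername', 'city', 'owner_country']

parts = (df['owner_name']
         .str.split(r'\s*[,-]\s*', n=2, expand=True)  # at most 3 pieces per row
         .reindex(columns=range(len(names))))         # force exactly 3 columns
parts.columns = names

# Same number of columns on both sides, so the assignment is safe
df[names] = parts
```

Rows that are empty come out as missing values in all three new columns, and rows with fewer delimiters get missing values in the trailing columns.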

CodePudding user response:

You can combine some of these lines if you feel you need to, but this is one possible option, and it should be readable for most developers on the project:

import pandas as pd

data = {
    'Column1' : ['Builder Name - City, Country']
}

df = pd.DataFrame(data)
# Builder name: everything before the first "-"
df['Builder_Name'] = df['Column1'].apply(lambda x : x.split('-')[0].strip())
# City: the text between the first "-" and the following ","
df['City_Name'] = df['Column1'].apply(lambda x : x.split('-', 1)[1].split(',')[0].strip())
# Country: the text after the ","
df['Country'] = df['Column1'].apply(lambda x : x.split(',')[1].strip())
df = df[['Builder_Name', 'City_Name', 'Country']]
df
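
As an alternative to chaining several apply calls, the same three pieces could be pulled out in one pass with str.extract and named groups. This is only a sketch, not part of the answer above: it assumes every non-empty value follows the "name - city, country" layout, and the empty second row is a hypothetical addition to show what happens when a value does not match.

```python
import pandas as pd

# Sample row from the answer above plus a hypothetical empty row
df = pd.DataFrame({'Column1': ['Builder Name - City, Country', '']})

# Named groups become the column names of the result
pattern = (r'^\s*(?P<Builder_Name>[^-,]+?)\s*-\s*'
           r'(?P<City_Name>[^,]+?)\s*,\s*'
           r'(?P<Country>.+?)\s*$')

out = df['Column1'].str.extract(pattern)
```

Values that do not match the pattern, such as empty strings, produce missing values in every column instead of raising an error, so no row-by-row guarding is needed.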