So let's say I have data like this with some delimiter like commas that I want to split to new cells either across to columns or down into rows.
The Data | Location |
---|---|
One Museum, Two Museum | City A |
3rd Park, 4th Park, 5th Park | City B |
How would you do it in either direction? There are lots of methods why is methods provided preferred?
CodePudding user response:
The Excel manual method: click on Data>Text to Column. Now just copy and past if you want the data in one column. This is only good when the data set is small and your are doing it once.
The Power Query method: This method you do it once for the data source then click refresh button when the data changes in the future. The data source can be almost anything like csv,
3 - Now choose the split method, there is delimiter and there is 6 other choices.
4 - For this data I when with custom and use ", "
5 & 6 - To split down you have to select Advanced options. Make the selection.
7 Close & Load
This is a good method because you don't have to code in Power Query unless you want to.
CodePudding user response:
The Python method Make sure you have pip installed pandas or use conda to install pandas.
The code is like so:
import pandas as pd
df = pd.read_excel('path/to/myexcelfile.xlsx')
df[['key.0','key.1','key.2']] = df['The Data'].str.split(',', expand=True)
df.drop(columns=['The Data'], inplace = True)
# stop here if you want the data to be split into new columns
The data looks like this
Location key.0 key.1 key.2
0 City A One Museum Two Museum None
1 City B 3rd park 4th park 5th park
To get the split into rows proceed with the next code part:
stacked = df.set_index('Location').stack()
# set the name of the new series created
df = stacked.reset_index(name='The Data')
# drop the 'source' level (key.*)
df.drop('level_1', axis=1, inplace=True)
Now this is done and it looks like this
Location The Data
0 City A One Museum
1 City A Two Museum
2 City B 3rd park
3 City B 4th park
4 City B 5th park
The benefit of python is that is faster for larger data sets you can split using regex in probable a 100 ways. The data source can be all types that you would use for power query and more.