I have a data which I need to transform in order to get 2 cols insted of 4 :
data = [['123', 'Billy', 'Bill', 'Bi'],
['234', 'James', 'J', 'Ji'],
['543', 'Floyd', 'Flo', 'F'],
]
processed_data = ?
needed_df = pandas.DataFrame(processed_data, columns=['Number', 'Name'])
I expect the following behaviour:
['123', 'Billy']
['123', 'Bill']
['123', 'Bi']
['234', 'James']
['234', 'J']
['234', 'Ji']
I've tried to use for in for loop but getting the wrong result:
for row in df.iterrows():
for col in df.columns:
new_row = ...
processed_df = pandas.concat(df, new_row)
Such a construction gives a too big result
The similar question using sql:
How to split several columns into one column with several records in SQL?
CodePudding user response:
Let use list comprehension to create pairs of Name and Number then create a new dataframe
pd.DataFrame([[x, z] for x, *y in data for z in y], columns=['Number', 'Name'])
Number Name
0 123 Billy
1 123 Bill
2 123 Bi
3 234 James
4 234 J
5 234 Ji
6 543 Floyd
7 543 Flo
8 543 F
CodePudding user response:
Or, you can convert you exists data into a dataframe then perform pandas dataframe reshaping with melt
:
import pandas as pd
data = [['123', 'Billy', 'Bill', 'Bi'],
['234', 'James', 'J', 'Ji'],
['543', 'Floyd', 'Flo', 'F'],
]
df = pd.DataFrame(data)
df.melt(0).sort_values(0)
Output:
0 variable value
0 123 1 Billy
3 123 2 Bill
6 123 3 Bi
1 234 1 James
4 234 2 J
7 234 3 Ji
2 543 1 Floyd
5 543 2 Flo
8 543 3 F
CodePudding user response:
Here is one possible way to merge several columns into one column with several records using Python and Pandas:
import pandas as pd
# Define the data
data = [['123', 'Billy', 'Bill', 'Bi'],
['234', 'James', 'J', 'Ji'],
['543', 'Floyd', 'Flo', 'F'],
]
# Create a DataFrame from the data
df = pd.DataFrame(data, columns=['Number', 'Name1', 'Name2', 'Name3'])
# Merge the columns into a single 'Name' column
df = df.melt(id_vars=['Number'], value_vars=['Name1', 'Name2', 'Name3'])
# Drop the 'variable' column and rename the 'value' column as 'Name'
df = df.drop('variable', axis=1).rename(columns={'value': 'Name'})
# Print the resulting DataFrame
print(df)
CodePudding user response:
Ask ChatGPT and it will give you the correct answer. https://chat.openai.com/chat