Home > Mobile >  How to merge several columns into one column with several records using python and pandas?
How to merge several columns into one column with several records using python and pandas?

Time:12-05

I have a data which I need to transform in order to get 2 cols insted of 4 :


data = [['123', 'Billy', 'Bill', 'Bi'],
        ['234', 'James', 'J', 'Ji'],
        ['543', 'Floyd', 'Flo', 'F'],
]

processed_data = ?

needed_df = pandas.DataFrame(processed_data, columns=['Number', 'Name'])


I expect the following behaviour:

['123', 'Billy']
['123', 'Bill']
['123', 'Bi']
['234', 'James']
['234', 'J']
['234', 'Ji']

I've tried to use for in for loop but getting the wrong result:

for row in df.iterrows():
    for col in df.columns:
        new_row = ...
        processed_df = pandas.concat(df, new_row)

Such a construction gives a too big result

The similar question using sql:

How to split several columns into one column with several records in SQL?

CodePudding user response:

Let use list comprehension to create pairs of Name and Number then create a new dataframe

pd.DataFrame([[x, z] for x, *y in data for z in y], columns=['Number', 'Name'])

  Number   Name
0    123  Billy
1    123   Bill
2    123     Bi
3    234  James
4    234      J
5    234     Ji
6    543  Floyd
7    543    Flo
8    543      F

CodePudding user response:

Or, you can convert you exists data into a dataframe then perform pandas dataframe reshaping with melt:

import pandas as pd

data = [['123', 'Billy', 'Bill', 'Bi'],
        ['234', 'James', 'J', 'Ji'],
        ['543', 'Floyd', 'Flo', 'F'],
]


df = pd.DataFrame(data)

df.melt(0).sort_values(0)

Output:

     0  variable  value
0  123         1  Billy
3  123         2   Bill
6  123         3     Bi
1  234         1  James
4  234         2      J
7  234         3     Ji
2  543         1  Floyd
5  543         2    Flo
8  543         3      F

CodePudding user response:

Here is one possible way to merge several columns into one column with several records using Python and Pandas:

import pandas as pd

# Define the data
data = [['123', 'Billy', 'Bill', 'Bi'],
        ['234', 'James', 'J', 'Ji'],
        ['543', 'Floyd', 'Flo', 'F'],
]

# Create a DataFrame from the data
df = pd.DataFrame(data, columns=['Number', 'Name1', 'Name2', 'Name3'])

# Merge the columns into a single 'Name' column
df = df.melt(id_vars=['Number'], value_vars=['Name1', 'Name2', 'Name3'])

# Drop the 'variable' column and rename the 'value' column as 'Name'
df = df.drop('variable', axis=1).rename(columns={'value': 'Name'})

# Print the resulting DataFrame
print(df)

CodePudding user response:

Ask ChatGPT and it will give you the correct answer. https://chat.openai.com/chat

  • Related