Home > other >  How can I separate values from the below dataframe to widen it based on a certain word - using Panda
How can I separate values from the below dataframe to widen it based on a certain word - using Panda

Time:08-10

I have the below dataframe with data that looks like this:(Made up data for clarification)

Info Values
ID 53312
State Mass
Address Stackoverflowtown
ID 56120
State: Bos
Address Georgetown
Name: James

There are a lot of lines but the issue I am having is that I want the data that is under a certain ID to be in width form because ID, State, Address belong to the same ID- then for the next line where we also have a Name (ID, State, Address, Name) all belong to the same ID.

Question: Is there a way using Python Pandas or Excel to make it so that table would look something more like this:(Where we use ID as the index)

ID State Address Name
53312 Mass Stackoverflowtown
56120 Bos Georgetown James

I tried to do a pivot in Python but the issue is that since the values are all originally under the same column , when pivoted for me they end up showing each Value (ID, sTATE, Address..etc) on a different line and values in the same line would be blank- it looks something like this which is not right:

Example for ID 53312:

ID State Address Name
53312 Null Null Null
Null Mass NULL NULL
Null NULL Stackoverflowtown NULL
Null NULL NULL NULL

CodePudding user response:

Check Below Code:

import pandas as pd

df = pd.DataFrame({'Info':['ID','State','Address','ID','State','Address','Name'],
              'Values':[53312,'Mass','Stackoverflowtown',56120,'Bos','Georgetown','James']})


df['cumcount'] = df.groupby('Info')['Info'].transform('cumcount')

df['cumcount'] = np.where(df['cumcount'] < df['cumcount'].shift(), df['cumcount'].shift(), df['cumcount'])

pd.pivot_table(df,index='cumcount',
                      columns='Info',
                      values='Values',
                      aggfunc='first' ).reset_index()[['ID','State','Address','Name']].\
                      fillna('').\
                      rename_axis(None, axis=1)

Output:

enter image description here

CodePudding user response:

The only way to achieve this is if you are certain that the ID row comes before all of the other info for that ID.

If you do have this assurance, the following will work, where df is the original DataFrame in your post:

ids = df.query("Info == 'ID'")['Values']
ids.name = 'ID'

df = df.join(ids)
df['ID'] = df['ID'].ffill()
df.query("Info != 'ID'", inplace=True)

res = df.pivot('ID', 'Info', 'Values')

Output of res:

enter image description here

  • Related