Home > other >  Is there a way to create a new column for values that are in the original column either using Excel
Is there a way to create a new column for values that are in the original column either using Excel

Time:08-10

I have the below Pandas Series with values that look like this:

Info
# ID 3.22.33.2
Location: Texas
Address: 1321 madeupstreet
Name: mike b
Address: 6.3.1

There are almost 1000 rows with this data so the problem I am having is:

Question

Can I run a code in python or Excel to extract the values from these rows in such a way that it would place ID#s on a separate column, Location in a another column ...etc

So it would look something like this:

IDs Location Address Name
3.22.33.2 Texas 1321 Madeupstreet mike b

Some items wont have a Name and in that case, could it just leave it blank? or write No name found?

I tried creating separate lists (this data came from a text file) but that method is not working for me, and I do not have any code to share at the moment.so I copy pasted all the values into an excel sheet.

Note (I do not care about the second Address line, so if it is easier to ignore that is fine).

CodePudding user response:

Based on the specific series and for the sake of easiness I suggest you can try:

df['Index'] = df['Info'].replace("# ","",regex=True).str.split().str[0]
df['Values'] = [' '.join(x) for x in df['Info'].replace("# ","",regex=True).str.split().str[1:]]
output = df.set_index('Index').drop(columns='Info').T

Returning:

Index          ID Location:           Address:   Name: Address:
Values  3.22.33.2     Texas  1321 madeupstreet  mike b    6.3.1

CodePudding user response:

IIUC, split the header and values, then pivot_table:

out = (df['Info']
 .str.split(':|(?<=ID)\s', expand=True)
 .set_axis(['col', 'value'], axis=1)
 .assign(index=lambda d: d['col'].str.endswith('ID').cumsum())
 .pivot_table(index='index', columns='col', values='value', aggfunc='first')
 .rename_axis(index=None, columns=None)
)

output:

        # ID             Address Location     Name
1  3.22.33.2   1321 madeupstreet    Texas   mike b

CodePudding user response:

It was the row with ID entry that caused hastle but here is my solution

I first changed the entry with ID to have : just like all other rows

ID_index = df.Info[0].index('ID')
df.Info[0] =  df.Info[0][:ID_index 2]   ':'   df.Info[0][ID_index 2:]

Then split the rows by : and transpose the series

    new_df = df.Info.str.split(':', expand=True).T
    new_df.columns = new_df.iloc[0].values
    new_df = new_df.iloc[1:]

gives

        # ID    Location    Address         Name     Address
1   3.22.33.2   Texas   1321 madeupstreet   mike b  6.3.1
  • Related