I have the below Pandas Series with values that look like this:
Info |
---|
# ID 3.22.33.2 |
Location: Texas |
Address: 1321 madeupstreet |
Name: mike b |
Address: 6.3.1 |
There are almost 1000 rows with this data so the problem I am having is:
Question
Can I run a code in python or Excel to extract the values from these rows in such a way that it would place ID#s on a separate column, Location in a another column ...etc
So it would look something like this:
IDs | Location | Address | Name |
---|---|---|---|
3.22.33.2 | Texas | 1321 Madeupstreet | mike b |
Some items wont have a Name and in that case, could it just leave it blank? or write No name found?
I tried creating separate lists (this data came from a text file) but that method is not working for me, and I do not have any code to share at the moment.so I copy pasted all the values into an excel sheet.
Note (I do not care about the second Address line, so if it is easier to ignore that is fine).
CodePudding user response:
Based on the specific series and for the sake of easiness I suggest you can try:
df['Index'] = df['Info'].replace("# ","",regex=True).str.split().str[0]
df['Values'] = [' '.join(x) for x in df['Info'].replace("# ","",regex=True).str.split().str[1:]]
output = df.set_index('Index').drop(columns='Info').T
Returning:
Index ID Location: Address: Name: Address:
Values 3.22.33.2 Texas 1321 madeupstreet mike b 6.3.1
CodePudding user response:
IIUC, split
the header and values, then pivot_table
:
out = (df['Info']
.str.split(':|(?<=ID)\s', expand=True)
.set_axis(['col', 'value'], axis=1)
.assign(index=lambda d: d['col'].str.endswith('ID').cumsum())
.pivot_table(index='index', columns='col', values='value', aggfunc='first')
.rename_axis(index=None, columns=None)
)
output:
# ID Address Location Name
1 3.22.33.2 1321 madeupstreet Texas mike b
CodePudding user response:
It was the row with ID
entry that caused hastle but here is my solution
I first changed the entry with ID
to have :
just like all other rows
ID_index = df.Info[0].index('ID')
df.Info[0] = df.Info[0][:ID_index 2] ':' df.Info[0][ID_index 2:]
Then split the rows by :
and transpose the series
new_df = df.Info.str.split(':', expand=True).T
new_df.columns = new_df.iloc[0].values
new_df = new_df.iloc[1:]
gives
# ID Location Address Name Address
1 3.22.33.2 Texas 1321 madeupstreet mike b 6.3.1