I have numerous data structures that take the form of the following table:
traces length
0 domain 10
1 length1 1
2 length2 2
3 length3 3
4 width1 4
5 width2 5
6 width3 6
And I need it to look like this:
traces measurements new_traces new_measurements
0 domain 10
1 length1 1 width1 4
2 length2 2 width2 5
3 length3 3 width3 6
Since my data structures contain various lengths and widths, I can't hard code range information and need something that can sort the traces column regardless of how many lengths and widths there are. I've been using pandas to try to coax this into shape, but I'm having a lot of issues (because I'm new to this and very naive) and haven't made much progress. Any help would be so much appreciated!
CodePudding user response:
Assuming the name of your original dataframe is df, I would first rename the length column and then split the dataframe into two parts, making sure to rename the indices of the second part so that they align with those in the first part the way you want. Then pd.concat will do the rest:
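import pandas as pd

df.rename({'length': 'measurements'}, axis=1, inplace=True)
# Rows 0-3 hold domain and the lengths; rows 4 onward hold the widths
part_1 = df.loc[:3]
# Shift the width rows' index down by 3 so that width1 lines up with length1
part_2 = df.loc[4:].rename(index=lambda x: x - 3,
                           columns=lambda s: 'new_' + s)
new_df = pd.concat([part_1, part_2], axis=1)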
new_df
traces measurements new_traces new_measurements
0 domain 10 NaN NaN
1 length1 1 width1 4.0
2 length2 2 width2 5.0
3 length3 3 width3 6.0
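The snippet above hard-codes the split at row 3, which the question says is not possible in general. A minimal sketch of the same idea without the fixed numbers follows. It assumes that the width rows can be recognised by a "width" prefix in traces, that domain is the first row, and that the lengths and widths appear in the same order; the sample frame is rebuilt by hand here only so the example runs on its own.

```python
import pandas as pd

# Sample data from the question, with the column already renamed to 'measurements'
df = pd.DataFrame({
    "traces": ["domain", "length1", "length2", "length3",
               "width1", "width2", "width3"],
    "measurements": [10, 1, 2, 3, 4, 5, 6],
})

# Assumption: width rows are exactly those whose trace name starts with 'width'
is_width = df["traces"].str.startswith("width")

# Everything that is not a width stays on the left, renumbered from 0
part_1 = df.loc[~is_width].reset_index(drop=True)

# Width rows get a 'new_' prefix on their column names
part_2 = df.loc[is_width].add_prefix("new_")
# Number them 1..n so they sit next to length1..lengthn (row 0 is domain)
part_2.index = range(1, len(part_2) + 1)

new_df = pd.concat([part_1, part_2], axis=1)
print(new_df)
```

Because pd.concat aligns on the index, a frame with more lengths than widths (or the reverse) should simply get NaN in the unmatched cells.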
CodePudding user response:
Is this what you are looking for?
df = df.set_index(df['traces'].str.extract(r'(\d+)', expand=False))  # key each row by the number in its trace name
mask = df['traces'].str.contains('width')
(df.loc[~mask].join(df.loc[mask], rsuffix='_new')
   .rename(lambda x: '_'.join(x.split('_')[::-1]), axis=1)  # 'traces_new' -> 'new_traces'
   .reset_index(drop=True))
Output:
traces length new_traces new_length
0 domain 10 NaN NaN
1 length1 1 width1 4.0
2 length2 2 width2 5.0
3 length3 3 width3 6.0
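If the chained version is hard to follow, the same pairing by number can be spelled out with an explicit key column and a merge. This is only a sketch: the key column name, the "width" prefix test, and the hand-built sample frame are assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "traces": ["domain", "length1", "length2", "length3",
               "width1", "width2", "width3"],
    "length": [10, 1, 2, 3, 4, 5, 6],
})

# Trailing digits become the pairing key; 'domain' has none and gets a missing key
keyed = df.assign(key=df["traces"].str.extract(r"(\d+)$", expand=False))

# Assumption: width rows are those whose trace name starts with 'width'
is_width = keyed["traces"].str.startswith("width")
left = keyed[~is_width]
right = keyed[is_width].add_prefix("new_")

# A left merge keeps every non-width row, including 'domain', which has no partner
out = (left.merge(right, how="left", left_on="key", right_on="new_key")
           .drop(columns=["key", "new_key"]))
print(out)
```

Since the rows are matched on the extracted number rather than on position, this should not depend on the widths being listed in the same order as the lengths.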