I have a table that looks similar to the following:
Miles Side Direction Param1 Width Height Date
0.5 Left Right 5 0.6 0.8 2023-01-04
0.5 Right Right 5 0.5 0.9 2023-01-04
1 Left Left 4 0.3 0.3 2023-01-04
1 Right Left 4 0.5 0.5 2023-01-04
As can be seen from the table, there are duplicate values for Miles, Direction, Param1, and Date. However, Side, Width, and Height will vary. What I am trying to do is remove the duplicate values and make new columns for the values that vary. The table should look something like below:
Miles Direction Param1 Side1 Width1 Height1 Side2 Width2 Height2 Date
0.5 Right 5 Left 0.6 0.8 Right 0.5 0.9 2023-01-04
1 Left 4 Left 0.3 0.3 Right 0.5 0.5 2023-01-04
I have attempted to use the following:
- pivot function, but it doesn't appear to work when there are multiple duplicate parameters
- pivot_table - it seems like this would work but I think I am missing something.
I tried something like:
df = pd.pivot_table(df, values=['Side','Width','Height'], index=['Miles, Direction','Param1','Date'], columns=None)
but I think something is missing here, as the data was displayed completely incorrect. Any help would be much appreciated - thank you!
CodePudding user response:
Try:
df['tmp'] = df.groupby(['Miles', 'Direction', 'Param1', 'Date']).cumcount() 1
df = df.set_index(['Miles', 'Direction', 'Param1', 'Date', 'tmp'])
df = df.unstack('tmp')
df.columns = [f'{a}{b}' for a, b in df.columns]
df = df.reset_index()
print(df)
Prints:
Miles Direction Param1 Date Side1 Side2 Width1 Width2 Height1 Height2
0 0.5 Right 5 2023-01-04 Left Right 0.6 0.5 0.8 0.9
1 1.0 Left 4 2023-01-04 Left Right 0.3 0.5 0.3 0.5
CodePudding user response:
A proposition using pandas.pivot_table
:
dup_cols = ['Miles', 'Direction','Param1','Date']
var_cols = ['Side','Width','Height']
out = (
pd.pivot_table(df.
assign(idx=df.groupby(dup_cols).cumcount() 1),
index=dup_cols,
values=var_cols,
columns='idx',
fill_value='',
aggfunc=lambda x: x)
.pipe(lambda d: d.set_axis([f'{col}{num}' for col,num in d.columns], axis=1))
.reset_index()
)
# Output :
print(out)
Miles Direction Param1 Date Height1 Height2 Side1 Side2 Width1 Width2
0 0.5 Right 5 2023-01-04 0.8 0.9 Left Right 0.6 0.5
1 1.0 Left 4 2023-01-04 0.3 0.5 Left Right 0.3 0.5