List = [0, 3, 2, 1]
This is the pandas dataframe i have.
TIME | DATA | DMMA CODE | |
---|---|---|---|
0 | 0 days 00:00:00 | Layer1 | B |
1 | 0 days 00:00:00 | Layer2 | S |
2 | 0 days 00:08:00 | Layer3 | B |
3 | 0 days 00:16:00 | Layer4 | S |
4 | 0 days 00:24:00 | Layer5 | B |
5 | 0 days 00:53:00 | Layer6 | S |
6 | 0 days 01:16:00 | Layer7 | B |
7 | 0 days 01:29:00 | Layer8 | S |
This is the end-result im trying to get.
TIME | DATA | |
---|---|---|
0 | 0 days 00:00:00 , 0 days 00:00:00 | Layer1 , Layer2 |
1 | 0 days 00:08:00 , 0 days 01:29:00 | Layer3 , Layer8 |
2 | 0 days 00:24:00 , 0 days 00:53:00 | Layer5 , Layer6 |
3 | 0 days 01:16:00 , 0 days 00:16:00 | Layer7 , Layer4 |
Basically index of the element inside "List" is the index of datas that have DMMA CODE of "B" and the element itself is the index of the datas that have DMMA CODE of "S".
First element of the list tells that "First CODE:B should be grouped with the first CODE:S"
Second element of the list tells that "Second CODE:B should be grouped with the fourth CODE:S" etc.
I am not familiar with pandas and couldn't even find a way to approach this problem.
CodePudding user response:
Your data is basically out of order, so let's re-order it:
info = [0, 3, 2, 1]
mask = df['DMMA CODE'].eq('S')
df.loc[mask] = df[mask].iloc[info].values
# Then we can get what you want fairly easily:
groups = df.groupby('DMMA CODE').cumcount()
out = df.groupby(groups).agg(list)
print(out)
Output:
TIME DATA DMMA CODE
0 [0 days 00:00:00, 0 days 00:00:00] [Layer1, Layer2] [B, S]
1 [0 days 00:08:00, 0 days 01:29:00] [Layer3, Layer8] [B, S]
2 [0 days 00:24:00, 0 days 00:53:00] [Layer5, Layer6] [B, S]
3 [0 days 01:16:00, 0 days 00:16:00] [Layer7, Layer4] [B, S]
CodePudding user response:
I have a suggestion to do the sorting using python dictionary, but there are some steps involving pandas
as well. First I need to output a dictionary for each DMMA CODE "B" and "S", and each to have the natural order index 0,1,2,3. From the starting df
TIME DATA DMMA_CODE
0 0_days_00:00:00 Layer1 B
1 0_days_00:00:00 Layer2 S
2 0_days_00:08:00 Layer3 B
3 0_days_00:16:00 Layer4 S
4 0_days_00:24:00 Layer5 B
5 0_days_00:53:00 Layer6 S
6 0_days_01:16:00 Layer7 B
7 0_days_01:29:00 Layer8 S
Next I append the items according to the desired order, ie, for DMMA CODE "B" with order 0,1,2,3 whereas for DMMA CODE "S" with order 0,3,2,1.
import pandas as pd
df = df.sort_values('DMMA_CODE')
b = df[df['DMMA_CODE']=='B'].reset_index(drop=True).to_dict()
s = df[df['DMMA_CODE']=='S'].reset_index(drop=True).to_dict()
d = {}
time, data = [], []
order_list = [0,3,2,1]
for i, order in enumerate(order_list):
time.append(b['TIME'][i] ' , ' s['TIME'][order])
data.append(b['DATA'][i] ' , ' s['DATA'][order])
d['TIME'] = time
d['DATA'] = data
print(pd.DataFrame(d))
Output:
TIME DATA
0 0_days_00:00:00 , 0_days_00:00:00 Layer1 , Layer2
1 0_days_00:08:00 , 0_days_01:29:00 Layer3 , Layer8
2 0_days_00:24:00 , 0_days_00:53:00 Layer5 , Layer6
3 0_days_01:16:00 , 0_days_00:16:00 Layer7 , Layer4
CodePudding user response:
Here is an alternative way using sort_values()
ndf = (df.sort_values('DMMA CODE')
.sort_values('DMMA CODE',key = lambda x: [i for i in list(range(0,4)) list(np.array(l))]))
ndf.groupby(ndf['DMMA CODE'].eq('B').cumsum()).agg(', '.join)
Output:
TIME DATA DMMA CODE
DMMA CODE
1 0 days 00:00:00, 0 days 00:00:00 Layer1, Layer2 B, S
2 0 days 00:08:00, 0 days 01:29:00 Layer3, Layer8 B, S
3 0 days 00:24:00, 0 days 00:53:00 Layer5, Layer6 B, S
4 0 days 01:16:00, 0 days 00:16:00 Layer7, Layer4 B, S