I need help formatting my datasets in python, it looks like this
l1 | A1 | B1 | A2 | B2 | A3 | B3 | A4 | B4 |
---|---|---|---|---|---|---|---|---|
1 | 2 | 3 | 6 | 7 | 8 | 9 | 10 | 11 |
12 | 13 | 14 | 15 | 16 | 17 | 18 | NAN | 19 |
20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 |
I need to format it to look like this.....
l1 | A1 | B1 |
---|---|---|
1 | 2 | 3 |
12 | 6 | 7 |
20 | 8 | 9 |
30 | 10 | 11 |
40 | 12 | 13 |
please any python function that can rearrange this table will be highly appreciated. Thank you as I await your positive response
CodePudding user response:
The logic is not fully clear on the first column and why part of the data gets missing, but it looks like you need to use pandas.wide_to_long
:
(pd
.wide_to_long(df, stubnames=['A', 'B'], i='l1', j='x')
.droplevel('x')
.reset_index()
)
output:
l1 A B
0 1 2 3
1 12 13 14
2 20 21 22
3 1 6 7
4 12 15 16
5 20 23 24
6 1 8 9
7 12 17 18
8 20 25 26
9 1 10 11
10 12 NAN 19
11 20 27 28
CodePudding user response:
In my understanding, the logic is as follows. For the values 'l1', the entire column is taken, where the values end we begin to add to the previous 'l1', the last value 'A1'.
A1" are even numbers, "B1" odd even numbers taken from the zero line. To continue columns 'A1', 'B1' you need to add the first value 'A1'. Maybe someone will make it more beautiful on pandas.
a = [i for i in df.iloc[0, 1:] if i % 2 == 0]
b = [i for i in df.iloc[0, 1:] if i % 2 != 0]
l = list(df['l1'])
for i in range(2, 4):
l.append(l[i] a[-1])
a.append(a[-1] a[0])
b.append(b[-1] a[0])
aaa = [l, a, b]
df = pd.DataFrame(aaa)
df = df.T
df = df.set_axis(['l1', 'A1', 'B1'], axis=1, inplace=False)
Output
l1 A1 B1
0 1 2 3
1 12 6 7
2 20 8 9
3 30 10 11
4 40 12 13