Every time I try to create a nested list using a list comprehension, it ends up being a major headache or comes out incorrectly. I'm working with a transposed data frame of four variables, with 9 columns for each variable. For example:
Date0, Date1, Date2, Date3 ... Date9
GMV0, GMV1, GMV2, GMV3 .... GMV9
Revenue0, Revenue1, Revenue2, Revenue3 .... Revenue9
I am trying to create a nested list for each of these columns. The desired list is as follows:
[[Date0, GMV0, Revenue0], [Date1, GMV1, Revenue1], [Date2, GMV2, Revenue2] ... [Date9, GMV9, Revenue9]]
I can currently create the desired list using
date=[col for col in test.columns if 'Date' in col]
gmv=[col for col in test.columns if 'GMV' in col]
rev=[col for col in test.columns if 'Gross Revenue' in col]
vars=[[date[i], gmv[i], rev[i]] for i in range(len(date))]
But this is quite inefficient, and I'm fairly sure it can be done in one line.
Can someone help with the correct list comprehension (or possibly some other method that is specific to transposed data) and help me wrap my head around it?
CodePudding user response:
You can use to_dict:
>>> df
0 1 2 3 4
0 Date0 Date1 Date2 Date3 Date9
1 GMV0 GMV1 GMV2 GMV3 GMV9
2 Revenue0 Revenue1 Revenue2 Revenue3 Revenue9
>>> list(df.to_dict(orient='list').values())
[['Date0', 'GMV0', 'Revenue0'],
['Date1', 'GMV1', 'Revenue1'],
['Date2', 'GMV2', 'Revenue2'],
['Date3', 'GMV3', 'Revenue3'],
['Date9', 'GMV9', 'Revenue9']]
Update
>>> df
Date0 Date1 Date2 Date3 GMV0 GMV1 GMV2 GMV3 Revenue0 Revenue1 Revenue2 Revenue3
0 A B C D E F G H I J K L
>>> [list(t.columns) for _, t in df.groupby(df.columns.str.extract(r'(\d+)', expand=False), axis=1)]
[['Date0', 'GMV0', 'Revenue0'],
['Date1', 'GMV1', 'Revenue1'],
['Date2', 'GMV2', 'Revenue2'],
['Date3', 'GMV3', 'Revenue3']]
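For comparison, the same grouping by numeric suffix can be sketched without pandas. This is only an illustration under assumptions: the `columns` list below is a hypothetical stand-in for `df.columns`, and the names `groups` and `nested` are made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical column names, mirroring the example frame above.
columns = ['Date0', 'Date1', 'Date2', 'Date3',
           'GMV0', 'GMV1', 'GMV2', 'GMV3',
           'Revenue0', 'Revenue1', 'Revenue2', 'Revenue3']

# Collect the column names that share the same trailing number.
groups = defaultdict(list)
for col in columns:
    match = re.search(r'(\d+)$', col)  # trailing digits, e.g. '0' in 'Date0'
    if match:
        groups[int(match.group(1))].append(col)

# Sort on the integer suffix so that a suffix of 10 would follow 9, not 1.
nested = [groups[key] for key in sorted(groups)]
print(nested)
```

Within each group the names keep the order in which they appear in `columns`, so this relies on the Date, GMV and Revenue columns being listed in that order.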
CodePudding user response:
You can use a list comprehension with a nested for clause.
vars = [
    col
    for key in ['Date', 'GMV', 'Gross Revenue']
    for col in test.columns if key in col
]
reference: https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries
Or, if you already have three lists, you can use the built-in function zip. It works like a transpose.
vars = list(zip(date, gmv, rev))
Update:
Sorry for misunderstanding the question. If you need a nested list, the following code will work.
vars = list(zip(*(
    [col for col in test if key in col]
    for key in ['Date', 'GMV', 'Gross Revenue']
)))
If you are already using a DataFrame, @Corralien's answer would be better. This answer is useful when you want to do it in vanilla Python.
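To make the difference between the flat and the nested result concrete, here is a minimal sketch on a plain list of names. The `columns` list is a hypothetical stand-in for `test.columns`, shortened to two columns per variable; `flat`, `per_key` and `nested` are names invented for the example.

```python
# Hypothetical column names standing in for test.columns.
columns = ['Date0', 'Date1', 'GMV0', 'GMV1', 'Gross Revenue0', 'Gross Revenue1']
keys = ['Date', 'GMV', 'Gross Revenue']

# Nested for clauses in one comprehension give a single flat list.
flat = [col for key in keys for col in columns if key in col]

# One inner list per key, then zip(*...) transposes them into per-index groups.
per_key = [[col for col in columns if key in col] for key in keys]
nested = [list(group) for group in zip(*per_key)]

print(flat)
print(nested)
```

Note that zip yields tuples, so the sketch converts each group with list(...) to get a list of lists. zip also stops at the shortest input, so a variable with a missing column would silently shorten the result.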
CodePudding user response:
If the input list is:
>>> test = [['a1','a2','a3'],['b1', 'b2','b3'],['c1','c2','c3']]
then
>>> b = [[test[x][i] for x in range(len(test))] for i in range(len(test[0]))]
>>> b
[['a1', 'b1', 'c1'], ['a2', 'b2', 'c2'], ['a3', 'b3', 'c3']]
To understand it, try evaluating the inner loop with i=0, then i=1, and so on:
>>> i = 0
>>> [test[x][i] for x in range(len(test))]
['a1', 'b1', 'c1']
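As a cross-check, the index-based comprehension above should give the same result as transposing with the built-in zip. This is a small sketch using the same `test` list; `by_index` and `by_zip` are names chosen only for the example.

```python
test = [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3'], ['c1', 'c2', 'c3']]

# Outer loop walks the positions, inner loop picks that position from every row.
by_index = [[row[i] for row in test] for i in range(len(test[0]))]

# zip(*test) passes each row as a separate argument and pairs items by position.
by_zip = [list(group) for group in zip(*test)]

print(by_index == by_zip)
```

The index-based form assumes every row is at least as long as the first one, while zip simply stops at the shortest row.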