Home > front end >  Create a new column in a Pandas DataFrame from exisiting column names
Create a new column in a Pandas DataFrame from exisiting column names

Time:11-10

I want to deconstruct a pandas DataFrame, using column headers as a new data-column and create a list with all combinations of the row index and columns. Easier to show than explain:

index_col = ["store1", "store2", "store3"]
cols = ["January", "February", "March"]
values = [[2,3,4],[5,6,7],[8,9,10]]
df = pd.DataFrame(values, index=index_col, columns=cols)

From this DataFrame I wish to get the following list:

[['store1', 'January', 2],
 ['store1', 'February', 3],
 ['store1', 'March', 4],
 ['store2', 'January', 5],
 ['store2', 'February', 6],
 ['store2', 'March', 7],
 ['store3', 'January', 8],
 ['store3', 'February', 9],
 ['store3', 'March', 10]]

Is there a convenient way to do this?

CodePudding user response:

df.unstack().swaplevel().reset_index().values.tolist()
#OR
df.reset_index().melt(id_vars="index").values.tolist()
# [['store1', 'January', 2],
#  ['store2', 'January', 5],
#  ['store3', 'January', 8],
#  ['store1', 'February', 3],
#  ['store2', 'February', 6],
#  ['store3', 'February', 9],
#  ['store1', 'March', 4],
#  ['store2', 'March', 7],
#  ['store3', 'March', 10]]

With following, the order of elements will match the output in the question.

df.transpose().unstack().reset_index().values.tolist()
# [['store1', 'January', 2],
#  ['store1', 'February', 3],
#  ['store1', 'March', 4],
#  ['store2', 'January', 5],
#  ['store2', 'February', 6],
#  ['store2', 'March', 7],
#  ['store3', 'January', 8],
#  ['store3', 'February', 9],
#  ['store3', 'March', 10]]

CodePudding user response:

True Pandas-style:

lst = [[*k, v] for k, v in df.unstack().swaplevel().to_dict().items()]

CodePudding user response:

The structure that you want your data in is very messy, so this is probably the best method given the data you want.

# Results
res = []

# Nested loop: first for length of index col, then next for cols
for i in range(len(index_col)):
    for j in range(len(cols)):
        # Format of data
        res.append([index_col[i], cols[j], values[i][j]])

# Return results
print(res)
return res

CodePudding user response:

You can iterate over dataframe items using

data = []

for col, row in df.items():
    for ind, val in row.reset_index().values:
        data.append([ind, col, val])

data

You could avoid the second loop for sacrificing the order you requested the output in as it is a bit of a full breakdown of how the structure started.

CodePudding user response:

temp = df.stack()

[[*ent, val] for ent, val in zip(temp.index, temp)]

[['store1', 'January', 2],
 ['store1', 'February', 3],
 ['store1', 'March', 4],
 ['store2', 'January', 5],
 ['store2', 'February', 6],
 ['store2', 'March', 7],
 ['store3', 'January', 8],
 ['store3', 'February', 9],
 ['store3', 'March', 10]]

CodePudding user response:

I'd prefer stacking over unstacking then swapping the levels:

>>> df.stack().reset_index().to_numpy()
array([['store1', 'January', 2],
       ['store1', 'February', 3],
       ['store1', 'March', 4],
       ['store2', 'January', 5],
       ['store2', 'February', 6],
       ['store2', 'March', 7],
       ['store3', 'January', 8],
       ['store3', 'February', 9],
       ['store3', 'March', 10]], dtype=object)
>>> 

Or with melt and ignore_index=False:

>>> df.melt(ignore_index=False).reset_index().to_numpy()
array([['store1', 'January', 2],
       ['store2', 'January', 5],
       ['store3', 'January', 8],
       ['store1', 'February', 3],
       ['store2', 'February', 6],
       ['store3', 'February', 9],
       ['store1', 'March', 4],
       ['store2', 'March', 7],
       ['store3', 'March', 10]], dtype=object)
>>> 
  • Related