Home > Blockchain >  how to add multiple lists while adding multiple columns into pandas dataframe
how to add multiple lists while adding multiple columns into pandas dataframe

Time:06-07

I have 21 list pairs (date, number of items), there are 21 types of items. I would like to add all of this data to a pandas dataframe with 23 columns (the date, number of item a, number item b ,...,number of item u, total items). in some cases a day will only have one type of item, on other days there could be item a, b, and f for example.

My though was to create a blank dataframe, then append each list with the date in the first column and the "item number" in a new column for each item then somehow sort the dataframe to match the days. for example:

df=pd.DataFrame(columns='date','itemA','itemB','itemC','itemD','itemE','itemF','itemG','itemH','itemI','itemJ','itemK','itemL','itemM','itemN','itemO','itemP','itemQ','itemR','itemS','itemT','itemU','total')

For instance day jan 1 2020 might have 20 of item a 40 of item c and 5 of item m. I imagine that when first appended this data would be on 3 separate rows with data in column a and b, column a and d, column a and n. would there be a way for the pandas dataframe to recognize that the date in column a for all 3 rows are the same and consolidate the data so that it was on one row with data in column a and b and d and n?

Lastly how could I create the last column of total items/day (columns b-v) into a final column?

CodePudding user response:

If I understand correctly then you're looking for something like this.

if you managed to put your data into a list of dictionaries, as each dictionary represents a day, you can do it as follows.

Note:

  • The dictionaries don't have to contain all the elements only the one existing in that day.

  • The items inside the dictionary don't have to be ordered.

import pandas as pd

df_list = [
    {'date':'06/06/2022', 'itemA':1, 'itemB':1, 'itemC':1,},
    {'date':'06/06/2022', 'itemA':1, 'itemE':5, 'itemF':1,},
    {'date':'06/06/2022', 'itemB':2, 'itemF':1, 'itemC':1,},
    {'date':'06/06/2022', 'itemD':1, 'itemB':1, 'itemE':1,},
    {'date':'06/06/2022', 'itemF':1, 'itemB':4, 'itemD':1,},
    {'date':'06/06/2022', 'itemE':6, 'itemC':1, 'itemD':1,},
]

# we sort our columns with the sequence of this list
sorted_columns = ['date','itemA','itemB','itemC','itemD','itemE','itemF']

# we create a dataframe from the list of dictionaries
# we fill the empty items with zeros
df = pd.DataFrame(df_list, columns=sorted_columns).fillna(0)

# convert to integers
df[sorted_columns[1:]] = df[sorted_columns[1:]].applymap(int)

# we make a new column 'Total' that summs all the items in each day
# NOTE: the [1:] is to ignore the first column which has the date
df['Total'] = df.apply(lambda row: sum(row[1:]), axis=1)

output:

date itemA itemB itemC itemD itemE itemF Total
06/06/2022 1 1 1 0 0 0 3
06/06/2022 1 0 0 0 5 1 7
06/06/2022 0 2 1 0 0 1 4
06/06/2022 0 1 0 1 1 0 3
06/06/2022 0 4 0 1 0 1 6
06/06/2022 0 0 1 1 6 0 8

if you find this answer helpful, kindly accept it.

CodePudding user response:

import pandas as pd

# input data according to this comment
# https://stackoverflow.com/questions/72520487/#comment128113673_72520940

itemAdates = ['1/1/20', '1/2/20', '1/3/20',  '1/4/20']
itemAcounts = [4, 10, 3, 6]

itemBdates = ['1/1/20', '1/3/20', '1/4/20']
itemBcounts = [9, 5, 6]

itemCdates = ['1/2/20', '1/3/20', '1/4/20']
itemCcounts = [2, 6, 7]

# parsing the data into 1 big list of (date, item_name, item_count)
data = [
    *[(date,  'itemA', item_count) for date, item_count in zip(itemAdates, itemAcounts)],
    *[(date,  'itemB', item_count) for date, item_count in zip(itemBdates, itemBcounts)],
    *[(date,  'itemC', item_count) for date, item_count in zip(itemCdates, itemCcounts)],
]

# parsing the big list into a dictionary with 
# new_data = {date:[('date', date), (item_name, item_count), (item_name, item_count), ...]}
new_data = {}
for date, item_name, item_count in data:
    new_data[date] = new_data.get(date, [('date', date)])   [(item_name, item_count)]

# converting the list of tuples into dict and appending it into the df_list
df_list = []
for date_values in new_data.values():
    df_list.append(dict(date_values))

# we sort our columns with the sequence of this list
# NOTE: the date must be in the first position
sorted_columns = ['date','itemA','itemB','itemC']

# we create a dataframe from the list of dictionaries
# we fill the empty items with zeros
df = pd.DataFrame(df_list, columns=sorted_columns).fillna(0)

# convert to integers
df[sorted_columns[1:]] = df[sorted_columns[1:]].applymap(int)

# we make a new column 'Total' that summs all the items in each day
# NOTE: the [1:] is to ignore the first column which has the date
df['Total'] = df.apply(lambda row: sum(row[1:]), axis=1)

output:

date itemA itemB itemC Total
1/1/20 4 9 0 13
1/2/20 10 0 2 12
1/3/20 3 5 6 14
1/4/20 6 6 7 19
  • Related