Get the value of dict in every column in dataframe-CodePudding

Input data:

data = [
    ['0039384', [{'A': 415}, {'A': 228}, {'B': 360}, {'B': 198}, {'C': 300}, {'C': 165}]],
    ['0035584', [{'A': 345}, {'A': 117}, {'B': 223}, {'B': 554}, {'C': 443}, {'C': 143}]]
]

df = pd.DataFrame(data=data, columns=['id', 'prices'])

I want to get this resut:

id  CurrentPrice_A  LastPrice_C CurrentPrice_B  LastPrice_B CurrentPrice_C  LastPrice_C
0039384 415         228         360         198         300         165

I have tried to separate the dict and then every column to replace and rename than get the price, but it takes around 10 lines code. Do you know any short and fast way to do this.

CodePudding user response：

It is convenient to iterate over each row of the dataframe, so that you have the algorithm under control, zip the dictionaries two by two (so as to merge current and last) and dynamically assign column names with their values.

For convenience, instead of using pd.concat(), you can use lists and temporary dictionaries.

import pandas as pd

data = [
    ['0039384', [{'A': 415}, {'A': 228}, {'B': 360}, {'B': 198}, {'C': 300}, {'C': 165}]],
    ['0035584', [{'A': 345}, {'A': 117}, {'B': 223}, {'B': 554}, {'C': 443}, {'C': 143}]]
]

df = pd.DataFrame(data=data, columns=['id', 'prices'])


new_df_rows = []

for index, row in df.iterrows():

    grouped_prices = zip(row.prices[::2], row.prices[1::2])  # create groups two-by-two
    tmp_dict = {'id': row.id}
    for curr_price, last_price in grouped_prices:
        tmp_dict.update({
            'CurrentPrice_'   str(list(curr_price.keys())[0]): int(list(curr_price.values())[0]),
            'LastPrice_'   str(list(last_price.keys())[0]): int(list(last_price.values())[0])
        })
    new_df_rows.append(tmp_dict)

new_df = pd.DataFrame(new_df_rows)
print(new_df)

output will be:

        id  CurrentPrice_A  LastPrice_A  CurrentPrice_B  LastPrice_B  CurrentPrice_C  LastPrice_C
0  0039384             415          228             360          198             300          165
1  0035584             345          117             223          554             443          143

CodePudding user response：

first convert list rows to new columns:

dfx = pd.DataFrame(df['prices'].tolist(),index=df.id)
print(dfx)
'''
                  0           1           2           3           4           5
id                                                                             
0039384  {'A': 415}  {'A': 228}  {'B': 360}  {'B': 198}  {'C': 300}  {'C': 165}
0035584  {'A': 345}  {'A': 117}  {'B': 223}  {'B': 554}  {'C': 443}  {'C': 143}
'''

Then let's separate the columns into odd and even numbers. Odd numbers will represent the last price, and even numbers will represent the current price:

last=list(filter(lambda x: x % 2, list(dfx.columns))) #[1, 3, 5]
currents=list(sorted(set(dfx.columns) - set(last))) #[0, 2, 4]

now, rename columns:

for i in currents:
    dfx=dfx.rename(columns={i:'CurrentPrice_{}'.format(list(dfx[i].iloc[0].keys())[0])})

for i in last:
    dfx=dfx.rename(columns={i:'LastPrice_{}'.format(list(dfx[i].iloc[0].keys())[0])})
print(dfx)
'''
id      CurrentPrice_A   LastPrice_A    CurrentPrice_B  LastPrice_B  CurrentPrice_C  LastPrice_C
0039384 {'A': 415}       {'A': 228}     {'B': 360}      {'B': 198}   {'C': 300}  {'C': 165}
0035584 {'A': 345}       {'A': 117}     {'B': 223}      {'B': 554}   {'C': 443}  {'C': 143}

'''

finally, get values from dicts:

for i in dfx.columns:
    dfx[i]=dfx[i].apply(lambda x: list(x.values())[0])

print(dfx)
'''
id      CurrentPrice_A  LastPrice_A CurrentPrice_B  LastPrice_B CurrentPrice_C  LastPrice_C
0039384 415             228         360             198         300             165
0035584 345             117         223             554         443             143

'''