Input data:
data = [
['0039384', [{'A': 415}, {'A': 228}, {'B': 360}, {'B': 198}, {'C': 300}, {'C': 165}]],
['0035584', [{'A': 345}, {'A': 117}, {'B': 223}, {'B': 554}, {'C': 443}, {'C': 143}]]
]
df = pd.DataFrame(data=data, columns=['id', 'prices'])
I want to get this resut:
id CurrentPrice_A LastPrice_C CurrentPrice_B LastPrice_B CurrentPrice_C LastPrice_C
0039384 415 228 360 198 300 165
I have tried to separate the dict and then every column to replace and rename than get the price, but it takes around 10 lines code. Do you know any short and fast way to do this.
CodePudding user response:
It is convenient to iterate over each row of the dataframe, so that you have the algorithm under control, zip the dictionaries two by two (so as to merge current and last) and dynamically assign column names with their values.
For convenience, instead of using pd.concat(), you can use lists and temporary dictionaries.
import pandas as pd
data = [
['0039384', [{'A': 415}, {'A': 228}, {'B': 360}, {'B': 198}, {'C': 300}, {'C': 165}]],
['0035584', [{'A': 345}, {'A': 117}, {'B': 223}, {'B': 554}, {'C': 443}, {'C': 143}]]
]
df = pd.DataFrame(data=data, columns=['id', 'prices'])
new_df_rows = []
for index, row in df.iterrows():
grouped_prices = zip(row.prices[::2], row.prices[1::2]) # create groups two-by-two
tmp_dict = {'id': row.id}
for curr_price, last_price in grouped_prices:
tmp_dict.update({
'CurrentPrice_' str(list(curr_price.keys())[0]): int(list(curr_price.values())[0]),
'LastPrice_' str(list(last_price.keys())[0]): int(list(last_price.values())[0])
})
new_df_rows.append(tmp_dict)
new_df = pd.DataFrame(new_df_rows)
print(new_df)
output will be:
id CurrentPrice_A LastPrice_A CurrentPrice_B LastPrice_B CurrentPrice_C LastPrice_C
0 0039384 415 228 360 198 300 165
1 0035584 345 117 223 554 443 143
CodePudding user response:
first convert list rows to new columns:
dfx = pd.DataFrame(df['prices'].tolist(),index=df.id)
print(dfx)
'''
0 1 2 3 4 5
id
0039384 {'A': 415} {'A': 228} {'B': 360} {'B': 198} {'C': 300} {'C': 165}
0035584 {'A': 345} {'A': 117} {'B': 223} {'B': 554} {'C': 443} {'C': 143}
'''
Then let's separate the columns into odd and even numbers. Odd numbers will represent the last price, and even numbers will represent the current price:
last=list(filter(lambda x: x % 2, list(dfx.columns))) #[1, 3, 5]
currents=list(sorted(set(dfx.columns) - set(last))) #[0, 2, 4]
now, rename columns:
for i in currents:
dfx=dfx.rename(columns={i:'CurrentPrice_{}'.format(list(dfx[i].iloc[0].keys())[0])})
for i in last:
dfx=dfx.rename(columns={i:'LastPrice_{}'.format(list(dfx[i].iloc[0].keys())[0])})
print(dfx)
'''
id CurrentPrice_A LastPrice_A CurrentPrice_B LastPrice_B CurrentPrice_C LastPrice_C
0039384 {'A': 415} {'A': 228} {'B': 360} {'B': 198} {'C': 300} {'C': 165}
0035584 {'A': 345} {'A': 117} {'B': 223} {'B': 554} {'C': 443} {'C': 143}
'''
finally, get values from dicts:
for i in dfx.columns:
dfx[i]=dfx[i].apply(lambda x: list(x.values())[0])
print(dfx)
'''
id CurrentPrice_A LastPrice_A CurrentPrice_B LastPrice_B CurrentPrice_C LastPrice_C
0039384 415 228 360 198 300 165
0035584 345 117 223 554 443 143
'''