Home > database >  How a list can enter into a python dataframe's cell?
How a list can enter into a python dataframe's cell?

Time:11-09

I have some dataframes (df, tmp_df and final_df) and I want to enter two columns of tmp_df into two differenrt cells of final_df as list type. My code and dataframes are presented as follow. However, the loop part is not working correctly. The other questions in stackoverflow or other websites, answer this question if there is an available dictionary for the lists from the beginning of the program. But here, the tmp_df dataframe changes during the for loop and at each iteration suitable prices are calculated. Also, the most related data are founded and they must locate as a realted cell of final_df.

    import pandas as pd

    df = pd.read_csv('myfile.csv')
    tmp_df = pd.DataFrame()
    final_df = pd.DataFrame()
    tmp_df = df[df['Type'] == True]
    cnt = 0
    for c in tmp_df['Category']:
        #################
        # Apply some calculations and call different methods to do some changes on Price column of tmp_df.
        #################
        final_df.at[cnt,'Data'] = list(set(tmp_sub['Data']))
        final_df ['Category'], final_df['Acceptable'], final_df['Rank'],final_df['Price'] = \
        tmp_df['Rank'], list(tmp_sub['Price'])
        cnt  =1



df:

| Data    | Category | Acceptable | Rank | Price |
| ------- | -------- | ---------- | ---- | ----- |
| 30275   | A        | Yes        |  1   | 52787 |
| 35881   | C        | No         |  2   | 14804 |
| 28129   | C        | Yes        |  3   | 180543|
| 30274   | D        | No         |  2   | 8066  |
| 30351   | D        | Yes        |  3   | 273478|
| 35886   | A        | Yes        |  2   | 10808 |
| 39900   | C        | Yes        |  1   | 21893 |
| 35887   | A        | No         |  2   | 2244  |
| 35883   | A        | Yes        |  1   | 10066 |
| 35856   | D        | Yes        |  3   | 19011 |
| 35986   | C        | No         |  2   | 6895  |
| 30350   | D        | No         |  3   | 5243  |
| 28129   | C        | Yes        |  1   | 112859|
| 31571   | C        | Yes        |  1   | 20701 |

tmp_df:

| Data    | Category | Acceptable | Rank | Price |
| ------- | -------- | ---------- | ---- | ----- |
| 30275   | A        | Yes        |  1   | 52787 |
| 38129   | C        | Yes        |  3   | 180543|
| 30351   | D        | Yes        |  3   | 273478|
| 35886   | A        | Yes        |  2   | 10808 |
| 39900   | C        | Yes        |  1   | 21893 |
| 35883   | A        | Yes        |  1   | 10066 |
| 35856   | D        | Yes        |  3   | 19011 |
| 28129   | C        | Yes        |  1   | 112859|
| 31571   | C        | Yes        |  1   | 20701 |

The prices in the final dataframe (final_df) are changed because of the calculations over the tmp_df. Now, what should I do if I want the following result?

final_df:

| Data                      | Category | Acceptable | Rank | Price |
| -------                   | -------- | ---------- | ---- | ----- |
| [30275,35886,35883]       | A        | Yes        | [1,2]| 195543|
| [28129,39900,38129,31571] | C        | Yes        | [1,3]| 210089|
| [30351,35856]             | D        | Yes        |   3  | 113859|

CodePudding user response:

You can aggregate list and for Price another aggregation function, e.g. sum, mean...:

#generate custom function for Price
def func(x):
    
    return x.sum()

d = {'Data':list,'Rank':lambda x: list(set(x)), 'Price':func}
final_df = (tmp_df.groupby(['Category','Acceptable'],as_index=False)
                  .agg(d)
                  .reindex(tmp_df.columns, axis=1))

d = {'Data':list,'Rank':lambda x: list(set(x)), 'Price':'max'}
final_df = (tmp_df.groupby(['Category','Acceptable'],as_index=False)
                  .agg(d)
                  .reindex(tmp_df.columns, axis=1))

print (final_df)
                           Data Category Acceptable    Rank   Price
0         [30275, 35886, 35883]        A        Yes  [1, 2]   52787
1  [38129, 39900, 28129, 31571]        C        Yes  [1, 3]  180543
2                [30351, 35856]        D        Yes     [3]  273478

Solution with custom function:

def func1(x):
    return x.sum()


def f(x):

    a = list(x['Data'])
    b = list(set(x['Rank']))
    c = func1(x['Price'])
    
    return pd.Series({'Data':a,'Rank':b,'Price':c})


final_df = (tmp_df.groupby(['Category','Acceptable'])
                  .apply(f)
                  .reset_index()
                  .reindex(tmp_df.columns, axis=1))
  • Related