Home > database >  Data Manipulating one dataframe into another using for loops and dictionaries
Data Manipulating one dataframe into another using for loops and dictionaries

Time:10-04

I have a data set that I need to reformat so that I can plot and work with it further. It is sort of an transpose action but I am struggling to not overwrite the data in the new dataframe. I sorted out the headings using dictionaries and it maps the fields from the original df to the new output df correctly. It is just overwriting the first entry and not adding a new POLY/POLY_NAME

Input dataframe: enter image description here

Output dataframe:

enter image description here

Below is my code so far:

import pandas as pd
fractions = {"A": 1.35, "B": 1.40, "C": 1.45}
quality = {"POLY_NAME":"POLY", "AS":"Ash", "CV":"CV","FC":"FC","MS":"Moist","TS":"Tots","VM":"Vols","YL":"Yield"}

frac = list(fractions.values())
headers = list(quality.values())
df = pd.DataFrame(columns=headers, index=frac)

wash_dic = {'POLY_NAME': {0: 'Asset 1', 1: 'Asset 2', 2: 'Asset 3'},
 'RD': {0: 1.63, 1: 1.63, 2: 1.57},
 'SEAMTH': {0: 3.02, 1: 3.02, 2: 3.37},
 'AAS': {0: 7.76, 1: 7.34, 2: 7.24},
 'ACV': {0: 28.98, 1: 29.18, 2: 29.27},
 'AFC': {0: 54.95, 1: 53.55, 2: 52.38},
 'AMS': {0: 4.22, 1: 4.26, 2: 4.63},
 'ATS': {0: 0.97, 1: 1.09, 2: 1.23},
 'AVM': {0: 33.07, 1: 34.85, 2: 35.75},
 'AYL': {0: 0.4, 1: 0.95, 2: 0.75},
 'BAS': {0: 9.28, 1: 9.27, 2: 9.58},
 'BCV': {0: 28.17, 1: 28.33, 2: 28.09},
 'BFC': {0: 56.21, 1: 54.39, 2: 52.11},
 'BMS': {0: 4.25, 1: 4.25, 2: 4.61},
 'BTS': {0: 0.84, 1: 1.01, 2: 1.22},
 'BVM': {0: 30.25, 1: 32.08, 2: 33.7},
 'BYL': {0: 3.11, 1: 5.44, 2: 4.36},
 'CAS': {0: 11.01, 1: 10.96, 2: 11.25},
 'CCV': {0: 27.31, 1: 27.53, 2: 27.39},
 'CFC': {0: 58.09, 1: 56.0, 2: 53.43},
 'CMS': {0: 4.41, 1: 4.38, 2: 4.62},
 'CTS': {0: 0.63, 1: 0.83, 2: 0.98},
 'CVM': {0: 26.5, 1: 28.66, 2: 30.71},
 'CYL': {0: 13.45, 1: 16.11, 2: 12.94}}

wash = pd.DataFrame(wash_dic)
wash

for label, content in wash.items(): 
    print('fraction:', fractions.get(label[0]), ' quality:', quality.get(label[-2:]))
    for c in content:
        try:
            df.loc[fractions.get(label[0]), quality.get(label[-2:])] = c
        except:
            pass

I have tried to add another for loop but the logic is escaping me currently.

Required outcome as dictionary

outcome = {'Unnamed: 0': {0: 1.35,
  1: 1.4,
  2: 1.45,
  3: 1.35,
  4: 1.4,
  5: 1.45,
  6: 1.35,
  7: 1.4,
  8: 1.45},
 'POLY': {0: 'Asset 1',
  1: 'Asset 1',
  2: 'Asset 1',
  3: 'Asset 2',
  4: 'Asset 2',
  5: 'Asset 2',
  6: 'Asset 3',
  7: 'Asset 3',
  8: 'Asset 3'},
 'Ash': {0: 7.76,
  1: 9.28,
  2: 11.01,
  3: 7.34,
  4: 9.27,
  5: 10.96,
  6: 7.24,
  7: 9.58,
  8: 11.25},
 'CV': {0: 28.98,
  1: 28.17,
  2: 27.31,
  3: 29.18,
  4: 28.33,
  5: 27.53,
  6: 29.27,
  7: 28.09,
  8: 27.39},
 'FC': {0: 54.95,
  1: 56.21,
  2: 58.09,
  3: 53.55,
  4: 54.39,
  5: 56.0,
  6: 52.38,
  7: 52.11,
  8: 53.43},
 'Moist': {0: 4.22,
  1: 4.25,
  2: 4.41,
  3: 4.26,
  4: 4.25,
  5: 4.38,
  6: 4.63,
  7: 4.61,
  8: 4.62},
 'Tots': {0: 0.97,
  1: 0.84,
  2: 0.63,
  3: 1.09,
  4: 1.01,
  5: 0.83,
  6: 1.23,
  7: 1.22,
  8: 0.98},
 'Vols': {0: 33.07,
  1: 30.25,
  2: 26.5,
  3: 34.85,
  4: 32.08,
  5: 28.66,
  6: 35.75,
  7: 33.7,
  8: 30.71},
 'Yiels': {0: 0.4,
  1: 3.11,
  2: 13.45,
  3: 0.95,
  4: 5.44,
  5: 16.11,
  6: 0.75,
  7: 4.36,
  8: 12.94}}

Regards

CodePudding user response:

I resolved to duplicate/overwriting of the values by first grouping the original wash DF and then in the for loop and the data of each loop into a blank DF and at the end of the loop append it to the Final DF. Just for neatness I made the index column a normal column and reordered the columns.

groups = wash.groupby("POLY_NAME")
df_final = pd.DataFrame(columns=headers)

for name, group in groups:
    df = pd.DataFrame(columns=headers)
    for label, content in group.items():
        if quality.get(label[-2:]) in headers:
            #print(label)
            #print(name)
            #print(label, content)
            for c in content:
                try:
                    df.loc[fractions.get(label[0]), "POLY"] = name
                    df.loc[fractions.get(label[0]), quality.get(label[-2:])] = c
                    #print('Poly:', name, ' fraction:', fractions.get(label[0]), ' quality:', quality.get(label[-2:]))
                except:
                    pass
    df_final = df_final.append(df)
df_final = df_final.reset_index().rename({'index':'FLOAT'}, axis = 'columns')
df_final = df_final.reindex(columns=["POLY","FLOAT","Ash","CV","FC","Moist","Tots","Vols","Yield"])

Might not be the neatest or fastest method but it gives the required results.

  • Related