Apply a function in dictionary composed of DataFrames with different column names

Time:12-18

I've been struggling to take a dictionary "d" composed of n dataframes and apply this to each of them:

idf = idf.iloc[idf.index.repeat(idf.iloc[:,0])]

This repeats each row of a dataframe as many times as the value in its column 0. Something like this:

BEFORE:         AFTER:

Index           Index
1290  2         1290  2
1320  3         1290  2 
1400  4         1320  3
                1320  3
                1320  3
                1400  4
                1400  4
                1400  4
                1400  4
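
The BEFORE/AFTER transformation above can be sketched for a single dataframe as follows. The column name `count` is a made-up placeholder, and the sketch assumes the index labels are unique, as in the example. It selects with `.loc` because `index.repeat` returns index labels rather than row positions.

```python
import pandas as pd

# BEFORE: index labels 1290, 1320, 1400; column 0 holds the repeat counts.
# "count" is a hypothetical column name used only for this illustration.
before = pd.DataFrame({"count": [2, 3, 4]}, index=[1290, 1320, 1400])

# Repeat each index label as many times as the value in column 0,
# then select those labels (this relies on the labels being unique).
after = before.loc[before.index.repeat(before.iloc[:, 0])]

print(after)
```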

So the dictionary "d" holds dataframes that look like the BEFORE column. I tried to apply the function this way:

    for idf in d:
        d = idf.iloc[idf.index.repeat(idf.iloc[:,0])]

I was able to do this when I manually selected a column name, but these dataframes have different column names (on purpose). Here I can't apply it, because .iloc[ ] doesn't work on strings (which I found weird: the loop is not selecting the values of the dictionary, it is using the string keys instead).
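
A small sketch of why the loop sees strings: iterating over a dictionary directly yields its keys, while `.values()` or `.items()` give access to the dataframes. Note also that assigning to `d` inside the loop rebinds the dictionary itself instead of updating one entry. The keys and column names below are invented for illustration.

```python
import pandas as pd

# Hypothetical dictionary of dataframes with different column names.
d = {
    "first": pd.DataFrame({"x": [2, 3]}, index=[1290, 1320]),
    "second": pd.DataFrame({"y": [1, 2]}, index=[10, 20]),
}

# Iterating over the dict itself yields only the keys (strings here).
keys_seen = [key for key in d]

# .values() yields the dataframes; .items() yields (key, dataframe) pairs.
value_types = [type(value).__name__ for value in d.values()]

print(keys_seen)
print(value_types)
```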

If I want back the dictionary "d" with the function applied, how can I solve this?

Thanks!

Edits:

  1. Example picture of one of the dataframes inside the dictionary "d". Remember that the name of the first column [0] is different in each dataframe (and it shouldn't be changed, for data management reasons):

[Image: example of one of the dataframes inside the dictionary "d"]

  2. I already know how to repeat rows n times; my question is how to apply this to a dictionary of dataframes.

CodePudding user response:

Does this do what you need?

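import numpy as np
import pandas as pd

# Two sample dataframes whose first column has a different name in each
df1 = pd.DataFrame({"a":[2, 3, 4, 3], "col1":[1, 2, 3, 4]})
df1.set_index("a", inplace=True)

df2 = pd.DataFrame({"b":[1, 2, 4], "col2":[3, 2, 1]})
df2.set_index("b", inplace=True)

d = {"df1": df1, "df2": df2}

# .items() yields (key, dataframe) pairs; a plain "for idf in d" yields only the string keys
for key, this_df in d.items():
    # Repeat row positions rather than labels, so duplicate index values (like 3 in df1) are handled
    positions = np.repeat(np.arange(len(this_df)), this_df.iloc[:, 0].to_numpy())
    d[key] = this_df.iloc[positions]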
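
As a possible variation, the repetition can be wrapped in a small helper and applied with a dictionary comprehension, which builds a new dictionary instead of modifying "d" in place. This is only a sketch: `repeat_rows` is a hypothetical name, and it assumes column 0 holds non-negative integers. It repeats rows by position, so it also works when index labels are duplicated.

```python
import numpy as np
import pandas as pd


def repeat_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Repeat each row as many times as the value in the first column."""
    counts = df.iloc[:, 0].to_numpy()
    # Row positions 0..n-1, each repeated counts[i] times.
    positions = np.repeat(np.arange(len(df)), counts)
    return df.iloc[positions]


# Sample data mirroring the answer: column names differ between dataframes.
d = {
    "df1": pd.DataFrame({"col1": [1, 2, 3, 4]}, index=[2, 3, 4, 3]),
    "df2": pd.DataFrame({"col2": [3, 2, 1]}, index=[1, 2, 4]),
}

# Build a new dictionary with the function applied to every dataframe.
repeated = {name: repeat_rows(df) for name, df in d.items()}

# Sanity check: each result has as many rows as the sum of its counts.
for name, df in repeated.items():
    assert len(df) == d[name].iloc[:, 0].sum()
```

Because the helper only uses `.iloc`, it never needs to know the column names.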