Home > Software design >  Get column values of a DataFrame if column name matches row value of another DataFrame pandas
Get column values of a DataFrame if column name matches row value of another DataFrame pandas

Time:08-19

I have two DataFrame,

A = {'col1': [n, b], 'col2': [c,a]}
B = {'a': [1, 24, 30], 'b': [100, nan, 10],'c': [nan, 4.6, nan],'n': [10, 2, 98.2] }
df_a = pd.DataFrame(data=a)
df_b = pd.DataFrame(data=b)

what I’m trying to do is iterating over df_a rows, first considering col1 and then col2, and each time a row is equal to a column name of df_b, I want to retrieve the values under that column.

For example, if in the first row of df_a (col1) is present “n” then I want to get a Dataframe that will have 10,2,98.2 as rows. And then move to col2 to do the same. At the end, for each iteration I’d like to have a new Dataframe with two columns each. In the example, the first iteration would give me a DataFrame like this:

     n      b 
0   10    100
1   2     nan
2   98.2   10

I tried with this, but without success:

if row['col1'] == df_b.columns.any():
     values_df = df_a['col1'].values[:]

CodePudding user response:

This is not a perfect solution and I violated many zen rules here but it still works I hope:

df = pd.DataFrame()
for col in df_a.columns:
    new_df = pd.concat(df_a[col].apply(lambda x: pd.DataFrame({x: df_b[x]}) if x in df_b.columns else None).values, axis=1)
    df[[col for col in new_df.columns]] = new_df

CodePudding user response:

You could write a function to take a list of column names and return the Pandas DataFrame based on the values in respective columns of df_b like this:

def lookup_columns(lookups: list, values_df: pd.DataFrame) -> pd.DataFrame:
    result = {}
    
    for key in lookups:
        try:
            result[key] = list(values_df[key].values)
        except KeyError as err:
            pass
        
    return pd.DataFrame(result)

Since you said you want a DataFrame from each iteration, you could iterate over the columns of df_a and create a dictionary of Pandas DataFrames like this:

mapped_columns = {}

for col in df_a.columns:
    mapped_columns[col] = lookup_columns(list(df_a[col].values), df_b)

Based on the code example you provided, you would end up with a dictionary with two entries:

mapped_columns['col1']

    n       b
0   10.0    100.0
1   2.0 NaN
2   98.2    10.0

And

mapped_columns['col2']


    c   a
0   NaN 1
1   4.6 24
2   NaN 30
  • Related