Reduce iterating over a zip list of dataframes

Time:10-07

Consider the following code, which uses functools.reduce to concatenate a list of dataframes:

from functools import reduce
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})
df3 = pd.DataFrame({'C': [5, 6]})

reduce(lambda x, y: pd.concat([x, y], axis=1), [df1, df2, df3])

This code works well. However, when I try the following, I get an error:

reduce(lambda x, y: pd.concat([x[0], y[0]], axis=1), zip([df1, df2, df3], [0, 1, 0]))

Could someone please help me understand why?

Answer:

Let's trace what reduce does at each step:

# f is the lambda passed to reduce

# Iteration 1:
# x = (df1, 0); y = (df2, 1)
# f(x, y): pd.concat([x[0], y[0]], axis=1)  # okay: x and y are both tuples
# The return value of f(x, y) is a DataFrame, and reduce passes it in as the new x for iteration 2

# Iteration 2:
# x = <DataFrame with columns A and B>; y = (df3, 0)
# f(x, y): pd.concat([x[0], y[0]], axis=1)  # error
# Notice that x is no longer a tuple but a DataFrame.
# On a DataFrame, x[0] selects the column labelled 0; there is no such column, so pandas raises KeyError: 0
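The underlying rule is that whatever the function returns must have the same shape as its first argument, because reduce feeds it straight back in as x. Below is a minimal sketch of three ways to respect that, assuming the frames from the question. The integers 0, 1, 0 are simply carried along or discarded, since the question does not say what they are for; the variable names are illustrative only.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})
df3 = pd.DataFrame({'C': [5, 6]})
# A list rather than a bare zip object, so it can be sliced and reused below
pairs = list(zip([df1, df2, df3], [0, 1, 0]))

# Option 1: return a (DataFrame, flag) tuple so x has the same shape on every call.
# Which flag to carry forward is arbitrary here; this simply keeps y's flag.
combined1, _ = reduce(lambda x, y: (pd.concat([x[0], y[0]], axis=1), y[1]), pairs)

# Option 2: make the accumulator a plain DataFrame from the start by passing the
# first frame as reduce's initial value; only the incoming item y is a tuple.
combined2 = reduce(lambda acc, pair: pd.concat([acc, pair[0]], axis=1), pairs[1:], pairs[0][0])

# Option 3: if the integers are not needed, drop them; pd.concat accepts the whole list at once.
combined3 = pd.concat([df for df, _ in pairs], axis=1)

print(combined3.columns.tolist())  # ['A', 'B', 'C']
```

Option 3 also avoids building an intermediate DataFrame at every step, which is why a single pd.concat call is usually preferred over reduce for this kind of column-wise join.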

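In case you are interested in how reduce works, here is a simplified implementation (the real functools.reduce also accepts any iterable, not just an indexable sequence, and an optional initial value):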

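def reduce(func, sequence):
    # Simplified: needs a non-empty, indexable sequence and takes no initial value
    if not sequence:
        raise TypeError('reduce() of empty sequence with no initial value')

    result = sequence[0]  # the first item seeds the accumulator
    for item in sequence[1:]:
        # whatever func returns becomes the first argument (x) of the next call
        result = func(result, item)

    return result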
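The simplified version above indexes into its argument, so handing it the zip object from the question would fail with TypeError: 'zip' object is not subscriptable before the lambda ever runs. The sketch below is loosely modelled on the pure-Python equivalent shown in the functools documentation: it walks any iterable with iter()/next() and accepts an optional initial value. The sentinel name _MISSING is an illustrative choice, and plain integers stand in for the DataFrames so the example needs only the standard library.

```python
_MISSING = object()  # sentinel, so that None can still be passed as a real initial value


def reduce(func, iterable, initial=_MISSING):
    """Left fold over any iterable, with an optional initial accumulator."""
    it = iter(iterable)  # works for lists, generators and zip objects alike
    if initial is _MISSING:
        try:
            result = next(it)  # no initial value: the first item seeds the accumulator
        except StopIteration:
            raise TypeError("reduce() of empty iterable with no initial value") from None
    else:
        result = initial
    for item in it:
        result = func(result, item)  # each return value is the next call's first argument
    return result


# Same failure as in the question: the accumulator starts as a tuple but becomes an int.
try:
    reduce(lambda x, y: x[0] + y[0], zip([10, 20, 30], [0, 1, 0]))
except TypeError as exc:
    print("fails on the second call:", exc)  # 'int' object is not subscriptable

# Seeding with an initial value keeps the accumulator's type consistent.
total = reduce(lambda acc, pair: acc + pair[0], zip([10, 20, 30], [0, 1, 0]), 0)
print(total)  # 60
```

Here the second call raises TypeError rather than KeyError only because an int does not support [] at all, while a DataFrame does but has no column 0; the cause is the same change in the accumulator's type.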