Consider the following code, which uses functools.reduce
to concatenate a list of dataframes:
import pandas as pd
from functools import reduce
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})
df3 = pd.DataFrame({'C': [5, 6]})
reduce(lambda x, y: pd.concat([x, y], axis=1), [df1, df2, df3])
This code works well. However, when I try the following, I get errors:
reduce(lambda x, y: pd.concat([x[0], y[0]], axis=1), zip([df1, df2, df3], [0, 1, 0]))
Could someone please help me understand why?
CodePudding user response:
Let's understand what's going on in reduce:
# Iteration 1:
# x = (df1, 0); y = (df2, 1)
# func(x, y): pd.concat([x[0], y[0]], axis=1) # okay (func is the lambda passed to reduce)
# The result of func(x, y) is a dataframe, which becomes the new x for iteration 2
# Iteration 2:
# x = some_dataframe; y = (df3, 0)
# func(x, y): pd.concat([x[0], y[0]], axis=1) # error
# Notice that x is no longer a tuple but a dataframe.
# So x[0] looks up a column labeled 0 and raises a KeyError, because the dataframe has no such column
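The change in the type of x can be checked without pandas. The sketch below is only an illustration: record is a hypothetical stand-in for the lambda, and the strings stand in for the dataframes. It logs the argument types that reduce passes on each call.

```python
from functools import reduce

calls = []  # (type of x, type of y) for every call made by reduce

def record(x, y):
    # Hypothetical stand-in for the lambda: log the argument types,
    # then return a non-tuple value, just as pd.concat returns a dataframe.
    calls.append((type(x).__name__, type(y).__name__))
    return "combined"

reduce(record, zip(["df1", "df2", "df3"], [0, 1, 0]))
print(calls)  # [('tuple', 'tuple'), ('str', 'tuple')]
```

On the second call x is whatever the first call returned, not a tuple from zip.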
In case you are interested in how reduce works, here is a minimal implementation:
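def reduce(func, iterable):
    it = iter(iterable)  # also works for one-shot iterators such as zip
    try:
        result = next(it)  # the first item is the initial accumulator
    except StopIteration:
        raise TypeError('Empty sequence') from None
    for item in it:
        # the return value of func becomes the accumulator (x) for the next call
        result = func(result, item)
    return result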
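As for getting the zipped version to run, here are a few possible approaches, sketched under the assumption that the goal is still a column-wise concatenation of df1, df2 and df3. The names pairs, stripped, seeded and direct are made up for this example. The idea in each case is to make sure x is always a dataframe.

```python
from functools import reduce
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})
df3 = pd.DataFrame({'C': [5, 6]})
pairs = list(zip([df1, df2, df3], [0, 1, 0]))

# Option 1: strip the flags first, so x and y are both dataframes.
stripped = reduce(lambda x, y: pd.concat([x, y], axis=1),
                  (df for df, _ in pairs))

# Option 2: keep the tuples, but pass the first dataframe as the initial
# value. x is then always a dataframe and y is always a (dataframe, flag)
# tuple, so y[1] remains available if the flag is needed.
seeded = reduce(lambda x, y: pd.concat([x, y[0]], axis=1),
                pairs[1:], pairs[0][0])

# Option 3: skip reduce, since pd.concat accepts a list of dataframes.
direct = pd.concat([df for df, _ in pairs], axis=1)

print(direct)
```

All three should produce the same dataframe with columns A, B and C. Option 3 is usually the simplest, because it makes one concat call instead of one per pair.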