Let's say I have a table that looks like this:
table = pd.DataFrame(
columns=["Name", "Size", "Color"],
data=[['A', 1, 'Red'], ['B', 2, 'Green'], ['C', 3, 'Blue']]
)
And a lookup table that looks like this:
lookup = pd.DataFrame(
columns=["Color", "Source", "Lead Days"],
data=[["Red", "Europe", 2],
["Green", "Europe", 3],
["Blue", "US", 1],
["Yellow", "Europe", 2]]
)
How might I add columns "Source" and "Lead Days" to table
by looking up "Color" from lookup
?
A story sometimes helps.
table
has all the items I need to order.
lookup
has where I order them from and how long it takes.
I want to transform table
so that it can show the "Source" and "Lead Days" for each item I need to order.
Here's what the final table should look like:
Note: while I'm sure there's a way to do this with merge, or top level table operations. In the spirit of Minimally Sufficient Pandas, and to avoid the huge kluge that is pandas' over-provisioning of operations, I'd prefer to do it with apply
. Apply is nice because it's easy to consistently reach for apply
in all situations.
Here's my current approach, but it results in the error ValueError: Columns must be same length as key
To me, this makes little sense, since I'm returning a list of length 2 and putting it into two columns. But I'm sure pandas has its reasons for being anti-intuitive here.
lookup_columns = ["Source", "Lead Days"]
table[lookup_columns] = table.apply(
lambda row:
lookup.query('`Color` == "{color}"'.format(color=row["Color"])).loc[:, lookup_columns].values[0]
, axis = 1)
CodePudding user response:
Use result_type="expand"
:
lookup_columns = ["Source", "Lead Days"]
table[lookup_columns] = table.apply(
lambda row:
lookup.query('`Color` == "{color}"'.format(color=row["Color"])).loc[:, lookup_columns].values[0]
, axis=1, result_type="expand")
print(table)
Name Size Color Source Lead Days
0 A 1 Red Europe 2
1 B 2 Green Europe 3
2 C 3 Blue US 1
From the documentation (emphasis mine):
result_type {‘expand’, ‘reduce’, ‘broadcast’, None}, default None
These only act when axis=1 (columns):
‘expand’ : list-like results will be turned into columns.
CodePudding user response:
With apply
, you can do:
>>> pd.concat([table, table['Color'].apply(lambda x: lookup.loc[lookup['Color'] == x, ['Source', 'Lead Days']].squeeze())], axis=1)
Name Size Color Source Lead Days
0 A 1 Red Europe 2
1 B 2 Green Europe 3
2 C 3 Blue US 1
Old answer
Use pd.merge
:
>>> pd.merge(table, lookup, how='left', on='Color')
Name Size Color Source Lead Days
0 A 1 Red Europe 2
1 B 2 Green Europe 3
2 C 3 Blue US 1
CodePudding user response:
I wouldnt' use apply. It is computationally expensive. For your case, join will suffice.
table.set_index('Color').join(lookup.set_index('Color')).reset_index()
Color Name Size Source Lead Days
0 Red A 1 Europe 2
1 Green B 2 Europe 3
2 Blue C 3 US 1