Assuming I have grouped US states into lists:
list1 = ['TX','FL','OR']
list2 = ['AL','MA','NJ']
list3 = ['WA','PA','GA']
I have over 15 lists covering all 50 states, but for the sake of brevity, I will keep it down to 3 lists.
I have the following dataframe df:
State abbrev produce goods service
Alabama AL 5 10 15
Texas TX 7 17 27
Florida FL 2 73 13
Georgia GA 12 65 20
New Jersey NJ 65 26 11
I want to create a column called final_delivery, which is equal to produce
if the state is in list1
, goods
if the state is in list2
, and service
if the state is in list3
State abbrev produce goods service final_delivery
Alabama AL 5 10 15 10
Texas TX 7 17 27 7
Florida FL 2 73 13 2
Georgia GA 12 65 20 20
New Jersey NJ 65 26 11 26
I have tried the following:
df["final_delivery"] = np.where(df['abbrev'].isin(list1), df['produce'], "")
but it replaces the values to missing when I try to move on to filling the column based on list2 and list3.
CodePudding user response:
You could build an inverted index that points to the function that you want to execute. This easily scales to 15 lists.
Inverted Index
si = pd.Series([0,0,0,1,1,1,2,2,2], index=['TX','FL','OR','AL','MA','NJ','WA','PA','GA']).sort_index()
print(si)
AL 1
FL 0
GA 2
MA 1
NJ 1
OR 0
PA 2
TX 0
WA 2
Function Map
fmap = {
0: lambda x: x['produce'],
1: lambda x: x['goods'],
2: lambda x: x['service']
}
Apply it
df.apply(lambda x: fmap[si.at[x['abbrev']]](x), axis=1)
Result
0 10
1 7
2 2
3 20
4 26
CodePudding user response:
The following helps maintain the integrity of the data frame column:
df["final_delivery"] = np.where(df['abbrev'].isin(list1), df['produce'], "")
df["final_delivery"] = np.where(df['abbrev'].isin(list1), df['goods'], df["final_delivery"])
df["final_delivery"] = np.where(df['abbrev'].isin(list1), df['service'], df["final_delivery"])
CodePudding user response:
We can create the mapping first then doing value position slice
mapping = {z : x for x, y in zip(['produce','goods','service'], [list1,list2,list3]) for z in y}
df['final_delivery'] = df.to_numpy()[np.arange(df.shape[0]),
df.columns.get_indexer(df.abbrev.map(mapping))]
df
Out[533]:
State abbrev produce goods service final_delivery
0 Alabama AL 5 10 15 10
1 Texas TX 7 17 27 7
2 Florida FL 2 73 13 2
3 Georgia GA 12 65 20 20
4 NewJersey NJ 65 26 11 26
CodePudding user response:
Firstly, you can create a column with np.nan
df['final_delivery'] = np.nan
Next, for the sake of simplicity and for loop, I will put your lists into a nested list.
new_list = [list1, list2, list3]
You can adjust the new_list
based on the order of your column, because I was thinking of using the order of df.columns
to facilitate the for loop.
for abb, col in zip(newlist,df.columns[2:-1]):
df['final_delivery'].loc[df['abbrev'].isin(abb)] = df[col].loc[df['abbrev'].isin(abb)]
df
State abbrev produce goods service final_delivery
0 Alabama AL 5 10 15 10.0
1 Texas TX 7 17 27 7.0
2 Florida FL 2 73 13 2.0
3 Georgia GA 12 65 20 20.0
4 New Jersey NJ 65 26 11 26.0