I have a Pandas DataFrame that looks something like this:
import pandas as pd
df = pd.DataFrame({
"class":["a","b","c","b","a","b","c","c","a","c","b","c","c","a"],
"country":["aus","usa","fra","aus","ger","usa","usa","ger","aus","usa","fra","fra","usa","aus"],
"category":["w","x","y","x","z","z","w","x","y","y","x","z","y","z"]
})
class country category
0 a aus w
1 b usa x
2 c fra y
3 b aus x
4 a ger z
5 b usa z
6 c usa w
7 c ger x
8 a aus y
9 c usa y
10 b fra x
11 c fra z
12 c usa y
13 a aus z
Furthermore, I have a dictionary for each class
, each of which is a nested dictionary containing a number
for each category
for each country
:
class_a_numbers = {'aus': {'w': 100, 'x': 50, 'y': 80, 'z': 10},
'fra': {'w': 120, 'x': 60, 'y': 90, 'z': 10},
'ger': {'w': 105, 'x': 75, 'y': 85, 'z': 0},
'usa': {'w': 140, 'x': 100, 'y': 110, 'z': 50}}
class_b_numbers = {'aus': {'w': 1000, 'x': 500, 'y': 800, 'z': 100},
'fra': {'w': 1200, 'x': 600, 'y': 900, 'z': 100},
'ger': {'w': 1050, 'x': 750, 'y': 850, 'z': 0},
'usa': {'w': 1400, 'x': 1000, 'y': 1100, 'z': 500}}
class_c_numbers = {'aus': {'w': 10, 'x': 5, 'y': 9, 'z': 2},
'fra': {'w': 13, 'x': 5, 'y': 8, 'z': 1},
'ger': {'w': 9, 'x': 6, 'y': 3, 'z': 1},
'usa': {'w': 12, 'x': 12, 'y': 10, 'z': 3}}
I would like to create a number
column in df
and populate it with the appropriate number from the appropriate dictionary, such that the outcome will look like this:
class country category number
0 a aus w 100
1 b usa x 1000
2 c fra y 8
3 b aus x 500
4 a ger z 0
5 b usa z 500
6 c usa w 12
7 c ger x 6
8 a aus y 80
9 c usa y 10
10 b fra x 600
11 c fra z 1
12 c usa y 10
13 a aus z 10
My real dataset is quite large, so efficiency is an important factor. Normally, I would use map
from a dictionary, but I can't figure out how to make it work with these three nested dictionaries.
CodePudding user response:
df.apply
to the rescue:
numbers = {
'a': class_a_numbers,
'b': class_b_numbers,
'c': class_c_numbers,
}
df["number"] = df.apply(lambda row: numbers[row["class"]][row["country"]][row["category"]], axis=1)
CodePudding user response:
For improve performance join all dictionaries together, create one dictioanry with tuple key and convert to Series
with MultiIndex
, so possible add new column by DataFrame.join
:
d = {'a': class_a_numbers,'b': class_b_numbers,'c': class_c_numbers,}
d= {(k2,k, k1): v1 for k2, v2 in d.items() for k, v in v2.items() for k1, v1 in v.items()}
df = df.join(pd.Series(d, name='number'), on=['class','country','category'])
print (df)
class country category number
0 a aus w 100
1 b usa x 1000
2 c fra y 8
3 b aus x 500
4 a ger z 0
5 b usa z 500
6 c usa w 12
7 c ger x 6
8 a aus y 80
9 c usa y 10
10 b fra x 600
11 c fra z 1
12 c usa y 10
13 a aus z 10
Perfomance: If not use loop by apply
or iterrows
is increased performance:
df = pd.DataFrame({
"class":["a","b","c","b","a","b","c","c","a","c","b","c","c","a"],
"country":["aus","usa","fra","aus","ger","usa","usa","ger","aus","usa",
"fra","fra","usa","aus"],
"category":["w","x","y","x","z","z","w","x","y","y","x","z","y","z"]
})
#[140000 rows x 3 columns]
df = pd.concat([df] * 10000, ignore_index=True)
class_a_numbers = {'aus': {'w': 100, 'x': 50, 'y': 80, 'z': 10},
'fra': {'w': 120, 'x': 60, 'y': 90, 'z': 10},
'ger': {'w': 105, 'x': 75, 'y': 85, 'z': 0},
'usa': {'w': 140, 'x': 100, 'y': 110, 'z': 50}}
class_b_numbers = {'aus': {'w': 1000, 'x': 500, 'y': 800, 'z': 100},
'fra': {'w': 1200, 'x': 600, 'y': 900, 'z': 100},
'ger': {'w': 1050, 'x': 750, 'y': 850, 'z': 0},
'usa': {'w': 1400, 'x': 1000, 'y': 1100, 'z': 500}}
class_c_numbers = {'aus': {'w': 10, 'x': 5, 'y': 9, 'z': 2},
'fra': {'w': 13, 'x': 5, 'y': 8, 'z': 1},
'ger': {'w': 9, 'x': 6, 'y': 3, 'z': 1},
'usa': {'w': 12, 'x': 12, 'y': 10, 'z': 3}}
d = {'a': class_a_numbers,'b': class_b_numbers,'c': class_c_numbers,}
numbers = {
'a': class_a_numbers,
'b': class_b_numbers,
'c': class_c_numbers,
}
print (df)
In [119]: %timeit df.join(pd.Series({(k2,k, k1): v1 for k2, v2 in d.items() for k, v in v2.items() for k1, v1 in v.items()}, name='number'), on=['class','country','category'])
36.4 ms ± 288 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [120]: %timeit df["number"] = df.apply(lambda row: numbers[row["class"]][row["country"]][row["category"]], axis=1)
2.18 s ± 121 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
CodePudding user response:
do something like this.
#make sure u spin this code around a form loop from range(0,len(df))
#this code is for class a dict only make same for class b and c will be same
#just remove a and add 'b' , 'c'.
if df["class"]== a:# checks if class if a
# if class is a then go to class_a_numbers dict
# Then it will search for the key value(country)
# Which is in your df dataframe.
unpacked_dict = class_a_numbers[df.loc[i,"country]]
#now your unpacked_dict contains a dict
# dict = {'w': 100, 'x': 50, 'y': 80, 'z': 10}
# now we search the key whose value match df["category"]
#here i is from the for loop form 0 to len(df)
number_val = unpacked_dict[df.loc[i,"category"]
df.loc[i,"number"] =number_val
complete the code and check if this works if it did good if not ping me in comments will check it. and comments are just to help you understand ever point code is small.