I have a dataframe df:
Number | Master |
---|---|
1 | Apple |
2 | Orange |
3 | Pineapple |
4 | Strawberrry |
5 | Blueberry |
6 | Plums |
7 | Cherry |
8 | Dragonfruit |
9 | Iceapple |
10 | Litchie |
This is just a sample df. The original dataframe has 10000 rows.
I want to encode Apple, Pineapple, Orange and Strawberry as 1, 2, 3 and 4 (these happen to be the fruits with the top 4 value counts in the df) and all other fruits as a single code, 5. How can I achieve this?
Expected output:
Number | Master |
---|---|
1 | 1 |
2 | 3 |
3 | 2 |
4 | 4 |
5 | 5 |
6 | 5 |
7 | 5 |
8 | 5 |
9 | 5 |
10 | 5 |
CodePudding user response:
Create a dictionary for the top N values by count in column Master using Series.value_counts with Series.head, pass it to Series.map, and replace the unmatched values with N + 1 using Series.fillna:
N = 4
d = {v: k + 1 for k, v in enumerate(df['Master'].value_counts().head(N).index)}
print (d)
df['Master'] = df['Master'].map(d).fillna(N + 1).astype(int)
If you already have the top values as a list:
L = ['Apple','Pineapple','Orange','Strawberrry']
d = {v: k + 1 for k, v in enumerate(L)}
print (d)
{'Apple': 1, 'Pineapple': 2, 'Orange': 3, 'Strawberrry': 4}
df['Master'] = df['Master'].map(d).fillna(len(L) + 1).astype(int)
print (df)
Number Master
0 1 1
1 2 3
2 3 2
3 4 4
4 5 5
5 6 5
6 7 5
7 8 5
8 9 5
9 10 5
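The top-N approach above can be wrapped in a small helper. This is only a sketch: the function name encode_top_n and the sample data are made up for illustration, and the data deliberately has distinct counts, because the order value_counts gives to tied values is not something to rely on.

```python
import pandas as pd


def encode_top_n(series: pd.Series, n: int) -> pd.Series:
    """Map the n most frequent values to 1..n and everything else to n + 1."""
    # value_counts sorts by frequency, most frequent first
    top = series.value_counts().head(n).index
    codes = {value: rank for rank, value in enumerate(top, start=1)}
    # Values missing from codes become NaN, then get the catch-all code
    return series.map(codes).fillna(n + 1).astype(int)


# Hypothetical data: Apple x4, Pineapple x3, Orange x2, two singletons
fruits = pd.Series(
    ["Apple"] * 4 + ["Pineapple"] * 3 + ["Orange"] * 2 + ["Plums", "Cherry"]
)
encoded = encode_top_n(fruits, n=2)
print(encoded.tolist())
```

With n=2, Apple and Pineapple get codes 1 and 2, and every other fruit gets 3.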
CodePudding user response:
You can use a dict with Series.map, and fillna(5) for keys that don't exist in dct.
dct = {'Apple':1, 'Pineapple':2,'Orange':3 , 'Strawberrry':4}
df['Master'] = df['Master'].map(dct).fillna(5).astype(int)
print(df)
Number Master
0 1 1
1 2 3
2 3 2
3 4 4
4 5 5
5 6 5
6 7 5
7 8 5
8 9 5
9 10 5
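To see why the trailing astype(int) is there, it may help to look at the intermediate result. In this small sketch (the three-row frame is invented for illustration), the unmatched value becomes NaN, which forces the mapped column to a float dtype until it is filled and cast back.

```python
import pandas as pd

# Hypothetical three-row sample; 'Plums' is not a key in dct
df = pd.DataFrame({"Number": [1, 2, 3], "Master": ["Apple", "Orange", "Plums"]})
dct = {"Apple": 1, "Pineapple": 2, "Orange": 3, "Strawberrry": 4}

mapped = df["Master"].map(dct)   # 'Plums' has no key, so it becomes NaN
print(mapped.dtype)              # float64, because NaN is a float

result = mapped.fillna(5).astype(int)  # fill the gap, then cast back to int
print(result.tolist())
```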
CodePudding user response:
One way would also be to create a mapping function and apply it to your column:
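import pandas as pd
dic = {'col1': [1, 2, 3, 4, 5, 6], 'fruits': ['apple', 'banana', 'tomato', 'something else', 'apple', 'banana']}
df = pd.DataFrame.from_dict(dic)
def mapping(row):
    # Called once per value in the column; despite its name, row is a single cell value
    if row == "apple":
        result = 1
    elif row == "banana":
        result = 2
    else:
        # Every other fruit shares the catch-all code
        result = 5
    return result
df['fruits'] = df['fruits'].apply(mapping)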
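If the if/elif chain grows long, one possible variation is to keep the codes in a dict and look each value up with dict.get, which takes a default for missing keys. The names codes and the four-row frame below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3, 4], "fruits": ["apple", "banana", "tomato", "apple"]}
)

# Known fruits and their codes; anything else falls back to 5
codes = {"apple": 1, "banana": 2}
df["fruits"] = df["fruits"].map(lambda fruit: codes.get(fruit, 5))
print(df["fruits"].tolist())
```

Because the default is applied per value, no fillna or astype step is needed here.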
CodePudding user response:
You can use the converters option of pandas.read_csv to encode the column while reading the CSV file.
import pandas as pd
# Mapping from fruit name to its code; any other fruit becomes '5'
converter_dict = {'Apple': '1', 'Orange': '2', 'Pineapple': '3', 'Strawberrry': '4'}
def converter_func(x):
    # Return the code for known fruits and '5' for everything else
    if x not in converter_dict:
        return '5'
    return converter_dict[x]
df = pd.read_csv('file.csv', converters={"Master": converter_func})
Output:
Number Master
1 1
2 2
3 3
4 4
5 5
6 5
7 5
8 5
9 5
10 5
For more details on the converters parameter, see a Medium article on the topic or the official pandas documentation for read_csv.
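To try the converters idea without a file on disk, the CSV can be fed from a string. This is a sketch under assumptions: the CSV text is invented, and it returns integer codes rather than the string codes used above.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for file.csv
csv_text = "Number,Master\n1,Apple\n2,Orange\n3,Plums\n"
codes = {"Apple": 1, "Orange": 2, "Pineapple": 3, "Strawberrry": 4}

# The converter receives each raw cell of the Master column as a string
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={"Master": lambda name: codes.get(name, 5)},
)
print(df["Master"].tolist())
```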