I'm looking for a way to loop through the columns of a pandas DataFrame and convert values that are currently stored as object dtype, for example '1.15m', to numbers such as 1150000, and then change the column dtype to integer.
This is what I have so far, but it doesn't seem to be picking up the 'm' in the values.
int_cols = ['Avg. Likes', 'Posts', 'New Post Avg. Likes', 'Total Likes']
for c in int_cols:
    if 'm' in db[c]:
        db[c] = db[c].apply(lambda x: float(x.strip('m'))*1000000)
        db[c] = db[c].astype('int')
    elif 'k' in db[c]:
        db[c] = db[c].apply(lambda x: float(x.strip('k'))*1000)
        db[c] = db[c].astype('int')
    elif 'b' in db[c]:
        db[c] = db[c].apply(lambda x: float(x.strip('b'))*1000000000)
        db[c] = db[c].astype('int')
    else:
        continue
Edit: adding sample data
db.head(3)
|Rank | Channel Info | Influence Score | Followers | Avg. Likes | Posts |60-Day Eng Rate | New Post Avg. Likes | Total Likes | Country Or Region|
|:---:|:------------:|:----------------:|:---------:|:----------:|:------:|:---------------:|:-------------------:|:------------:|:----------------:|
|1 | cristiano |92 |485200000.0|8.7m | 3.4k |0.013 |6.3m |29.1b |Spain |
|2 | kyliejenner |91 |370700000.0|8.2m | 7.0k |0.014 |5.0m |57.4b |United States |
|3 | leomessi |90 |363900000.0|6.7m | 915 |0.010 |3.5m |6.1b |NaN |
CodePudding user response:
Here is one way to do it:
int_cols = ['Avg. Likes', 'Posts', 'New Post Avg. Likes', 'Total Likes']

# create a mapping of the suffixes to multipliers
m = {'m': 1000000.0, 'k': 1000.0, 'b': 1000000000.0}

# df is the DataFrame from the question (called db there)
# remove digits and dots to leave the suffix, then map it to the dictionary
# (no suffix -> 1), then multiply with the numeric part
df[int_cols] = df[int_cols].apply(
    lambda x: x.replace(r'[\d.]', '', regex=True).map(m).fillna(1.).mul(
        x.replace(r'[mkb]', '', regex=True).fillna(1.).astype(float)),
    axis=1)
df
Avg. Likes Posts New Post Avg. Likes Total Likes
0 8.7 3.4 6.3 29.1
1 8.2 7.0 5.0 57.4
2 6.7 915.0 3.5 6.1
3 6.1 1.9 1.7 11.4
4 1.8 6.8 932.0 12.6
... ... ... ... ...
195 680.6 4.6 305.7 3.1
196 2.2 1.4 2.1 3.0
197 227.8 4.2 103.2 955.9
198 193.3 865.0 82.6 167.2
199 382.5 3.8 128.2 1.5
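As for why the original loop never matched: `'m' in db[c]` tests the Series index labels, not its values, so it is False for a default integer index. The sketch below illustrates that and shows an element-wise alternative that ends with an integer dtype, as the question asked. The sample frame, the `MULTIPLIERS` dictionary, and the `to_int` helper are made up for illustration, and the helper assumes every value is a string with at most one trailing k/m/b suffix and no missing values.

```python
import pandas as pd

# Hypothetical sample mirroring two columns from the question's data.
db = pd.DataFrame({
    'Avg. Likes': ['8.7m', '8.2m', '6.7m'],
    'Posts': ['3.4k', '7.0k', '915'],
})

# `in` on a Series checks the index labels (0, 1, 2), not the values.
print('m' in db['Avg. Likes'])                   # False
# To test the values, use the string accessor instead.
print(db['Avg. Likes'].str.contains('m').any())  # True

# Assumed suffix table; integer multipliers keep the results exact after rounding.
MULTIPLIERS = {'k': 10**3, 'm': 10**6, 'b': 10**9}

def to_int(value):
    """Convert a string such as '1.15m' to an integer such as 1150000."""
    text = str(value).strip().lower()
    if text and text[-1] in MULTIPLIERS:
        # round() absorbs floating-point error from the multiplication
        return round(float(text[:-1]) * MULTIPLIERS[text[-1]])
    return round(float(text))

for col in db.columns:
    # int64 is needed because values in the billions overflow 32-bit integers
    db[col] = db[col].map(to_int).astype('int64')

print(db)
```

Because each value is handled on its own, a column that mixes suffixes (such as Posts, with both '3.4k' and '915') is converted correctly. Rows with NaN would need to be filled or skipped before calling the helper.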