Trying to loop through a pandas df changing all the data type and presentation type

Time:10-24

I'm looking for a way to loop through a pandas DataFrame and convert values that are currently stored as strings (object dtype), for example '1.15m', into numbers such as 1150000, changing the column dtype to integer at the same time.

This is what I have so far, but it doesn't seem to pick up the 'm' in the values.

```python
int_cols = ['Avg. Likes', 'Posts', 'New Post Avg. Likes','Total Likes' ]

for c in int_cols:
    if 'm' in db[c]:
        db[c] = db[c].apply(lambda x: float(x.strip('m'))*1000000)
        db[c] = db[c].astype('int')
    elif 'k' in db[c]: 
        db[c] = db[c].apply(lambda x: float(x.strip('k'))*1000)
        db[c] = db[c].astype('int')
    elif 'b' in db[c]: 
        db[c] = db[c].apply(lambda x: float(x.strip('b'))*1000000000)
        db[c] = db[c].astype('int')
    else:
        continue
```
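A likely reason nothing gets converted: on a pandas Series, the `in` operator tests the index labels, not the cell values, so `'m' in db[c]` asks whether some row is labelled `'m'`. With the default integer index that is always False, so every column falls through to `continue`. Even with a value-based test, each branch converts the whole column using a single suffix, so a column that mixes suffixes would still raise `ValueError` inside `float(...)`. The sketch below checks the `in` behaviour on a few values from the sample data and then converts per value instead. It assumes the cells are strings with an optional `k`/`m`/`b` suffix or missing values; the helper name `parse_suffixed` is hypothetical, and pandas' nullable `Int64` dtype is used so missing values do not block the integer conversion.

```python
import pandas as pd

s = pd.Series(['8.7m', '3.4k', '915'])

# `in` on a Series looks at the index labels (0, 1, 2), not the values
print('m' in s)                   # False
print(s.str.contains('m').any())  # True: this tests the values themselves

# Hypothetical helper: convert one value such as '1.15m' to 1150000
MULTIPLIERS = {'k': 1_000, 'm': 1_000_000, 'b': 1_000_000_000}

def parse_suffixed(value):
    if pd.isna(value):
        return pd.NA                  # keep missing values missing
    text = str(value).strip().lower()
    factor = MULTIPLIERS.get(text[-1:], 1)
    if factor != 1:
        text = text[:-1]              # drop the suffix letter
    # round() rather than int(): a float product can land just below a whole number
    return round(float(text) * factor)

print(parse_suffixed('1.15m'))  # 1150000

# Per-value conversion inside the question's loop (sample rows from the question)
db = pd.DataFrame({
    'Avg. Likes': ['8.7m', '8.2m', '6.7m'],
    'Posts': ['3.4k', '7.0k', '915'],
    'New Post Avg. Likes': ['6.3m', '5.0m', '3.5m'],
    'Total Likes': ['29.1b', '57.4b', '6.1b'],
})
int_cols = ['Avg. Likes', 'Posts', 'New Post Avg. Likes', 'Total Likes']
for c in int_cols:
    db[c] = db[c].map(parse_suffixed).astype('Int64')
print(db.dtypes)
```

Because the suffix is handled value by value, a column such as Posts, which mixes `'3.4k'` and `'915'`, converts without any per-column branching.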

Edit: adding sample data

db.head(3)

|Rank | Channel Info | Influence Score  | Followers | Avg. Likes | Posts  |60-Day Eng Rate  | New Post Avg. Likes | Total Likes  | Country Or Region|
|:---:|:------------:|:----------------:|:---------:|:----------:|:------:|:---------------:|:-------------------:|:------------:|:----------------:|                  
|1    | cristiano    |92                |485200000.0|8.7m        | 3.4k   |0.013            |6.3m                 |29.1b         |Spain             |
|2    | kyliejenner  |91                |370700000.0|8.2m        | 7.0k   |0.014            |5.0m                 |57.4b         |United States     |
|3    | leomessi     |90                |363900000.0|6.7m        | 915    |0.010            |3.5m                 |6.1b          |NaN               |

Answer:

Here is one way to do it:

```python
import pandas as pd

# df is the DataFrame called db in the question
int_cols = ['Avg. Likes', 'Posts', 'New Post Avg. Likes', 'Total Likes']

# create a mapping of the suffixes to their multipliers
m = {'m': 1000000.0, 'k': 1000.0, 'b': 1000000000.0}

# remove digits and dots and map the remaining suffix to the dictionary
# (no suffix -> NaN -> multiplier 1), then multiply with the numeric part;
# missing values stay NaN instead of being turned into 1.0
df[int_cols] = df[int_cols].apply(
    lambda x: x.replace(r'[\d.]', '', regex=True).map(m).fillna(1.0).mul(
        x.replace(r'[mbk]', '', regex=True).astype(float)
    ),
    axis=1,
)
df
```
```
Avg. Likes  Posts   New Post Avg. Likes     Total Likes
0   8.7     3.4     6.3     29.1
1   8.2     7.0     5.0     57.4
2   6.7     915.0   3.5     6.1
3   6.1     1.9     1.7     11.4
4   1.8     6.8     932.0   12.6
...     ...     ...     ...     ...
195     680.6   4.6     305.7   3.1
196     2.2     1.4     2.1     3.0
197     227.8   4.2     103.2   955.9
198     193.3   865.0   82.6    167.2
199     382.5   3.8     128.2   1.5
```
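To see what each half of the answer's expression does, here is the same idea applied to a single column (the Posts values from the sample data), with each intermediate Series kept in its own variable. The final `.round().astype('Int64')` is an addition for the integer dtype the question asks for; it is not part of the answer's code, and the variable names are illustrative.

```python
import pandas as pd

multipliers = {'m': 1_000_000.0, 'k': 1_000.0, 'b': 1_000_000_000.0}
posts = pd.Series(['3.4k', '7.0k', '915'])

# Strip digits and dots, leaving just the suffix: 'k', 'k', ''
suffix = posts.replace(r'[\d.]', '', regex=True)

# Look the suffix up; '' is not a key, so it maps to NaN and then to 1.0
factor = suffix.map(multipliers).fillna(1.0)

# Strip the suffix letter, leaving the numeric part: 3.4, 7.0, 915.0
number = posts.replace(r'[mbk]', '', regex=True).astype(float)

# Multiply, then round and convert to pandas' nullable integer dtype
result = number.mul(factor).round().astype('Int64')
print(result)
```

Rounding before the cast matters: pandas refuses to cast floats with a fractional part to `Int64`, and a product such as 3.4 * 1000 is only guaranteed to be close to, not exactly, a whole number.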