To create an Index for the ‘IMEI & IMSI’ pair as follows:
- IMEI index (named A): a number is assigned for each IMEI in sequential order, starting with the integer number 1.
- IMSI index (named as B): a number is assigned for each IMSI, in the sense that: if an IMEI appears more than once, the paired IMSIs will be assigned in sequential order, e.g.: 1, 2, 3...]2
This is what I am trying
import pandas as pd
import numpy as np
df1 = pd.readcsv('file.csv')
df1 = df1[['Imei','Imsi']]
df1 = df1.groupby(['Imei']).count()
df1 = df1.rename(columns = {'Imsi': 'Occurences'})
df1 = df1.sort_values(by=['Occurences'],ascending= False)
df1 = df1.reset_index()
I am able to count occurrences but how can I print them in order starting from 1 like shown in the image
CodePudding user response:
Use cumsum
to compute IMEI Index
and groupby_cumcount
to compute IMSI Index
.
# Setup
df = pd.DataFrame({'Imei': {0: 'A', 1: 'A', 2: 'B', 3: 'B', 4: 'C', 5: 'C', 6: 'C', 7: 'D', 8: 'E'}})
df['IMEI Index'] = df['Imei'].ne(df['Imei'].shift()).cumsum()
df['IMSI Index'] = df.groupby('Imei').cumcount().add(1)
Output:
>>> df
Imei IMEI Index IMSI Index
0 A 1 1
1 A 1 2
2 B 2 1
3 B 2 2
4 C 3 1
5 C 3 2
6 C 3 3
7 D 4 1
8 E 5 1