Creating new dataframes using like values in an existing dataframe

Time:07-08

I have a dataframe (df1) of phone records, roughly 10k rows long. It contains calls from different phone numbers on the same day, and from the same phone number on different days. Example of df1:

Date Number
01/01/2022 1234567891
01/01/2022 1234567892
01/02/2022 1234567891
01/02/2022 1234567893
01/02/2022 1234567892

What I want to do is write a short script that iterates over df1, groups the rows by unique phone number, and creates a new dataframe for each unique phone number.

The kicker is that I will have to do this periodically, so df1 will fluctuate in length and content. Simply sorting df1 and assigning rows 1-10 to df2 and rows 11-33 to df3 won't work.

So far I have only come up with a way to isolate each number manually, one at a time:

df2 = df1[df1['Number'].isin([1234567891])]

CodePudding user response:

You can extract all unique phone numbers from your dataframe into an array:

numbers = df1['Number'].unique()

Now you can iterate over this array and extract the dataframe for each phone number. In this example I print each dataframe:

for number in numbers:
    print(df1[df1['Number'] == number])
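
To keep the per-number dataframes instead of only printing them, the loop above can store each one in a dict keyed by phone number. This is a minimal sketch, assuming the sample rows from the question; the name frames_by_number is hypothetical and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    'Date': ['01/01/2022', '01/01/2022', '01/02/2022', '01/02/2022', '01/02/2022'],
    'Number': [1234567891, 1234567892, 1234567891, 1234567893, 1234567892],
})

# frames_by_number is a hypothetical name: one dataframe per unique number.
frames_by_number = {}
for number in df1['Number'].unique():
    # Boolean mask keeps only the rows for this phone number.
    frames_by_number[number] = df1[df1['Number'] == number]

# Look up the calls for a single number.
print(frames_by_number[1234567891])
```

Because the dict is rebuilt from whatever numbers are present, it should adapt when df1 changes in length or content between runs.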

CodePudding user response:

Consider the following simple example, which makes use of .groupby:

import pandas as pd
df = pd.DataFrame({'user': ['A', 'B', 'A', 'A', 'C'], 'value': [5, 4, 3, 2, 1]})
grouped = df.groupby('user')
user_df = {}  # maps each user to its own DataFrame
for user in df.user.unique():
    user_df[user] = grouped.get_group(user)

Now user_df is a dict with 3 DataFrames, one for each user, so

print(user_df['A'])

gives output

  user  value
0    A      5
2    A      3
3    A      2

and

print(user_df['B'])

gives output

  user  value
1    B      4

and

print(user_df['C'])

gives output

  user  value
4    C      1
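
A groupby object can also be iterated directly, yielding (key, sub-dataframe) pairs, so the same kind of dict can be built without calling unique() or get_group. The following is a sketch applied to the question's phone data, assuming the column names Date and Number from the example; calls_by_number is a hypothetical name.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    'Date': ['01/01/2022', '01/01/2022', '01/02/2022', '01/02/2022', '01/02/2022'],
    'Number': [1234567891, 1234567892, 1234567891, 1234567893, 1234567892],
})

# Iterating a groupby yields (group key, sub-dataframe) pairs.
calls_by_number = {number: group for number, group in df1.groupby('Number')}

# Report how many calls each number made.
for number, group in calls_by_number.items():
    print(number, len(group))
```

Each value keeps the original row index from df1; calling reset_index(drop=True) on a group would renumber it from zero if that is preferred.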

If you need to process one user per loop iteration, do

import pandas as pd
df = pd.DataFrame({'user':['A','B','A','A','C'],'value':[5,4,3,2,1]})
grouped = df.groupby('user')
for user in df.user.unique():
    user_df = grouped.get_group(user) # user_df is now pandas.DataFrame
    print(user, user_df['value'].min(), user_df['value'].max())

gives output

A 2 5
B 4 4
C 1 1
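
When the per-user work is only an aggregation such as the min and max above, the explicit loop can likely be replaced by a single agg call on the grouped column. This is a sketch on the same example data; summary is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({'user': ['A', 'B', 'A', 'A', 'C'], 'value': [5, 4, 3, 2, 1]})

# One row per user, one column per aggregate function.
summary = df.groupby('user')['value'].agg(['min', 'max'])
print(summary)

# Individual results can be read back by user and aggregate name.
print(summary.loc['A', 'max'])
```

The loop form remains the better fit when each group needs processing that does not reduce to a built-in aggregate.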