How to save multiple values on different rows as a variable or list in a CSV using Python Pandas-CodePudding

I'm currently trying to iterate through a dataframe/csv and compare the dates of the rows with the same ID. If the dates are different or are a certain time-frame apart I want to create a '1' in another column (not shown) to mark that ID and row/s.

I'm looking to save the DATE values as variables and compare them against other DATE variables with the same ID. If the dates are set amount of time apart I'll create a 1 in another column on the same row.

ID	DATE
1	11/11/2011
1	11/11/2011
2	5/05/2011
2	20/06/2011
3	2/04/2011
3	10/08/2011
4	8/12/2011
4	1/02/2012
4	12/03/2012

For this post, I'm mainly looking to save the multiple values as variables or a list. I'm hoping to figure out the rest once this roadblock has been removed.

Here's what I got currently, but I don't think it'll be much help. Currently it iterates through and converts the date strings to dates. Which is what I want to happen AFTER getting a list of all the dates with the same ID value.

import pandas as pd
from datetime import *
filename = 'TestData.csv'

df = pd.read_csv(filename)

print (df.iloc[0,1])

x = 0

for i in df.iloc:

FixDate = df.iloc[x, 1]

d1, m1, y1 = FixDate.split('/')

d1 = int(d1)
m1 = int(m1)
y1 = int(y1)

finaldate = date(y1, m1, d1)

print(finaldate)

x = x   1

Any help is appreciated, thank you!

CodePudding user response：

In pandas for performance is best avoid loops, if need new column tested if same values in DATE per groups use GroupBy.transform with DataFrameGroupBy.nunique and then compare values by 1:

df = pd.read_csv(filename)

df['test'] = df.groupby('ID')['DATE'].transform('nunique').eq(1).astype(int)
print (df)
   ID        DATE  test
0   1  11/11/2011     1
1   1  11/11/2011     1
2   2   5/05/2011     0
3   2  20/06/2011     0
4   3   2/04/2011     0
5   3  10/08/2011     0
6   4   8/12/2011     0
7   4   1/02/2012     0
8   4  12/03/2012     0

If need filter matched rows:

mask = df.groupby('ID')['DATE'].transform('nunique').eq(1)

df1 = df[mask]
print (df1)
   ID        DATE
0   1  11/11/2011
1   1  11/11/2011

In last step convert values to lists:

IDlist  = df1['ID'].tolist()