Home > other >  How to save multiple values on different rows as a variable or list in a CSV using Python Pandas
How to save multiple values on different rows as a variable or list in a CSV using Python Pandas

Time:10-05

I'm currently trying to iterate through a dataframe/csv and compare the dates of the rows with the same ID. If the dates are different or are a certain time-frame apart I want to create a '1' in another column (not shown) to mark that ID and row/s.

I'm looking to save the DATE values as variables and compare them against other DATE variables with the same ID. If the dates are set amount of time apart I'll create a 1 in another column on the same row.

ID DATE
1 11/11/2011
1 11/11/2011
2 5/05/2011
2 20/06/2011
3 2/04/2011
3 10/08/2011
4 8/12/2011
4 1/02/2012
4 12/03/2012

For this post, I'm mainly looking to save the multiple values as variables or a list. I'm hoping to figure out the rest once this roadblock has been removed.

Here's what I got currently, but I don't think it'll be much help. Currently it iterates through and converts the date strings to dates. Which is what I want to happen AFTER getting a list of all the dates with the same ID value.

import pandas as pd
from datetime import *
filename = 'TestData.csv'

df = pd.read_csv(filename)

print (df.iloc[0,1])

x = 0

for i in df.iloc:

FixDate = df.iloc[x, 1]

d1, m1, y1 = FixDate.split('/')

d1 = int(d1)
m1 = int(m1)
y1 = int(y1)

finaldate = date(y1, m1, d1)

print(finaldate)

x = x   1

Any help is appreciated, thank you!

CodePudding user response:

In pandas for performance is best avoid loops, if need new column tested if same values in DATE per groups use GroupBy.transform with DataFrameGroupBy.nunique and then compare values by 1:

df = pd.read_csv(filename)

df['test'] = df.groupby('ID')['DATE'].transform('nunique').eq(1).astype(int)
print (df)
   ID        DATE  test
0   1  11/11/2011     1
1   1  11/11/2011     1
2   2   5/05/2011     0
3   2  20/06/2011     0
4   3   2/04/2011     0
5   3  10/08/2011     0
6   4   8/12/2011     0
7   4   1/02/2012     0
8   4  12/03/2012     0

If need filter matched rows:

mask = df.groupby('ID')['DATE'].transform('nunique').eq(1)

df1 = df[mask]
print (df1)
   ID        DATE
0   1  11/11/2011
1   1  11/11/2011

In last step convert values to lists:

IDlist  = df1['ID'].tolist()
  • Related