Home > Mobile >  Pandas: Populate rows for same index in a group with different values
Pandas: Populate rows for same index in a group with different values

Time:03-01

Sample df:

In [2004]: df
Out[2004]: 
   index table_name column_name data_type  default  max_length
0      0   f_person      active   integer      NaN         NaN
1      0   f_person        actv   integer      NaN         NaN
2      5   f_person         ssn   varchar      NaN       256.0
3      5   f_person         ssn   varchar      NaN        99.0
4      6   f_person          pl   varchar     10.0       256.0
5      6   f_person          pl    bigint      NaN       256.0
6      8   f_person      prefix   varchar      NaN       256.0
7      8   f_person      prefix   integer      NaN       256.0

For the same index, I want to add a new column schema and populate different values for each row. Number of rows per group will be always <= 2.

Expected Output:

In [2006]: df
Out[2006]: 
   index table_name column_name data_type  default  max_length schema
0      0   f_person      active   integer      NaN         NaN     s1
1      0   f_person        actv   integer      NaN         NaN     s2
2      5   f_person         ssn   varchar      NaN       256.0     s1
3      5   f_person         ssn   varchar      NaN        99.0     s2
4      6   f_person          pl   varchar     10.0       256.0     s1
5      6   f_person          pl    bigint      NaN       256.0     s2
6      8   f_person      prefix   varchar      NaN       256.0     s1
7      8   f_person      prefix   integer      NaN       256.0     s2

I solved it using a for loop, but there must be a better way. Can someone please suggest a more pandaic way?

CodePudding user response:

Assuming you want to populate from a list a defined values:

values = ['s1', 's2']

d = dict(enumerate(values))

df['schema'] = df.groupby('index').cumcount().map(d)

Otherwise, this is already covered in previous questions

output:

   index table_name column_name data_type  default  max_length schema
0      0   f_person      active   integer      NaN         NaN     s1
1      0   f_person        actv   integer      NaN         NaN     s2
2      5   f_person         ssn   varchar      NaN       256.0     s1
3      5   f_person         ssn   varchar      NaN        99.0     s2
4      6   f_person          pl   varchar     10.0       256.0     s1
5      6   f_person          pl    bigint      NaN       256.0     s2
6      8   f_person      prefix   varchar      NaN       256.0     s1
7      8   f_person      prefix   integer      NaN       256.0     s2

CodePudding user response:

Try with groupby.cumcount() which performs a grouped cumulative count and concatenate that with a simple 's':

df['schema'] = 's'   df.groupby('index').cumcount().add(1).astype(str)

df[['index','schema']]
Out[53]: 
   index schema
0      0     s1
1      0     s2
2      5     s1
3      5     s2
4      6     s1
5      6     s2
6      8     s1
7      8     s2

I have a feeling I might be misuderstanding your question, so apologies, if that.

  • Related