Home > Blockchain >  Adding Rows to a pandas dataframe, but running into index issues when using append
Adding Rows to a pandas dataframe, but running into index issues when using append

Time:12-30

I have a data frame with about 20k rows and about 10 columns. One of the columns has a list in it between len=1 and len=4.

If a list has more than one entry in it, I want to duplicate that row and append it to the bottom. The second table is what I want to be added.

Index Col 1 Col 2 List Col List Len
1 ABC CDE ['String1', 'String2'] 2
2 EFG HIJ ['String3'] 1
Index Col 1 Col 2 List Col List Len
3 ABC CDE ['String2'] 2

and change the first row to

Index Col 1 Col 2 List Col List Len
1 ABC CDE ['String1'] 2

Apologies for the really terrible formatting. When I created a second table to show everything, Stack Overflow viewed it as unformatted code and I couldn't post it.

The index is the automatically generated index and not relevant to my data.

#Finding the max number of list len
max_list_len= max(original_df['List Col'])
#Set my counter variable
number=2
#Begin loop    
while number<=max_list_len:
    #copies relevant rows into new data frame
    dataframe_being_duplicated = original_df.loc[(original_df['List Col']).eq(number)] 
    dataframe_being_duplicated.loc[(original_df['List Len']).eq(number), 'List Col'] = dataframe_being_duplicated.loc[(original_df['List Len']).eq(number)]['List Col'].str[number-1]   
    full_quote_df = pd.concat(full_quote_df,dataframe_being_duplicated,ignore_index=True)
    number =1 #increment number

With this, it throws an error that says "cannot reindex on an axis with duplicate labels"

I'm also now realizing that this process for rows where the len is larger than 2, because it only duplicates the row once (if it worked as intended).

Is there a faster way to 'unstack' rows like this?

CodePudding user response:

If you really want to duplicate the rows:

out = df.loc[df.index.repeat(df['List Col'].str.len())]

Output:

   Index Col 1 Col 2            List Col  List Len
0      1   ABC   CDE  [String1, String2]         2
0      1   ABC   CDE  [String1, String2]         2
1      2   EFG   HIJ           [String3]         1

Else a classical explode:

out = df.explode('List Col')

Output:

   Index Col 1 Col 2 List Col  List Len
0      1   ABC   CDE  String1         2
0      1   ABC   CDE  String2         2
1      2   EFG   HIJ  String3         1
  • Related