How to remove every n-th element from dataframe column in python

Time:09-13

I am using Python and have a pandas dataframe imported from a CSV file. I would like to remove every nth value from each entry in a specific column.

For example:

  • the dataframe column to transform is called "Linestring"

  • each entry is a string of comma-separated coordinate pairs whose floats vary in precision, like this: Linestring(151.420 -33.540, 155.464722 -39.069046, 153.30925678 -33.08364825, 152.0998 -31.8090, 150.539067 -30.57578)

  • the entries vary in length

  • I would like to remove, say, every second comma-separated element, giving: Linestring(151.420 -33.540, 153.30925678 -33.08364825, 150.539067 -30.57578)

Attached/linked is a visual guide of what I am after.

[Image: Example problem and outcome]

Thanks a lot! :)

CodePudding user response:

Try this. I hope it'll help.

# keep every second comma-separated element; re-append ')' if the last element was dropped
df['Linestring'] = df.Linestring.apply(lambda x: ','.join(x.split(',')[::2]) if ','.join(x.split(',')[::2])[-1] == ')' else ','.join(x.split(',')[::2]) + ')')
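
The same idea can be spelled out as a small helper, which is easier to read and lets you pick the step. This is only a sketch: the name thin_linestring and its step parameter are made up for illustration, and it assumes every entry looks like Name(pair, pair, ...) with the pairs separated by commas.

```python
import re

import pandas as pd


def thin_linestring(text, step=2):
    """Keep every `step`-th coordinate pair, starting with the first one."""
    # Split "Linestring(a b, c d, ...)" into the name and the part in parentheses.
    match = re.fullmatch(r"\s*(\w+)\s*\((.*)\)\s*", text)
    if match is None:
        return text  # assumption: leave entries that do not fit the pattern untouched
    name, body = match.groups()
    points = [p.strip() for p in body.split(",")]
    return f"{name}({', '.join(points[::step])})"


df = pd.DataFrame({
    "Linestring": [
        "Linestring(151.420 -33.540, 155.464722 -39.069046, "
        "153.30925678 -33.08364825, 152.0998 -31.8090, 150.539067 -30.57578)"
    ]
})
df["Linestring"] = df["Linestring"].apply(thin_linestring)
print(df["Linestring"].iloc[0])
# Linestring(151.420 -33.540, 153.30925678 -33.08364825, 150.539067 -30.57578)
```

Because the closing parenthesis is stripped before splitting and added back afterwards, there is no need to check whether the last pair survived.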

CodePudding user response:

I wrote a function that replaces every nth value in a column with None. You can then drop those rows, which leaves you with a new data frame that does not include them. I hope this helps.

import pandas as pd

df = pd.DataFrame({'Numbers': [10,15,33,22,17,77,9]}) #a dataframe with column and some values
print(df) #prints the original dataframe

def rmNth(dFrame, col, n): #dataframe, column, delete every 'nth' 
    rows = len(dFrame.axes[0]) #stores the number of rows
    x = n - 1 #used in the while loop

    while (x < rows): #replace every nth cell with a null value (assumes the default 0..rows-1 index)
        dFrame.at[x ,col] = None
        x = x + n #increment x by n

    print(dFrame) #prints the dataframe showing all cells that will be removed are replaced with 'nan'
    newDF = dFrame.dropna() #remove null cells
    newDF.reset_index(drop=True, inplace=True) #reset the index
    return (newDF)

print(rmNth(df, "Numbers", 3)) #print the data frame with every 3rd value removed from the Numbers column 
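
If the goal is just to drop every nth row, a boolean mask over the row positions avoids writing None into the column first. A minimal sketch, assuming a hypothetical helper name drop_every_nth_row and that rows are counted by position rather than by index label:

```python
import numpy as np
import pandas as pd


def drop_every_nth_row(frame, n):
    """Return a new frame without the n-th, 2n-th, 3n-th, ... rows (counted from 1)."""
    positions = np.arange(len(frame))   # 0-based row positions
    keep = (positions + 1) % n != 0     # False for every n-th row
    return frame[keep].reset_index(drop=True)


df = pd.DataFrame({"Numbers": [10, 15, 33, 22, 17, 77, 9]})
result = drop_every_nth_row(df, 3)
print(result)  # the rows holding 33 and 77 are gone
```

Since nothing is overwritten with None, the original frame is left unchanged and rows that already contain NaN are not dropped by accident.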