I am using Python and have a pandas DataFrame imported from a CSV. I would like to remove every nth value from each entry in a specific column.
For example:
the dataframe column to transform is called:
"Linestring"
- each entry contains floats of varying length and looks like this:
Linestring(151.420 -33.540, 155.464722 -39.069046, 153.30925678 -33.08364825, 152.0998 -31.8090, 150.539067 -30.57578)
- the entries themselves also vary in length
I would like to remove, say, every second comma-separated element, giving:
Linestring(151.420 -33.540, 153.30925678 -33.08364825, 150.539067 -30.57578)
Attached/linked is a visual guide of what I am after.
Thanks a lot! :)
CodePudding user response:
Try this. I hope it'll help.
df['Linestring'] = df['Linestring'].apply(lambda x: ','.join(x.split(',')[::2]) if ','.join(x.split(',')[::2])[-1] == ')' else ','.join(x.split(',')[::2]) + ')')
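For readability, the same idea can be sketched as a small helper that strips the Linestring(...) wrapper, keeps every second coordinate pair, and puts the wrapper back. The name thin_linestring and the step parameter are made up for this illustration, and the sketch assumes every entry looks like Linestring(x y, x y, ...).

```python
def thin_linestring(text, step=2):
    """Keep every `step`-th coordinate pair of a 'Linestring(...)' string.

    Hypothetical helper: assumes one opening '(' and a trailing ')'.
    """
    prefix, _, body = text.partition("(")      # 'Linestring' and the rest
    body = body.rstrip().rstrip(")")           # drop the closing parenthesis
    pairs = [pair.strip() for pair in body.split(",")]
    return f"{prefix}({', '.join(pairs[::step])})"


example = ("Linestring(151.420 -33.540, 155.464722 -39.069046, "
           "153.30925678 -33.08364825, 152.0998 -31.8090, 150.539067 -30.57578)")
print(thin_linestring(example))
```

On a DataFrame this would be applied with df['Linestring'] = df['Linestring'].apply(thin_linestring). Because the closing parenthesis is removed before slicing and added back afterwards, entries with an even number of pairs need no special case.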
CodePudding user response:
I wrote a function that replaces every nth value with a null value; you can then drop these values, leaving you with a new DataFrame that does not include the dropped cells. I hope this helps.
import pandas as pd

df = pd.DataFrame({'Numbers': [10, 15, 33, 22, 17, 77, 9]})  # a dataframe with one column and some values
print(df)  # prints the original dataframe

def rmNth(dFrame, col, n):  # dataframe, column, delete every 'nth'
    dFrame = dFrame.astype({col: float})  # float copy: the column can hold NaN and the caller's frame is untouched
    rows = len(dFrame.axes[0])  # stores the number of rows
    x = n - 1  # used in the while loop (assumes the default 0..rows-1 index)
    while x < rows:  # replace every nth cell with a null value
        dFrame.at[x, col] = float('nan')
        x = x + n  # increment x by n
    print(dFrame)  # prints the dataframe showing all cells that will be removed are replaced with 'nan'
    newDF = dFrame.dropna().reset_index(drop=True)  # remove null cells and reset the index
    return newDF

print(rmNth(df, "Numbers", 3))  # print the data frame with every 3rd value removed from the Numbers column
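If the goal is only to drop every nth row, the null-value detour can probably be skipped by slicing the index directly. This is a minimal sketch of that alternative, not the function above; it assumes the index labels are unique, and the names n and kept are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Numbers': [10, 15, 33, 22, 17, 77, 9]})
n = 3

# df.index[n - 1::n] selects the labels of every nth row (positions 2, 5, ...)
kept = df.drop(df.index[n - 1::n]).reset_index(drop=True)
print(kept)
```

Since no NaN is ever written, the column keeps its integer dtype and the original df is left unchanged.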