The code below reads the CSV files in one folder and writes them to another. Each CSV is loaded with only two columns, t and f, which are selected when the DataFrame is created. In column f, I need to count how many times the value was above 50.025 and write that count to a column.
CODE:
import pandas as pd
import numpy as np
import glob
import os
all_files = glob.glob("C:/Users/Gamer/Documents/Colbun/Saturn/*.csv")
file_list = []
for i,f in enumerate(all_files):
    df = pd.read_csv(f,header=0,usecols=["t","f"])
    df.apply(lambda x: x['f'] > 50.025, axis=1)
    df.to_csv(f'C:/Users/Gamer/Documents/Colbun/Saturn2/{os.path.basename(f).split(".")[0]}_ext.csv')
CodePudding user response:
It isn't logical to store it in a column, since it is a summary of the entire table and not specific to any row.
df = pd.read_csv(f,header=0,usecols=["t","f"])
how_many_times = len(df[df['f'] > 50.025])
# you can store it in its own column, but it still doesn't make sense: every row gets the same value
df['newcol'] = how_many_times
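As a minimal sketch of the counting step on its own: comparing the column to the threshold gives a boolean Series, and summing it counts the True entries, which matches the len() approach above. The sample values here are made up for illustration and do not come from the original files.

```python
import pandas as pd

# Made-up sample data standing in for one of the CSV files
df = pd.DataFrame({"t": [0, 1, 2, 3, 4], "f": [50.01, 50.03, 50.025, 50.10, 49.98]})

# Comparing the column yields a boolean Series; True counts as 1 when summed
above = df["f"] > 50.025
how_many_times = int(above.sum())

print(how_many_times)
```

Note that the comparison is strict, so a value exactly equal to 50.025 is not counted.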
CodePudding user response:
To output the count of occurrences in a column according to a particular filter and add it to every row in another column you can simply do the following:
df['cnt'] = df[df['f'] > 50.025]['f'].count()
If you need to use that count in a calculation, it would be better to store it in a variable and then perform the calculation using that variable, rather than storing it in an entire column of your dataframe.
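A small sketch of that idea, assuming the follow-up calculation is something like the share of rows above the threshold (the data and the name share_above are made up for illustration):

```python
import pandas as pd

# Made-up sample data; the real values would come from the CSV files
df = pd.DataFrame({"f": [50.00, 50.03, 50.05, 49.99]})

# Keep the count in a plain variable instead of a dataframe column
cnt = int((df["f"] > 50.025).sum())

# Example follow-up calculation: share of rows above the threshold
share_above = cnt / len(df)

print(cnt, share_above)
```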
In addition, I can see from the comments on your question that you also want to remove the index when outputting to CSV. To do that, add index=False to the df.to_csv() call.
Your code should look something like this:
import pandas as pd
import numpy as np
import glob
import os
all_files = glob.glob("C:/Users/Gamer/Documents/Colbun/Saturn/*.csv")
file_list = []
for i,f in enumerate(all_files):
    df = pd.read_csv(f,header=0,usecols=["t","f"])
    df['cnt'] = df[df['f'] > 50.025]['f'].count()
    df.to_csv(f'C:/Users/Gamer/Documents/Colbun/Saturn2/{os.path.basename(f).split(".")[0]}_ext.csv', index=False)
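If repeating the same count on every row turns out to be awkward, one possible alternative is to collect a single row per file in a separate summary table. This is only a sketch: the function name summarize_counts and the column names file and cnt are hypothetical, and it assumes every CSV has the t and f columns used above.

```python
from pathlib import Path

import pandas as pd


def summarize_counts(in_dir, threshold=50.025):
    """Return one row per CSV file with the number of 'f' values above threshold."""
    rows = []
    # sorted() keeps the output order stable across runs
    for path in sorted(Path(in_dir).glob("*.csv")):
        df = pd.read_csv(path, usecols=["t", "f"])
        rows.append({"file": path.name, "cnt": int((df["f"] > threshold).sum())})
    return pd.DataFrame(rows, columns=["file", "cnt"])


# Possible usage with the folder from the question:
# summary = summarize_counts("C:/Users/Gamer/Documents/Colbun/Saturn")
# summary.to_csv("C:/Users/Gamer/Documents/Colbun/Saturn2/summary.csv", index=False)
```

The per-file CSVs can still be written as in the loop above; the summary simply keeps the counts in one place.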