I have data from a log file in which products were tested periodically. The data is sampled at 125 samples/sec. Products are tested a few minutes apart. This results in a lot of zeros between the data I want to analyze. The data I want to keep is greater than 0.0.
I created a DataFrame by ...
df = pd.read_csv('file.log')
This yields a single column of data. It is mostly zeros, but there are periodic groupings of values greater than 0.0, which represent the test data.
data
0 0.0
1 0.0
2 0.0
3 0.0
4 0.0
... ...
34527 0.0
34528 0.0
34529 0.0
34530 0.0
34531 0.0
34532 rows × 1 columns
I want to find each test sample in the data and create either a groupby object or a new DataFrame with each column representing a test sample ['test1', 'test2', etc.]. Somehow I need to iterate through the data, identify each group of test data, and give it a unique label. I have to imagine this has been solved already, but I haven't been able to find a similar solution.
Any suggestions would be greatly appreciated.
Edit: Here is an image of the data, if that helps.
CodePudding user response:
To select all of the rows with non-zero entries in your data, you can do
df[df['data'] != 0]
This builds a new DataFrame that contains only the rows where the data column != 0. From here, you can add a new column that gives each sample a label, or you can use iterrows to iterate through each row and apply whatever you need.
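As a rough sketch of the labeling step, one vectorized option is to mark where each run of non-zero values starts and take a cumulative sum of those starts. The column name group and the small sample data below are assumptions for illustration, not part of the original log:

```python
import pandas as pd

# Small stand-in for the real log: zeros separate three bursts of test data.
df = pd.DataFrame({'data': [0, 0, 0, 1, 2, 1, 0, 0, 1, 3, 2, 0, 0, 0, 3, 4, 3]})

# True wherever a row holds test data.
is_test = df['data'] > 0

# A run starts where the current row is test data and the previous row is not.
run_start = is_test & ~is_test.shift(fill_value=False)

# Counting the starts numbers the runs 1, 2, 3, ...; zero rows are set to 0.
df['group'] = run_start.cumsum().where(is_test, 0)

# Keep only the test rows, each now carrying its run label.
tests = df[is_test]
print(tests)
```

This avoids iterrows entirely, which tends to matter on logs sampled at 125 samples/sec. Note that it treats any single zero inside a test as a break between two tests.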
CodePudding user response:
This is ugly and hacky, but it should give you the result you're looking for:
import pandas as pd

# Test data
test_data = [0,0,0,1,2,1,0,0,1,3,2,0,0,0,3,4,3]
df = pd.DataFrame(test_data, columns=['data'])

# Global variables
zero_counter = 9999  # non-zero start so leading zeros do not bump the group
test_counter = 0
group = 1

# Function to identify groups: zero rows get label 0,
# each run of non-zero rows gets the next group number
def group_test_data(n):
    global zero_counter, test_counter, group
    if n == 0:
        # First zero after a run of test data: move on to the next group
        if zero_counter == 0:
            group += 1
            test_counter = 0
        zero_counter = 1
        return 0
    else:
        zero_counter = 0
        test_counter = 1
        return group

# Apply function to each row
df['group'] = df.apply(lambda x: group_test_data(x['data']), axis=1)
print(df)
Now it's as simple as grouping by the group column (group 0 holds the zero rows) and applying whatever calculation/transformation you'd like.
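For illustration, here is a minimal sketch of that last step. It assumes a frame that is already labelled with 0 for the zero rows and 1, 2, 3 for consecutive tests; the names labelled, summary and wide, and the test prefix, are made up for this example:

```python
import pandas as pd

# Assumed input: a labelled frame with 0 for zero rows and 1, 2, 3 for each test.
labelled = pd.DataFrame({
    'data':  [0, 0, 0, 1, 2, 1, 0, 0, 1, 3, 2, 0, 0, 0, 3, 4, 3],
    'group': [0, 0, 0, 1, 1, 1, 0, 0, 2, 2, 2, 0, 0, 0, 3, 3, 3],
})

# Drop the zero rows (group 0) before grouping.
tests = labelled[labelled['group'] > 0]

# Per-test summary statistics.
summary = tests.groupby('group')['data'].agg(['count', 'max', 'mean'])

# One column per test: number the samples within each test, then pivot.
wide = (
    tests.assign(sample=tests.groupby('group').cumcount())
         .pivot(index='sample', columns='group', values='data')
         .add_prefix('test')
)

print(summary)
print(wide)
```

If the tests have different lengths, the shorter columns in wide are padded with NaN, so the groupby form is usually the safer one for calculations.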