Trying to split a series in pandas into multiple columns by values greater than zero

I have data from a log file in which products were tested periodically. The data is sampled at 125 samples/sec. Products are tested a few minutes apart. This results in a lot of zeros between the data I want to analyze. The data I want to keep is greater than 0.0.

I created a DataFrame by ...

df = pd.read_csv('file.log')

This yields a single column of data: mostly zeros, with periodic groupings of values greater than 0.0 that represent test data.

data
0   0.0
1   0.0
2   0.0
3   0.0
4   0.0
... ...
34527   0.0
34528   0.0
34529   0.0
34530   0.0
34531   0.0
34532 rows × 1 columns

I want to find each test sample in the data and create either a groupby object or a new dataframe with each column representing a test sample ['test1', 'test2', etc.]. Somehow I need to iterate through the data, identify a group of test data, and give it a unique label. I've got to imagine this has been solved already, but I've been unsuccessful at finding a similar solution.

Any suggestions would be greatly appreciated.

Edit: Here is an image of the data, if that helps.

CodePudding user response：

To identify all of the rows with non-zero entries in your data, you can do df[df['data'] != 0]. This builds a copy of your old dataframe that contains only the rows where the data column != 0. From here, you can add a new column to give each sample a label, or you can use iterrows to iterate through each row and apply whatever you need.
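For what it's worth, here is a minimal sketch of the filter-then-label idea. The sample values are made up, the column name 'test' is just a placeholder, and it assumes the frame still has its default 0..n-1 index, so a jump in the index means a run of zeros was dropped in between:

```python
import pandas as pd

# Made-up sample standing in for the log data (assumption, not the real file)
df = pd.DataFrame({'data': [0, 0, 0, 1, 2, 1, 0, 0, 1, 3, 2, 0, 0, 0, 3, 4, 3]})

# Keep only the rows where the reading is non-zero
nonzero = df[df['data'] != 0].copy()

# A new test starts wherever the surviving index values are not consecutive
# (the very first row also counts, because its diff is NaN)
new_test = nonzero.index.to_series().diff() != 1

# Running count of test starts gives each row its test number
nonzero['test'] = new_test.cumsum()
print(nonzero)
```

This avoids iterrows entirely, which probably matters with tens of thousands of rows. If a test can legitimately contain a 0.0 reading, this would split it into two tests, so treat it as a starting point only.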

CodePudding user response：

This is ugly and hacky, but it should give you the result you're looking for:

import pandas as pd

# Test data
test_data = [0,0,0,1,2,1,0,0,1,3,2,0,0,0,3,4,3]
df = pd.DataFrame(test_data, columns=['data'])

# Global variables
zero_counter = 9999
test_counter = 0
group = 1

# Function to identify groups
def group_test_data(n):
    global zero_counter, test_counter, group
    if n == 0:
        # First zero after a run of test data: the next run gets a new group
        if zero_counter == 0:
            group += 1
        test_counter = 0
        zero_counter += 1
        return 0
    else:
        zero_counter = 0
        test_counter += 1
        return group


# Apply function to each value in the data column, in order
df['group'] = df['data'].apply(group_test_data)
print(df)

Now it's as simple as grouping by the group column (0 marks the zero rows between tests) and applying whatever calculation/transformation you'd like.
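As a rough sketch of that last step, assuming a frame that already has a group column where 0 means "no test running": the values below are hard-coded for illustration, and the names tests, sample, summary and wide are placeholders of mine.

```python
import pandas as pd

# Illustrative frame with a 'group' column already filled in (assumed values)
df = pd.DataFrame({
    'data':  [0, 0, 0, 1, 2, 1, 0, 0, 1, 3, 2, 0, 0, 0, 3, 4, 3],
    'group': [0, 0, 0, 1, 1, 1, 0, 0, 2, 2, 2, 0, 0, 0, 3, 3, 3],
})

# Drop the rows between tests, which carry group 0
tests = df[df['group'] > 0].copy()

# Per-test summary statistics
summary = tests.groupby('group')['data'].agg(['count', 'mean', 'max'])

# Position of each sample within its own test (0, 1, 2, ...)
tests['sample'] = tests.groupby('group').cumcount()

# One column per test; tests of unequal length would be padded with NaN
wide = tests.pivot(index='sample', columns='group', values='data')
wide.columns = [f'test{g}' for g in wide.columns]

print(summary)
print(wide)
```

The wide frame gives the test1, test2, ... columns asked for in the question, while the groupby route is usually more convenient if the tests have different lengths.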