Trying to split a series in pandas into multiple columns by values greater than zero

Time:09-14

I have data from a log file in which products were tested periodically. The data is sampled at 125 samples/sec, and products are tested a few minutes apart. This leaves long runs of zeros between the data I want to analyze. The values I want to keep are greater than 0.0.

I created a DataFrame with:

import pandas as pd
df = pd.read_csv('file.log')

This yields a single column of data. It is mostly zeros, but there are periodic groupings of values greater than 0.0, which represent test data.

data
0   0.0
1   0.0
2   0.0
3   0.0
4   0.0
... ...
34527   0.0
34528   0.0
34529   0.0
34530   0.0
34531   0.0
34532 rows × 1 columns

I want to find each test sample in the data and create either a groupby object or a new DataFrame in which each column holds one test sample ('test1', 'test2', etc.). Somehow I need to iterate through the data, identify each group of test data, and give it a unique label. I imagine this has been solved already, but I haven't been able to find a similar solution.
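A minimal vectorized sketch of that labeling, assuming the column is named data (as in the output above) and that every value greater than 0.0 belongs to a test. A new test starts wherever a positive value follows a non-positive one, and a cumulative sum of those start markers numbers the tests. The small DataFrame is made-up stand-in data, not the real log.

```python
import pandas as pd

# Stand-in for the log data (assumed values); the real data comes from read_csv
df = pd.DataFrame({'data': [0.0, 0.0, 1.5, 2.0, 0.0, 0.0, 3.0, 4.0, 5.0, 0.0]})

# True wherever a test is in progress (value greater than 0.0)
active = df['data'] > 0

# A test starts where active is True but the previous row was not;
# cumsum turns those start markers into a running test number (1, 2, ...)
test_id = (active & ~active.shift(fill_value=False)).cumsum()

# Keep only the test rows and number the samples within each test
tests = df.loc[active, 'data'].to_frame()
tests['test'] = test_id[active]
tests['sample'] = tests.groupby('test').cumcount()

# One column per test ('test1', 'test2', ...); shorter tests are padded with NaN
wide = tests.pivot(index='sample', columns='test', values='data').add_prefix('test')
print(wide)
```

Pivoting on the numeric test number and adding the prefix afterwards keeps the columns in numeric order, so test10 sorts after test9 rather than after test1.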

Any suggestions would be greatly appreciated.

Edit: Here is an image of the data, if that helps.


Answer:

To select only the rows with non-zero entries, you can use df[df['data'] != 0]. This returns a new DataFrame containing only the rows where the data column is not 0. From there, you can add a new column that labels each test sample, or use iterrows to iterate through the rows and apply whatever processing you need.
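One way to add that label column, sketched under the assumption that the DataFrame still has the default consecutive integer index that read_csv creates: boolean indexing keeps the original index, so a jump of more than 1 between neighbouring index values marks where a new test begins. The data values and the label column name are illustrative only.

```python
import pandas as pd

# Example data in the same shape as the question (assumed values)
df = pd.DataFrame({'data': [0, 0, 0, 1, 2, 1, 0, 0, 1, 3, 2, 0, 0, 0, 3, 4, 3]})

# Boolean indexing returns a new DataFrame holding only the non-zero rows;
# .copy() makes it explicit that we may modify it independently of df
nonzero = df[df['data'] != 0].copy()

# The original index is preserved, so a gap in it means zeros were dropped
# there, i.e. a new test starts (the first row's diff is NaN, also != 1)
gaps = nonzero.index.to_series().diff() != 1
nonzero['label'] = 'test' + gaps.cumsum().astype(str)
print(nonzero)
```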

Answer:

This is ugly and hacky, but it should give you the result you're looking for:

import pandas as pd

# Test data
test_data = [0,0,0,1,2,1,0,0,1,3,2,0,0,0,3,4,3]
df = pd.DataFrame(test_data, columns=['data'])

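# Global state carried between calls
zero_counter = 9999   # zeros seen since the last non-zero value (large = not inside a test)
test_counter = 0      # length of the current run of non-zero values
group = 1             # label assigned to the current run of non-zero values

# Function to identify groups; zero rows get 0, test rows get their group number
def group_test_data(n):
    global zero_counter, test_counter, group
    if n == 0:
        # First zero after a run of test data: move on to the next group label
        if zero_counter == 0:
            group += 1
        test_counter = 0
        zero_counter += 1
        return 0
    else:
        zero_counter = 0
        test_counter += 1
        return group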


# Apply function to each row
df['group'] = df.apply(lambda x: group_test_data(x['data']), axis=1)
print(df)
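For comparison, here is a sketch of the same group column computed without global state, using shift and cumsum instead of a row-by-row apply. It is an alternative technique, not part of the answer above; on this test data it produces the same labels as group_test_data (0 for zero rows, 1, 2, 3 for the runs of test data).

```python
import pandas as pd

test_data = [0, 0, 0, 1, 2, 1, 0, 0, 1, 3, 2, 0, 0, 0, 3, 4, 3]
df = pd.DataFrame(test_data, columns=['data'])

nonzero = df['data'] != 0
# A run starts where the row is non-zero and the previous row was zero (or absent)
starts = nonzero & ~nonzero.shift(fill_value=False)
# Running count of starts gives the run number; zero rows are set to 0,
# matching what group_test_data returns for them
df['group'] = starts.cumsum().where(nonzero, 0)
print(df)
```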

Now it's as simple as grouping by the group column (group 0 marks the zero rows, so filter it out first) and applying whatever calculation or transformation you'd like.
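A short sketch of that last step, assuming a data column and a group column like the ones the code above produces (values hard-coded here so the snippet runs on its own). The chosen statistics and the test1/test2 naming are illustrative choices.

```python
import pandas as pd

# Data with a group column like the one produced above (values assumed)
df = pd.DataFrame({
    'data':  [0, 0, 0, 1, 2, 1, 0, 0, 1, 3, 2, 0, 0, 0, 3, 4, 3],
    'group': [0, 0, 0, 1, 1, 1, 0, 0, 2, 2, 2, 0, 0, 0, 3, 3, 3],
})

# Group 0 holds the idle zeros, so drop it before grouping
tests = df[df['group'] > 0].groupby('group')['data']

# Per-test summary statistics, one row per test
summary = tests.agg(['count', 'mean', 'max'])
print(summary)

# Or keep each test as its own Series, keyed 'test1', 'test2', ...
samples = {f'test{g}': s.reset_index(drop=True) for g, s in tests}
print(samples['test2'].tolist())
```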
