Trying to split 3 different datastets from 1 column-CodePudding

I currently have a CSV column that contains three different datasets (P,T,C) looking like: P= 0.456T= 12.659C=39.285

This is just one out of many records but all the data is separated the same with P,T,C. I have posted the dataframe head below:

dataframe Head

I want to separate each individual component (P,T,C) and create a time series analysis.

Currently, I am using str.split() command but it does not seem to work.

My code is as follows:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import read_csv


df = pd.read_csv('20220519 - test data.csv')
df.columns = ['node_letter','wind_Dir','Wind Speed','Corrected_Dir','GPS_Corr_Speed','Pressure','Relative_Humidity','Temperature','Dewpoint','Prec_Total','Precip_Intensity','Solar_Radiation','GPS_Data','Date/Time','Supply_Voltage','satus','ETX','CTD_Data']



df[['Pressure','Temperature','Conducivity']] = df['CTD_Data'].str.split('=', expand=True)

Dataframe name: df

Column name: df['CTD_Data']

Do I need to create a seperate dataframe for the aforementioned column?

Any help would be greatly appreciated

CodePudding user response：

try this:

s='P=  0.456T= 12.659C=39.285'
p = re.findall('[PTC]=\ ?[0-9.] ',s.replace(' ', ''))
p
>['P= 0.456', 'T= 12.659', 'C=39.285']


[float(r.split('=')[1]) for r in p]
>[0.456, 12.659, 39.285]

CodePudding user response：

A naive, manual and probably not so efficient (for large datasets) way:

df = pd.DataFrame()
df['problematic_column'] = [('P=  0.456T= 12.659C=39.285')]
df['P'] = float(df['problematic_column'].str.split('=').str[1].str.split('T').str[0])
df['T'] = float(df['problematic_column'].str.split('=').str[2].str.split('C').str[0])
df['C'] = float(df['problematic_column'].str.split('=').str[3])

There must be more elegant solutions using regex out there, but that's probably for you to investigate. This will get the job done.