Editing dataframe and exporting to excel in pandas-CodePudding

Following is a simplified example of data I have in csv file format. Data from sensor A extends from column 1 to column 3 & sensor B from column 4 onwards.

Tool	Time A[s]	Val1	time B [s]	Val11	Val22
ZED112	0	0	0	2.406871364	2.825639219
ZED112	0	0	0	2.598295494	3.047110501
ZED112	0	0	0	2.532422572	2.708097541
ZED112	0	0	0	2.780893807	2.678502954
ZED112	0	0	0	3.202879068	3.062976324
ZED112	0	973.9316	0	2.784906006	2.642870937
ZED112	0	985.1649	0	214.2264642	98.62286259
ZED112	0	992.4298	0	414.5566632	217.3663605
ZED112	0	995.9097	0	599.8237339	315.2468997
ZED112	0	997.9854	0.3	643.9497261	399.9559597
ZED112	0	997.7411	0.8	524.0249116	328.1903351
ZED112	0	991.8193
ZED112	0	999.5726
ZED112	0.1	999.2064
ZED112	0.2	999.939
ZED112	0.3	999.6948
ZED112	0.4	996.2759
ZED112	0.5	999.1453
ZED112	0.6	999.8779
ZED112	0.7	1000.427
ZED112	0.8	1000.549

1)How can I delete rows in column 1 to 3 which correspond to Time A[s]==0, while keeping data from column 4 onwards intact.
2) The following is just a snippet of data and actual data is hundreds of columns long. What is the most efficient way of exporting to excel after performing action 1?
3) My final end goal after exporting to excel is to rearrange data like below(currently doing manually), such that data of same time is on same row. I don't know how to accomplish this right now. Any suggestions?

Tool	Time A[s]	Val1	time B [s]	Val11	Val22
ZED112	0.1	999.2064
ZED112	0.2	999.939
ZED112	0.2	999.939
ZED112	0.3	999.6948	0.3	643.9497261	399.9559597
ZED112	0.4	996.2759
ZED112	0.5	999.1453
ZED112	0.6	999.8779
ZED112	0.7	1000.427
ZED112	0.8	1000.549	0.8	524.0249116	328.1903351

My solution for 1 & 2 is below, but this isn't giving me expected result. Also excel export of "real data" is taking extremely long time.

    import pandas as pd
    alldata= pd.read_csv(r"C:\Users\max\Desktop\Input data.csv");
    df2 = alldata.iloc[: , 3:-1].copy();
    df1 = alldata.drop(alldata.columns[3:-1], axis=1);
    newdata=df1[df1[' Time sensor A[s]'] != 0];
    result=pd.concat([newdata,df2], axis=1, join="inner");
    result.to_excel(r"C:\Users\max\Desktop\data2.xlsx",)

CodePudding user response：

I think you might want to merge your data and then drop the rows that have 0 for Time A[s]. I also changed the name of your columns for testing because spaces in pandas column names are very problematic:

import pandas as pd
import numpy as np

# Create some sample data
df = pd.DataFrame(columns=["Tool"," Time A[s]","Val1"," time B[s]","Val11","Val22"],
                  data=[["ZED112"   ,0      ,0      ,0      ,2.406871364    ,2.825639219]
                        ,["ZED112"  ,0      ,0      ,0      ,2.598295494    ,3.047110501]
                        ,["ZED112"  ,0.1        ,999.2064, "", "", ""]
                        ,["ZED112"  ,0.2        ,999.939, "", "", ""]
                        ,["ZED112"  ,0.3        ,999.6948, "", "", ""]  
                        ,["ZED112"  ,0      ,997.9854   ,0.3        ,643.9497261    ,399.9559597],
                        ["ZED112"   ,0      ,997.9854   ,0.3        ,643.9497261    ,399.9559597]],
)
# df= pd.read_csv(r"C:\Users\max\Desktop\Input data.csv");
df

Output:

    Tool    Time A[s]   Val1     time B[s]  Val11   Val22
0   ZED112  0.0          0.0000    0        2.406871    2.825639
1   ZED112  0.0          0.0000    0        2.598295    3.047111
2   ZED112  0.1          999.2064           
3   ZED112  0.2          999.9390           
4   ZED112  0.3          999.6948           
5   ZED112  0.0          997.9854  0.3      643.949726  399.95596
6   ZED112  0.0          997.9854  0.3      643.949726  399.95596

# Change the name of columns for merging
df1 = df.iloc[:,0:2].rename(columns={" Time A[s]":"time"})
df2 = df.iloc[:,3:].rename(columns={" time B[s]":"time"})

# You can't merge on floats so change the datatype to str
df1 = df1.astype({'time':'str'})
df2 = df2.astype({'time':'str'})

# Merge on the time column
results = pd.merge(df1,df2, on='time')

results['time']=results['time'].replace(0, np.nan)
results.dropna(axis=0, inplace=True)
results

Output:

Tool    time    Val11   Val22
0   ZED112  0.3 643.949726  399.95596
1   ZED112  0.3 643.949726  399.95596

CodePudding user response：

In my opinion, you can slice the data into 2 parts and easily fulfill your requirement. Here is how:

#reading the data
import pandas as pd
df= pd.read_csv(r"C:\Users\max\Desktop\Input data.csv");

#copying the first & last 3 columns in variables dfA and dfB respectively
dfA=df.iloc[:,:3]
dfB= df.iloc[:,3:]

#droping the values corresponding to Time A[s]==0
dfA= dfA[dfA['Time A[s]']!=0].reset_index(drop=True)

#merging dfA and dfB back again as per your final requirement
result=pd.merge(dfA,dfB, how='left',left_on=['Time A[s]'],right_on=['time B [s]'])

#exporting the data to excel
result.to_excel(r"C:\Users\max\Desktop\data2.xlsx",)

You can print the variables dfA,dfB and result after each step to get a better clearity.