Following is a simplified example of data I have in csv file format. Data from sensor A extends from column 1 to column 3 & sensor B from column 4 onwards.
Tool | Time A[s] | Val1 | time B [s] | Val11 | Val22 |
---|---|---|---|---|---|
ZED112 | 0 | 0 | 0 | 2.406871364 | 2.825639219 |
ZED112 | 0 | 0 | 0 | 2.598295494 | 3.047110501 |
ZED112 | 0 | 0 | 0 | 2.532422572 | 2.708097541 |
ZED112 | 0 | 0 | 0 | 2.780893807 | 2.678502954 |
ZED112 | 0 | 0 | 0 | 3.202879068 | 3.062976324 |
ZED112 | 0 | 973.9316 | 0 | 2.784906006 | 2.642870937 |
ZED112 | 0 | 985.1649 | 0 | 214.2264642 | 98.62286259 |
ZED112 | 0 | 992.4298 | 0 | 414.5566632 | 217.3663605 |
ZED112 | 0 | 995.9097 | 0 | 599.8237339 | 315.2468997 |
ZED112 | 0 | 997.9854 | 0.3 | 643.9497261 | 399.9559597 |
ZED112 | 0 | 997.7411 | 0.8 | 524.0249116 | 328.1903351 |
ZED112 | 0 | 991.8193 | |||
ZED112 | 0 | 999.5726 | |||
ZED112 | 0.1 | 999.2064 | |||
ZED112 | 0.2 | 999.939 | |||
ZED112 | 0.3 | 999.6948 | |||
ZED112 | 0.4 | 996.2759 | |||
ZED112 | 0.5 | 999.1453 | |||
ZED112 | 0.6 | 999.8779 | |||
ZED112 | 0.7 | 1000.427 | |||
ZED112 | 0.8 | 1000.549 |
1)How can I delete rows in column 1 to 3 which correspond to Time A[s]==0, while keeping data from column 4 onwards intact.
2) The following is just a snippet of data and actual data is hundreds of columns long. What is the most efficient way of exporting to excel after performing action 1?
3) My final end goal after exporting to excel is to rearrange data like below(currently doing manually), such that data of same time is on same row. I don't know how to accomplish this right now. Any suggestions?
Tool | Time A[s] | Val1 | time B [s] | Val11 | Val22 |
---|---|---|---|---|---|
ZED112 | 0.1 | 999.2064 | |||
ZED112 | 0.2 | 999.939 | |||
ZED112 | 0.2 | 999.939 | |||
ZED112 | 0.3 | 999.6948 | 0.3 | 643.9497261 | 399.9559597 |
ZED112 | 0.4 | 996.2759 | |||
ZED112 | 0.5 | 999.1453 | |||
ZED112 | 0.6 | 999.8779 | |||
ZED112 | 0.7 | 1000.427 | |||
ZED112 | 0.8 | 1000.549 | 0.8 | 524.0249116 | 328.1903351 |
My solution for 1 & 2 is below, but this isn't giving me expected result. Also excel export of "real data" is taking extremely long time.
import pandas as pd
alldata= pd.read_csv(r"C:\Users\max\Desktop\Input data.csv");
df2 = alldata.iloc[: , 3:-1].copy();
df1 = alldata.drop(alldata.columns[3:-1], axis=1);
newdata=df1[df1[' Time sensor A[s]'] != 0];
result=pd.concat([newdata,df2], axis=1, join="inner");
result.to_excel(r"C:\Users\max\Desktop\data2.xlsx",)
CodePudding user response:
I think you might want to merge your data and then drop the rows that have 0 for Time A[s]
. I also changed the name of your columns for testing because spaces in pandas column names are very problematic:
import pandas as pd
import numpy as np
# Create some sample data
df = pd.DataFrame(columns=["Tool"," Time A[s]","Val1"," time B[s]","Val11","Val22"],
data=[["ZED112" ,0 ,0 ,0 ,2.406871364 ,2.825639219]
,["ZED112" ,0 ,0 ,0 ,2.598295494 ,3.047110501]
,["ZED112" ,0.1 ,999.2064, "", "", ""]
,["ZED112" ,0.2 ,999.939, "", "", ""]
,["ZED112" ,0.3 ,999.6948, "", "", ""]
,["ZED112" ,0 ,997.9854 ,0.3 ,643.9497261 ,399.9559597],
["ZED112" ,0 ,997.9854 ,0.3 ,643.9497261 ,399.9559597]],
)
# df= pd.read_csv(r"C:\Users\max\Desktop\Input data.csv");
df
Output:
Tool Time A[s] Val1 time B[s] Val11 Val22
0 ZED112 0.0 0.0000 0 2.406871 2.825639
1 ZED112 0.0 0.0000 0 2.598295 3.047111
2 ZED112 0.1 999.2064
3 ZED112 0.2 999.9390
4 ZED112 0.3 999.6948
5 ZED112 0.0 997.9854 0.3 643.949726 399.95596
6 ZED112 0.0 997.9854 0.3 643.949726 399.95596
# Change the name of columns for merging
df1 = df.iloc[:,0:2].rename(columns={" Time A[s]":"time"})
df2 = df.iloc[:,3:].rename(columns={" time B[s]":"time"})
# You can't merge on floats so change the datatype to str
df1 = df1.astype({'time':'str'})
df2 = df2.astype({'time':'str'})
# Merge on the time column
results = pd.merge(df1,df2, on='time')
results['time']=results['time'].replace(0, np.nan)
results.dropna(axis=0, inplace=True)
results
Output:
Tool time Val11 Val22
0 ZED112 0.3 643.949726 399.95596
1 ZED112 0.3 643.949726 399.95596
CodePudding user response:
In my opinion, you can slice the data into 2 parts and easily fulfill your requirement. Here is how:
#reading the data
import pandas as pd
df= pd.read_csv(r"C:\Users\max\Desktop\Input data.csv");
#copying the first & last 3 columns in variables dfA and dfB respectively
dfA=df.iloc[:,:3]
dfB= df.iloc[:,3:]
#droping the values corresponding to Time A[s]==0
dfA= dfA[dfA['Time A[s]']!=0].reset_index(drop=True)
#merging dfA and dfB back again as per your final requirement
result=pd.merge(dfA,dfB, how='left',left_on=['Time A[s]'],right_on=['time B [s]'])
#exporting the data to excel
result.to_excel(r"C:\Users\max\Desktop\data2.xlsx",)
You can print the variables dfA,dfB and result after each step to get a better clearity.