I have a pandas dataframe that looks like this:
       X[m]      Y[m]      Z[m]  ...      beta  newx  newy
0  1.439485  0.087100  0.029771  ...  0.063807  1439    87
1  1.439485  0.089729  0.029121  ...  0.065871  1439    89
2  1.439485  0.091992  0.030059  ...  0.067653  1439    91
3  1.439485  0.082073  0.030721  ...  0.059883  1439    82
4  1.439485  0.084095  0.028952  ...  0.061458  1439    84
5  1.439485  0.085937  0.028019  ...  0.062897  1439    85
There are hundreds of thousands of such rows, and I have multiple dataframes like this. X and Y are coordinates on a plane (Z is not important) that has been rotated 45 degrees to the right about its middle point. I need to put all points back in their original place, i.e. rotate them by -45 degrees. The columns newx and newy currently hold the coordinates before the change; I want to overwrite these two columns with the new coordinates. Since I know the coordinates of the middle point, the point itself, the middle-to-point angle (alpha) and the middle-to-fixed-point angle (beta), I can use the approach presented on Mathematics SO. I have translated the code to Python like this:
for i in range(len(df)):
    if df.iloc[i].alpha == math.pi/2 or df.iloc[i].alpha == 3*math.pi/2:
        df.newx[i] = mid
        df.newy[i] = int(math.tan(df.iloc[i].beta*(df.iloc[i].x-mid) + mid))
    elif df.iloc[i].beta == math.pi/2 or df.iloc[i].beta == 3*math.pi/2:
        #df.newx[i] = df.iloc[i].x -- this is already set
        df.newy[i] = int(math.tan(df.iloc[i].alpha*(mid-df.iloc[i].x) + mid))
    else:
        m0 = math.tan(df.iloc[i].alpha)
        m1 = math.tan(df.iloc[i].beta)
        x = ((m0 * df.iloc[i].x - m1 * mid) - (df.iloc[i].y - mid)) / (m0 - m1)
        df.newx[i] = int(x)
        df.newy[i] = int(m0 * (x - df.iloc[i].x) + df.iloc[i].y)
Although this does what I need and moves each point to the correct position, the row-by-row loop is extremely slow and I have too many files to process them this way. I know that there are much faster methods, such as vectorization, apply and list comprehensions. However, I can't figure out how to use them with this function.
Here are the first 10 rows as a dictionary:
{'X[m]': {0: 1.439484727008419, 1: 1.439484727008419, 2: 1.439484727008419, 3: 1.439484727008419, 4: 1.439484727008419, 5: 1.439484727008419, 6: 1.439484727008419, 7: 1.439484727008419, 8: 1.439484727008419, 9: 1.439484727008419},
 'Y[m]': {0: 0.08709958190841899, 1: 0.08972904270841897, 2: 0.091991981408419, 3: 0.08207325440841898, 4: 0.08409548540841899, 5: 0.08593746080841899, 6: 0.09416210370841899, 7: 0.08874029660841898, 8: 0.09168940400841899, 9: 0.09434491760841898},
 'Z[m]': {0: 0.029770726299999998, 1: 0.0291213803, 2: 0.030058834700000002, 3: 0.0307212565, 4: 0.028951926200000002, 5: 0.0280194897, 6: 0.030717188500000003, 7: 0.026446931099999998, 8: 0.0269318204, 9: 0.0273838975},
 'Velocity[ms^-1]': {0: ['-1.67570162e+00', '-2.59946979e-15', '-2.54510192e-15'], 1: ['-1.63915336e+00', '-2.54277343e-15', '-2.48959140e-15'], 2: ['-1.69191790e+00', '-2.62462561e-15', '-2.56973173e-15'], 3: ['-1.72920227e+00', '-2.68246377e-15', '-2.62636012e-15'], 4: ['-1.62961555e+00', '-2.52797767e-15', '-2.47510523e-15'], 5: ['-1.57713342e+00', '-2.44656340e-15', '-2.39539372e-15'], 6: ['-1.72897375e+00', '-2.68210929e-15', '-2.62601305e-15'], 7: ['-1.48862195e+00', '-2.30925809e-15', '-2.26096006e-15'], 8: ['-1.51591396e+00', '-2.35159534e-15', '-2.30241195e-15'], 9: ['-1.54135919e+00', '-2.39106792e-15', '-2.34105888e-15']},
 'L': {0: 0.9582306809661671, 1: 0.9564957485824027, 2: 0.9550059224371557, 3: 0.9615583774318917, 4: 0.9602177760259737, 5: 0.9589987519260235, 6: 0.9535800607266656, 7: 0.9571476500665267, 8: 0.9552049510914844, 9: 0.953460072490227},
 'x': {0: 1439, 1: 1439, 2: 1439, 3: 1439, 4: 1439, 5: 1439, 6: 1439, 7: 1439, 8: 1439, 9: 1439},
 'y': {0: 87, 1: 89, 2: 91, 3: 82, 4: 84, 5: 85, 6: 94, 7: 88, 8: 91, 9: 94},
 'alpha': {0: -0.7215912027987663, 1: -0.719527331916007, 2: -0.7177451479100487, 3: -0.7255156166536015, 4: -0.7239399868865558, 5: -0.7225009735356016, 6: -0.7160308360594005, 7: -0.7203042790640757, 8: -0.7179837655204843, 9: -0.7158861861473951},
 'beta': {0: 0.06380696059868196, 1: 0.06587083148144124, 2: 0.06765301548739955, 3: 0.05988254674384674, 4: 0.06145817651089247, 5: 0.06289718986184667, 6: 0.06936732733804774, 7: 0.0650938843333726, 8: 0.06741439787696402, 9: 0.0695119772500532},
 'newx': {0: 1439, 1: 1439, 2: 1439, 3: 1439, 4: 1439, 5: 1439, 6: 1439, 7: 1439, 8: 1439, 9: 1439},
 'newy': {0: 87, 1: 89, 2: 91, 3: 82, 4: 84, 5: 85, 6: 94, 7: 88, 8: 91, 9: 94}}
CodePudding user response:
Same approach as @Joshua Voskamp, but I still wanted to share
import pandas as pd
import numpy as np
import math
df = pd.DataFrame({
    'X[m]': {0: 1.439484727008419, 1: 1.439484727008419, 2: 1.439484727008419, 3: 1.439484727008419, 4: 1.439484727008419, 5: 1.439484727008419, 6: 1.439484727008419, 7: 1.439484727008419, 8: 1.439484727008419, 9: 1.439484727008419},
    'Y[m]': {0: 0.08709958190841899, 1: 0.08972904270841897, 2: 0.091991981408419, 3: 0.08207325440841898, 4: 0.08409548540841899, 5: 0.08593746080841899, 6: 0.09416210370841899, 7: 0.08874029660841898, 8: 0.09168940400841899, 9: 0.09434491760841898},
    'Z[m]': {0: 0.029770726299999998, 1: 0.0291213803, 2: 0.030058834700000002, 3: 0.0307212565, 4: 0.028951926200000002, 5: 0.0280194897, 6: 0.030717188500000003, 7: 0.026446931099999998, 8: 0.0269318204, 9: 0.0273838975},
    'Velocity[ms^-1]': {0: ['-1.67570162e+00', '-2.59946979e-15', '-2.54510192e-15'], 1: ['-1.63915336e+00', '-2.54277343e-15', '-2.48959140e-15'], 2: ['-1.69191790e+00', '-2.62462561e-15', '-2.56973173e-15'], 3: ['-1.72920227e+00', '-2.68246377e-15', '-2.62636012e-15'], 4: ['-1.62961555e+00', '-2.52797767e-15', '-2.47510523e-15'], 5: ['-1.57713342e+00', '-2.44656340e-15', '-2.39539372e-15'], 6: ['-1.72897375e+00', '-2.68210929e-15', '-2.62601305e-15'], 7: ['-1.48862195e+00', '-2.30925809e-15', '-2.26096006e-15'], 8: ['-1.51591396e+00', '-2.35159534e-15', '-2.30241195e-15'], 9: ['-1.54135919e+00', '-2.39106792e-15', '-2.34105888e-15']},
    'L': {0: 0.9582306809661671, 1: 0.9564957485824027, 2: 0.9550059224371557, 3: 0.9615583774318917, 4: 0.9602177760259737, 5: 0.9589987519260235, 6: 0.9535800607266656, 7: 0.9571476500665267, 8: 0.9552049510914844, 9: 0.953460072490227},
    'x': {0: 1439, 1: 1439, 2: 1439, 3: 1439, 4: 1439, 5: 1439, 6: 1439, 7: 1439, 8: 1439, 9: 1439},
    'y': {0: 87, 1: 89, 2: 91, 3: 82, 4: 84, 5: 85, 6: 94, 7: 88, 8: 91, 9: 94},
    'alpha': {0: -0.7215912027987663, 1: -0.719527331916007, 2: -0.7177451479100487, 3: -0.7255156166536015, 4: -0.7239399868865558, 5: -0.7225009735356016, 6: -0.7160308360594005, 7: -0.7203042790640757, 8: -0.7179837655204843, 9: -0.7158861861473951},
    'beta': {0: 0.06380696059868196, 1: 0.06587083148144124, 2: 0.06765301548739955, 3: 0.05988254674384674, 4: 0.06145817651089247, 5: 0.06289718986184667, 6: 0.06936732733804774, 7: 0.0650938843333726, 8: 0.06741439787696402, 9: 0.0695119772500532},
    'newx': {0: 1439, 1: 1439, 2: 1439, 3: 1439, 4: 1439, 5: 1439, 6: 1439, 7: 1439, 8: 1439, 9: 1439},
    'newy': {0: 87, 1: 89, 2: 91, 3: 82, 4: 84, 5: 85, 6: 94, 7: 88, 8: 91, 9: 94},
})
mid = 0 #not sure what mid value should be
near_threshold = 0.001
alpha_near_half_pi = df.alpha.sub(math.pi/2).abs().le(near_threshold)
alpha_near_three_half_pi = df.alpha.sub(3*math.pi/2).abs().le(near_threshold)
beta_near_half_pi = df.beta.sub(math.pi/2).abs().le(near_threshold)
beta_near_three_half_pi = df.beta.sub(3*math.pi/2).abs().le(near_threshold)
cond1 = alpha_near_half_pi | alpha_near_three_half_pi
cond2 = beta_near_half_pi | beta_near_three_half_pi
cond2 = cond2 & (~cond1) #if cond1 is true, we don't want to do cond2
cond3 = ~(cond1 | cond2) #if neither cond1 nor cond2, then we are in cond3
#Process cond1 rows
c1 = df.loc[cond1]
df.loc[cond1,'newx'] = mid
df.loc[cond1,'newy'] = np.tan(c1.beta*(c1.x-mid) + mid)
#Process cond2 rows
c2 = df.loc[cond2]
df.loc[cond2,'newy'] = np.tan(c2.alpha*(mid-c2.x) + mid)
#Process cond3 rows
c3 = df.loc[cond3]
m0 = np.tan(c3.alpha)
m1 = np.tan(c3.beta)
#                      Is this a mistake? always 0?
#                                   |
#                             -------------
x = ((m0 * c3.x - m1 * mid) - (c3.y - c3.y)) / (m0 - m1)
df.loc[cond3,'newx'] = x.astype(int)
df.loc[cond3,'newy'] = (m0 * (x - c3.x) + c3.y).astype(int)
df
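On the question in the comment above: the else branch looks like the intersection of two straight lines, and writing that out suggests the term should be (y - mid) rather than (y - y), which is also what the loop in the question has. A short derivation, assuming the middle point is (mid, mid) (the code uses the single value mid for both coordinates), with m_0 = tan(alpha) and m_1 = tan(beta):

```latex
\begin{align}
Y &= m_0\,(X - x) + y && \text{line through the point } (x, y) \\
Y &= m_1\,(X - \mathrm{mid}) + \mathrm{mid} && \text{line through the middle point} \\
m_0 X - m_0 x + y &= m_1 X - m_1\,\mathrm{mid} + \mathrm{mid} \\
X &= \frac{(m_0 x - m_1\,\mathrm{mid}) - (y - \mathrm{mid})}{m_0 - m_1},
\qquad Y = m_0\,(X - x) + y
\end{align}
```

With (y - y) the second bracket vanishes, so the two versions only agree for rows where y happens to equal mid.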
CodePudding user response:
TL;DR: this was a mess; then I borrowed some ideas from @mitoRibo. Go upvote their answer. Both of us used a strategy of "selectively calculate newx/newy using masking, where the mask is equivalent to the if/elif/else condition provided".
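One wrinkle when turning the conditions into masks: an exact `==` against math.pi/2 only matches angles that are bit-for-bit equal to it, so a tolerance is usually safer. A minimal sketch of the difference, using np.isclose instead of the `between` windows built further down; the sample angles and the TOL value are made up for illustration:

```python
import math

import numpy as np
import pandas as pd

# Hypothetical angles: exactly pi/2, pi/2 plus rounding noise, 3*pi/2, an ordinary angle
angles = pd.Series([math.pi / 2, math.pi / 2 + 1e-9, 3 * math.pi / 2, 0.5])

# Exact comparison, as in the original if-condition: misses the noisy value
exact = (angles == math.pi / 2) | (angles == 3 * math.pi / 2)

# Tolerance-based comparison; TOL is an assumed threshold, tune it to your data
TOL = 1e-3
near = (
    np.isclose(angles, math.pi / 2, rtol=0, atol=TOL)
    | np.isclose(angles, 3 * math.pi / 2, rtol=0, atol=TOL)
)

print(exact.tolist())             # [True, False, True, False]
print(np.asarray(near).tolist())  # [True, True, True, False]
```

Either mask can be passed straight to `df.loc[...]`.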
#setup
import pandas as pd
import numpy as np
import math
df = pd.DataFrame({
    'X[m]': {0: 1.439484727008419, 1: 1.439484727008419, 2: 1.439484727008419, 3: 1.439484727008419, 4: 1.439484727008419, 5: 1.439484727008419, 6: 1.439484727008419, 7: 1.439484727008419, 8: 1.439484727008419, 9: 1.439484727008419},
    'Y[m]': {0: 0.08709958190841899, 1: 0.08972904270841897, 2: 0.091991981408419, 3: 0.08207325440841898, 4: 0.08409548540841899, 5: 0.08593746080841899, 6: 0.09416210370841899, 7: 0.08874029660841898, 8: 0.09168940400841899, 9: 0.09434491760841898},
    'Z[m]': {0: 0.029770726299999998, 1: 0.0291213803, 2: 0.030058834700000002, 3: 0.0307212565, 4: 0.028951926200000002, 5: 0.0280194897, 6: 0.030717188500000003, 7: 0.026446931099999998, 8: 0.0269318204, 9: 0.0273838975},
    'Velocity[ms^-1]': {0: ['-1.67570162e+00', '-2.59946979e-15', '-2.54510192e-15'], 1: ['-1.63915336e+00', '-2.54277343e-15', '-2.48959140e-15'], 2: ['-1.69191790e+00', '-2.62462561e-15', '-2.56973173e-15'], 3: ['-1.72920227e+00', '-2.68246377e-15', '-2.62636012e-15'], 4: ['-1.62961555e+00', '-2.52797767e-15', '-2.47510523e-15'], 5: ['-1.57713342e+00', '-2.44656340e-15', '-2.39539372e-15'], 6: ['-1.72897375e+00', '-2.68210929e-15', '-2.62601305e-15'], 7: ['-1.48862195e+00', '-2.30925809e-15', '-2.26096006e-15'], 8: ['-1.51591396e+00', '-2.35159534e-15', '-2.30241195e-15'], 9: ['-1.54135919e+00', '-2.39106792e-15', '-2.34105888e-15']},
    'L': {0: 0.9582306809661671, 1: 0.9564957485824027, 2: 0.9550059224371557, 3: 0.9615583774318917, 4: 0.9602177760259737, 5: 0.9589987519260235, 6: 0.9535800607266656, 7: 0.9571476500665267, 8: 0.9552049510914844, 9: 0.953460072490227},
    'x': {0: 1439, 1: 1439, 2: 1439, 3: 1439, 4: 1439, 5: 1439, 6: 1439, 7: 1439, 8: 1439, 9: 1439},
    'y': {0: 87, 1: 89, 2: 91, 3: 82, 4: 84, 5: 85, 6: 94, 7: 88, 8: 91, 9: 94},
    'alpha': {0: -0.7215912027987663, 1: -0.719527331916007, 2: -0.7177451479100487, 3: -0.7255156166536015, 4: -0.7239399868865558, 5: -0.7225009735356016, 6: -0.7160308360594005, 7: -0.7203042790640757, 8: -0.7179837655204843, 9: -0.7158861861473951},
    'beta': {0: 0.06380696059868196, 1: 0.06587083148144124, 2: 0.06765301548739955, 3: 0.05988254674384674, 4: 0.06145817651089247, 5: 0.06289718986184667, 6: 0.06936732733804774, 7: 0.0650938843333726, 8: 0.06741439787696402, 9: 0.0695119772500532},
})
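# `mid` is not defined in the question's snippet; 0 is only a placeholder (assumption)
mid = 0
# make the new columns
df['newx'] = np.nan
df['newy'] = np.nan
# if any of the values are np.nan when we're done, something went wrong
# Do the float `between` comparison but cleverly
EPSILON = 1e-6
# sides are ordered (-1, 1) so each window is (lower, upper), as `between` expects
(sides, directions) = ((-1, 1), (1, 3))
windows = tuple(tuple(d*math.pi/2 + s*EPSILON for s in sides) for d in directions)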
# challenge: make this more DRY (don't repeat yourself)
alpha_vertical = sum([df.alpha.between(*w) for w in windows]).astype(bool)
beta_vertical = sum([ df.beta.between(*w) for w in windows]).astype(bool)\
                & ~alpha_vertical
neither = (~alpha_vertical & ~beta_vertical)
# Handle `alpha_is_in_y_axis`:
c1 = df.loc[alpha_vertical]
df.loc[alpha_vertical,'newx'] = mid
df.loc[alpha_vertical,'newy'] = np.tan(c1.beta*(c1.x - mid) + mid).astype(int)
# Handle `beta_is_in_y_axis`:
c2 = df.loc[beta_vertical]
# ignore the x-values
df.loc[beta_vertical,'newy'] = np.tan(c2.alpha*(mid - c2.x) + mid).astype(int)
# Handle the other cases:
c3 = df.loc[neither]
m0 = np.tan(c3.alpha)
m1 = np.tan(c3.beta)
t = ((m0 * c3.x - m1 * mid) - (c3.y - mid)) / (m0 - m1)
df.loc[neither,'newx'] = t.astype(int)
df.loc[neither,'newy'] = (m0 * (t - c3.x) + c3.y).astype(int)
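Since there are many dataframes to process, the three masked assignments can also be folded into one function built on nested np.where calls. This is only a sketch: the name rotate_back, the eps default and the use of np.isclose are my own choices, mid has to be supplied by you, the formulas are copied as written in the question (including where tan is applied), and newx is assumed to start out as a copy of x, as in the sample data.

```python
import numpy as np
import pandas as pd


def rotate_back(df, mid, eps=1e-6):
    """Vectorized version of the question's if/elif/else loop (a sketch)."""
    alpha = df["alpha"].to_numpy(dtype=float)
    beta = df["beta"].to_numpy(dtype=float)
    x = df["x"].to_numpy(dtype=float)
    y = df["y"].to_numpy(dtype=float)

    def is_vertical(angle):
        # True where the angle is within eps of pi/2 or 3*pi/2
        return (np.isclose(angle, np.pi / 2, rtol=0, atol=eps)
                | np.isclose(angle, 3 * np.pi / 2, rtol=0, atol=eps))

    a_vert = is_vertical(alpha)            # the `if` branch
    b_vert = is_vertical(beta) & ~a_vert   # the `elif` branch

    m0, m1 = np.tan(alpha), np.tan(beta)
    # Computed for every row; only used where neither mask is set
    with np.errstate(divide="ignore", invalid="ignore"):
        t = ((m0 * x - m1 * mid) - (y - mid)) / (m0 - m1)

    newx = np.where(a_vert, mid, np.where(b_vert, x, t))
    newy = np.where(
        a_vert,
        np.tan(beta * (x - mid) + mid),
        np.where(b_vert, np.tan(alpha * (mid - x) + mid), m0 * (t - x) + y),
    )
    # astype(int) truncates toward zero, like int() in the original loop
    return df.assign(newx=newx.astype(int), newy=newy.astype(int))
```

Calling `rotate_back(df, mid=...)` returns a new frame and leaves the input untouched, so it can be applied to each dataframe in a list. Rows where alpha and beta have the same tangent would divide by zero, just as the original loop would fail on them, so those need separate handling.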