My question is somewhat similar to subtracting-two-columns-named-in-certain-pattern
I'm having trouble performing operations on columns that share the same root substring, without a loop. Basically I want to calculate a percentage change using columns that end with '_PY' with another column that shares the same name except for the suffix.
What's a possible one line solution, or one that doesn't involve a for loop?
url = r'https://www2.arccorp.com/globalassets/forms/corpstats.csv?1653338666304'
df = pd.read_csv(url)
df = df[df['TYPE'] == 'M']
PY_cols = [col for col in df.columns if col.endswith("PY")]
reg_cols = [col.split("_PY")[0] for col in PY_cols]
for k,v in zip(reg_cols,PY_cols):
df[f"{k}_YOY%"] = round((df[k] - df[v]) / df[v] * 100,2)
df
CodePudding user response:
You can use:
v = (df[df.columns[df.columns.str.endswith('_PY')]]
.rename(columns=lambda x: x.rsplit('_', maxsplit=1)[0]))
k = df[v.columns]
out = pd.concat([df, k.sub(v).div(v).mul(100).round(2).add_suffix('_YOY%')], axis=1)
CodePudding user response:
Gotta subset the df
into the columns you need. Then zip
will pull the pairs you need to do the percent calculation.
url = r'https://www2.arccorp.com/globalassets/forms/corpstats.csv?1653338666304'
df = pd.read_csv(url)
df = df[df['TYPE'] == 'M']
df_cols = [col for col in df.columns]
PY_cols = [col for col in df.columns if col.endswith("PY")]
# find the matching column, where the names match without the suffix.
PY_use = [col for col in PY_cols if col.split("_PY")[0] in df_cols]
df_use = [col.split("_PY")[0] for col in PY_use]
for k,v in zip(df_use,PY_use):
df[f"{k}_YOY%"] = round((df[k] - df[v]) / df[v] * 100,2)
CodePudding user response:
You can take advantage of numpy:
py_df_array = (df[df_use].values, df[PY_use].values)
perc_dif = np.round((py_df_array[0] - py_df_array[1]) / py_df_array[1] * 100, 2)
df_perc = pd.DataFrame(perc_def, columns=[f"{col}_YOY%" for col in df_use])
df = pd.concat([df, df_perc], axis=1)