I have a boolean dataframe indexed by timestamps
df
>>>
timestamp x0 x1 x2
2020-01-01 True False True
2020-01-02 True False True
2020-01-03 False True True
I want to save a csv of the column names for each row where the column is True
, with the current timestamp as the csv filename. So in the example above, the desired output would be 3 csv's:
20200101.csv:
x0,
x2,
20200102.csv:
x0,
x2,
20200103.csv:
x1,
x2,
I have managed to do this using a for loop and some pandas methods, but it seems clunky. (This would be almost a one-liner in R, like using split
and lapply
.)
import numpy as np
for idx, row in df.iterrows():
tmp = row.replace({False: np.nan}).dropna()
tmp = pd.DataFrame({"my_col": tmp.index.tolist()})
file_name = ''.join(str(idx.date()).split('-'))
tmp.to_csv(f"{file_name}.csv", index=False)
Is there a clean way to do this using pure pandas / map reduce / pandas apply and avoiding for loops?
CodePudding user response:
I honestly don't know if it's appropriate to use .apply
for side effects, but this works. row.index[row]
uses the row as a boolean mask to filter the column labels, the rest should be self-explanatory.
df.apply(
lambda row: row.index[row]
.to_frame()
.to_csv(f"{row.name.replace('-', '')}.csv",
index=False,
header=False),
axis=1 # apply row-wise
)
CodePudding user response:
Had to stick with a loop to write out the CSVs.
df_out = df.melt(id_vars='timestamp').loc[lambda x: x['value']].sort_values('timestamp')
print(df_out)
timestamp variable value
0 2020-01-01 x0 True
6 2020-01-01 x2 True
1 2020-01-02 x0 True
7 2020-01-02 x2 True
5 2020-01-03 x1 True
8 2020-01-03 x2 True
Resorted to the much-maligned loop for output to CSV:
for t in df_out['timestamp'].unique():
df_out.loc[df_out['timestamp']==t,'variable'] \
.to_csv(re.sub('-','',fr'd:\jchtempnew\SO\{t}.csv'),
index=None, header=None, line_terminator=',\r\n')
20200101.csv:
x0,
x2,
20200102.csv:
x0,
x2,
20200103.csv:
x1,
x2,