Use case: golfing on the CLI in a utility function that I can't afford to make complicated.
I need to peek at only the column names of a large file in a binary format, not the column names plus, say, the first data row.
In my current implementation, I have to write this burdensome command just to see the header row of a large file:
my-tool peek -n 1 huge-file.parquet | head -n 1 | tr ',' '\n' | less
What I would like to write instead is:
my-tool peek --cols huge-file.parquet | tr ',' '\n' | less
or
my-tool peek --cols -d '\n' huge-file.parquet | less
All of this without getting complicated in Python. I currently use the following mechanism to generate the CSV:
out = StringIO()
df.to_csv(out)
print(out.getvalue())
Is there a DataFrame-ish way to output just the columns to out via to_csv(...), or a similarly simple technique?
CodePudding user response:
Maybe something like this?
import pandas as pd
import numpy as np

if __name__ == "__main__":
    # some fake data for setup
    np.random.seed(1)
    df = pd.DataFrame(
        data=np.random.random(size=(5, 5)),
        columns=list("abcde")
    )
    # turn the column labels into a one-column DataFrame
    out = df.columns.to_frame(name="columns")
    # index=False writes only the "columns" column, one name per line
    out.to_csv("file.csv", index=False)
    print(out)
   columns
a        a
b        b
c        c
d        d
e        e
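
If you would rather stay with the to_csv(...) mechanism from the question, one option is to drop every row with df.head(0) first, so that only the header line gets written. For a custom delimiter such as '\n', joining df.columns directly may be simpler. A minimal sketch, where header_only_csv and joined_columns are hypothetical helper names and the sample frame is made up for illustration:

```python
from io import StringIO

import pandas as pd


def header_only_csv(df: pd.DataFrame, sep: str = ",") -> str:
    """Return only the CSV header line of df (hypothetical helper)."""
    out = StringIO()
    # head(0) keeps the column labels but no rows, so to_csv emits just the header
    df.head(0).to_csv(out, index=False, sep=sep)
    return out.getvalue()


def joined_columns(df: pd.DataFrame, delimiter: str = "\n") -> str:
    """Join the column labels with an arbitrary delimiter (hypothetical helper)."""
    # map(str, ...) guards against non-string column labels
    return delimiter.join(map(str, df.columns))


# made-up sample data, standing in for the real file contents
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

header = header_only_csv(df)
names = joined_columns(df)

print(header, end="")  # header already ends with a line terminator
print(names)
```

Both variants only change what is printed. They still assume the DataFrame has already been loaded, so they do not by themselves avoid reading the large file.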