Is there a way to handle `empty csv` in python when using `read_csv` of polars

Time:06-28

My question is very similar to this one, but I'm using polars.

Environment: Python 3.8, polars >= 0.13.24

I have a CSV file that I parse every 500 ms, but another program may reset it at any time. When the file is reset (by being reopened), polars throws exceptions.NoDataError: empty csv and my program exits.

What I've tried is to wrap my read_csv in a try-except block:

import polars as pl


def parse_result(result_file_name):
    try:
        result = pl.read_csv(result_file_name)
        # do some transformation with the dataframe
        return result
    except:
        # return an empty dataframe
        return pl.DataFrame(
            None, ["column", "names", "as", "it", "exist"]
        )

But it still throws the exception. I'm not very familiar with Python, so I don't know how to make it fall into the except branch and return the empty dataframe.


Update (more detail):

The code above is in a function named parse_result, which parses the CSV file into a polars.DataFrame. It is called from the calculate method of a class named UpdateData:

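import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from matplotlib.axes import Axes


class UpdateData: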
    def __init__(
        self, ax: Axes, trace_file_name: str, result_file_name: str, title: str
    ):
        self.trace = parse_trace(trace_file_name)
        self.result_file_name = result_file_name
        self.ax = ax
        self.lines = []
        for i in range(2):
            self.lines.append(self.ax.plot([], [], label=f"{i}")[0])

        # plot parameters
        # set ax parameters
        # ...

    def __call__(self, frame):
        x, y = self.calculate()
        self.lines[0].set_data(x, y[1])
        self.lines[1].set_data(x, y[2])
        return self.lines

    def calculate(self):
        result = parse_result(self.result_file_name)
        # calculate x and y from result
        # not important here
        return x, y


# argparse code (not important)

if __name__ == "__main__":
    args = parser.parse_args()

    fig, ax = plt.subplots()
    update_data = UpdateData(ax, args.trace, args.result, args.title)
    anim = FuncAnimation(fig, update_data, interval=500)
    plt.show()

I defined parse_result much like cbilot's answer, and it works fine on its own.

But when I call it from UpdateData to draw an animation with matplotlib, the error still occurs:

  File "/home/duskmoon/.local/lib/python3.8/site-packages/matplotlib/animation.py", line 907, in _start
    self._init_draw()
  File "/home/duskmoon/.local/lib/python3.8/site-packages/matplotlib/animation.py", line 1696, in _init_draw
    self._draw_frame(frame_data)
  File "/home/duskmoon/.local/lib/python3.8/site-packages/matplotlib/animation.py", line 1718, in _draw_frame
    self._drawn_artists = self._func(framedata, *self._args)
  File "./liveshow.py", line 97, in __call__
    x, y = self.calculate()
  File "./liveshow.py", line 107, in calculate
    result = parse_result(self.result_file_name)
  File "./liveshow.py", line 72, in parse_result
    result = pl.read_csv(result_file_name)
  File "/home/duskmoon/.local/lib/python3.8/site-packages/polars/io.py", line 333, in read_csv
    df = DataFrame._read_csv(
  File "/home/duskmoon/.local/lib/python3.8/site-packages/polars/internals/frame.py", line 587, in _read_csv
    self._df = PyDataFrame.read_csv(
exceptions.NoDataError: empty csv

Answer:

You can try:

import polars as pl

result = pl.read_csv(result_file_name, ignore_errors=True, columns=["column", "names", "as", "it", "exist"])

See: https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.read_csv.html

Answer:

You can check the length of the DataFrame:

import sys

result = pl.read_csv(result_file_name)
if result.height == 0:
    sys.exit()  # or anything else you want to do
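A length check only runs after read_csv has returned; as the traceback in the question shows, a file that was just reset to zero bytes makes read_csv raise NoDataError before that point. A minimal sketch of a check you could do before reading instead, assuming a reset leaves a zero-byte (or briefly missing) file. read_unless_empty is a hypothetical helper, and the reader and empty-value factory are parameters so that it can wrap pl.read_csv:

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def read_unless_empty(
    path: str, reader: Callable[[str], T], empty: Callable[[], T]
) -> T:
    """Return reader(path), or empty() if the file is missing or has zero bytes.

    Hypothetical helper: the size check only covers the common case. The
    other program can still truncate the file between this check and the
    read, so keep a try/except around the call as well.
    """
    try:
        if os.path.getsize(path) == 0:
            return empty()
    except FileNotFoundError:
        # Assumption: a file that is briefly missing is treated like an empty one.
        return empty()
    return reader(path)
```

With polars this would be called as read_unless_empty(result_file_name, pl.read_csv, lambda: pl.DataFrame(None, ["column", "names", "as", "it", "exist"])).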

Answer:

Does the error look like this?

SyntaxError: 'return' outside function

If so, it means that you are trying to use a return statement that is not part of a function definition.
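You can check whether a snippet triggers this error with the built-in compile(), which reports a module-level return as a SyntaxError before any code runs. A small sketch; has_return_outside_function is a made-up helper name:

```python
def has_return_outside_function(source: str) -> bool:
    """Return True if compiling `source` fails with "'return' outside function"."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as exc:
        # CPython reports a module-level return as "'return' outside function".
        return "'return' outside function" in str(exc)
    return False


print(has_return_outside_function("return 1"))                   # True
print(has_return_outside_function("def f():\n    return 1\n"))  # False
```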

The return statement can only be used in a function definition, such as:

import polars as pl


def my_function(result_file_name):
    try:
        result = pl.read_csv(result_file_name)
        # do some transformation with the dataframe
        return result
    except Exception:
        # return an empty dataframe
        result = pl.DataFrame(
            None, ["column", "names", "as", "it", "exist"]
        )
        return result

>>> my_function("/tmp/tmp.csv")
shape: (0, 5)
┌────────┬───────┬─────┬─────┬───────┐
│ column ┆ names ┆ as  ┆ it  ┆ exist │
│ ---    ┆ ---   ┆ --- ┆ --- ┆ ---   │
│ f32    ┆ f32   ┆ f32 ┆ f32 ┆ f32   │
╞════════╪═══════╪═════╪═════╪═══════╡
└────────┴───────┴─────┴─────┴───────┘

However, if you are not inside a function definition, just assign the result variable instead of using a return statement:

import polars as pl

result_file_name = "/tmp/tmp.csv"

try:
    result = pl.read_csv(result_file_name)
    # do some transformation with the dataframe
except Exception:
    # fall back to an empty dataframe
    result = pl.DataFrame(
        None, ["column", "names", "as", "it", "exist"]
    )

print(result)
shape: (0, 5)
┌────────┬───────┬─────┬─────┬───────┐
│ column ┆ names ┆ as  ┆ it  ┆ exist │
│ ---    ┆ ---   ┆ --- ┆ --- ┆ ---   │
│ f32    ┆ f32   ┆ f32 ┆ f32 ┆ f32   │
╞════════╪═══════╪═════╪═════╪═══════╡
└────────┴───────┴─────┴─────┴───────┘
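The fallback above builds a fresh empty DataFrame every time a read fails. Since the question polls the file every 500 ms for a matplotlib animation, another option is to keep the last frame that parsed successfully and show it until the file is readable again. A minimal sketch under that assumption: LastGoodReader is a hypothetical name, and the reader, the empty-value factory and the exception types are passed in, so with polars you would pass pl.read_csv and, presumably, pl.exceptions.NoDataError (check where your polars version exposes it):

```python
from typing import Callable, Generic, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class LastGoodReader(Generic[T]):
    """Hypothetical wrapper that remembers the last successful read."""

    def __init__(
        self,
        path: str,
        reader: Callable[[str], T],
        empty: Callable[[], T],
        errors: Tuple[Type[BaseException], ...] = (Exception,),
    ) -> None:
        self._path = path
        self._reader = reader
        self._empty = empty
        self._errors = errors
        self._last: Optional[T] = None

    def read(self) -> T:
        try:
            result = self._reader(self._path)
        except self._errors:
            # The read failed (e.g. the file was reset between polls): reuse
            # the last good result, or the empty value if nothing was read yet.
            return self._last if self._last is not None else self._empty()
        self._last = result
        return result
```

You would create one instance in UpdateData.__init__ and call its read() method from calculate() instead of parse_result. Exceptions not listed in errors still propagate, so unrelated bugs are not silently hidden.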