I am trying to loop through a Polars DataFrame using the following code:
import polars as pl
mydf = pl.DataFrame(
    {"start_date": ["2020-01-02", "2020-01-03", "2020-01-04"],
     "Name": ["John", "Joe", "James"]})
print(mydf)
┌────────────┬───────┐
│ start_date ┆ Name  │
│ ---        ┆ ---   │
│ str        ┆ str   │
╞════════════╪═══════╡
│ 2020-01-02 ┆ John  │
│ 2020-01-03 ┆ Joe   │
│ 2020-01-04 ┆ James │
└────────────┴───────┘
for row in mydf.rows():
    print(row)
('2020-01-02', 'John')
('2020-01-03', 'Joe')
('2020-01-04', 'James')
Is there a way to reference 'Name' by its column name rather than by its positional index? In Pandas this would look something like:
import pandas as pd
mydf = pd.DataFrame(
    {"start_date": ["2020-01-02", "2020-01-03", "2020-01-04"],
     "Name": ["John", "Joe", "James"]})
for index, row in mydf.iterrows():
    mydf['Name'][index]
'John'
'Joe'
'James'
CodePudding user response:
You can specify that you want the rows to be named:
for row in mydf.rows(named=True):
    print(row)
Each row is then returned as a dict:
{'start_date': '2020-01-02', 'Name': 'John'}
{'start_date': '2020-01-03', 'Name': 'Joe'}
{'start_date': '2020-01-04', 'Name': 'James'}
You can then access the value with row['Name'].
Note that:
- previous versions returned a namedtuple instead of a dict.
- it's less memory-intensive to use iter_rows, which yields rows one at a time instead of building the full list up front.
- overall, it's not recommended to iterate through the data this way.
Row iteration is not optimal as the underlying data is stored in columnar form; where possible, prefer export via one of the dedicated export/output methods.
CodePudding user response:
You could use select for that:
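names = mydf.select(['Name'])
# Iterating over a DataFrame directly yields its columns (Series), not its rows,
# so loop over the values of the selected column instead.
for name in names['Name']:
    print(name)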