I am struggling with Rust Polars ...
In Pandas it's fairly straightforward ... using Polars in Rust ... is another story.
Here is what I would like to do (not working code):
use polars::prelude::*;
fn main() {
    let days_data = df!("weekday" => &["Tuesday", "Wednesday", "Thursday", "Friday"],
                        "date" => &["1900-01-02", "1900-01-03", "1900-01-04", "1900-01-05"]);
    let sales_data = df!("sales" => &[1_000u32, 20, 300, 40_000, 555],
                         "rep" => &["Joe", "Mary", "Robert", "Thomas", "Susanna"],
                         "date" => &["1900-01-02", "1900-01-03", "1900-01-04", "1900-01-05"]);
    // here is what I like to do in pseudo code:
    // Q1: How do I iterate over a column ?
    for each_date in days_data["date"] {
        // Q2 how do I access a specific value in Series/DataFrame?
        let sales = sales_data[each_date]["sales"].value;
        let rep = sales_data[each_date]["rep"].value;
        let weekday = days_data[each_date]["weekday"];
        if sales > 10_000 { println!("Nice Job {} and all this on a {}!", rep, weekday) };
    }
}
I have been looking for a couple of days, but was not able to find anything close to a solution ...
CodePudding user response:
You can do it this way, but it's very convoluted and probably not what you want. DataFrames offer you a powerful relational data model and you're not making use of it at all. Instead of manually indexing and checking rows and columns, let the DataFrame do the work for you.
The way I would write out the logic of your program (and I'm no DataFrame expert, by the way, so do look into it yourself some more) is more like this:
use polars::prelude::*;
fn main() -> Result<()> {
    // Usually, there are five weekdays, so I added the missing one.
    let days_data = df!(
        "weekday" => &["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
        "date" => &["1900-01-01", "1900-01-02", "1900-01-03", "1900-01-04", "1900-01-05"])?;
    // This DF was wrong. You had one date too few.
    let sales_data = df!(
        "sales" => &[1_000u32, 20, 300, 40_000, 555],
        "rep" => &["Joe", "Mary", "Robert", "Thomas", "Susanna"],
        "date" => &["1900-01-01", "1900-01-02", "1900-01-03", "1900-01-04", "1900-01-05"])?;
    // First of all, we can join the DFs on the dates, matching up each sale
    // with its corresponding weekday:
    let joined = sales_data.left_join(&days_data, ["date"], ["date"])?;
    // Next up, we create a mask that filters out every row where the sales
    // column is less than or equal to 10,000 (it's actually more efficient to
    // first filter and then join, but optimizations are best left for last):
    let filter = joined.column("sales")?.gt(10_000)?;
    // We apply this filter to our joined DF:
    let filtered = joined.filter(&filter)?;
    // We extract the "rep" and "weekday" columns:
    let reps = filtered.column("rep")?.utf8()?;
    let weekdays = filtered.column("weekday")?.utf8()?;
    // We iterate over both and print out the results:
    for (rep, day) in reps.into_iter().zip(weekdays.into_iter()) {
        println!(
            "Nice Job {} and all this on a {}!",
            rep.unwrap(),
            day.unwrap()
        );
    }
    Ok(())
}
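To make the join-and-filter logic above concrete without depending on a particular Polars version, here is a minimal plain-Rust sketch of the same idea that uses only the standard library. It filters first and then looks up the weekday, which is the ordering the comment above calls more efficient. The `Sale` struct and the `big_sales` function are hypothetical names made up for illustration and are not part of Polars. Note also that rows with no matching date are skipped here, whereas a left join would keep them with a null weekday.

```rust
use std::collections::HashMap;

/// One row of the sales table (hypothetical stand-in for a DataFrame row).
struct Sale {
    sales: u32,
    rep: &'static str,
    date: &'static str,
}

/// Filter first, then "join": keep only sales above `threshold`, then look up
/// the weekday of each remaining row by its date.
/// Rows whose date has no weekday entry are dropped (unlike a left join).
fn big_sales(
    sales: &[Sale],
    weekday_by_date: &HashMap<&str, &str>,
    threshold: u32,
) -> Vec<String> {
    sales
        .iter()
        .filter(|s| s.sales > threshold)
        .filter_map(|s| {
            weekday_by_date
                .get(s.date)
                .map(|day| format!("Nice Job {} and all this on a {}!", s.rep, day))
        })
        .collect()
}

fn main() {
    // The "days" table, keyed by date so the lookup plays the role of the join.
    let weekday_by_date = HashMap::from([
        ("1900-01-01", "Monday"),
        ("1900-01-02", "Tuesday"),
        ("1900-01-03", "Wednesday"),
        ("1900-01-04", "Thursday"),
        ("1900-01-05", "Friday"),
    ]);

    // The "sales" table, one struct per row.
    let sales = [
        Sale { sales: 1_000, rep: "Joe", date: "1900-01-01" },
        Sale { sales: 20, rep: "Mary", date: "1900-01-02" },
        Sale { sales: 300, rep: "Robert", date: "1900-01-03" },
        Sale { sales: 40_000, rep: "Thomas", date: "1900-01-04" },
        Sale { sales: 555, rep: "Susanna", date: "1900-01-05" },
    ];

    for line in big_sales(&sales, &weekday_by_date, 10_000) {
        println!("{}", line);
    }
}
```

With the sample data, only Thomas's 40,000 sale passes the threshold, so a single line is printed for him and Thursday. In real code you would still prefer the DataFrame operations, since they handle nulls and scale to large tables.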