Iterators and Algorithms

Summing numbers - v1

fn sum(numbers: &[i32]) -> i64 {
    let mut sum = 0;
    let mut index = 0;
    while index < numbers.len() {
        sum += numbers[index] as i64;
        index += 1;
    }
    sum
}
  • Conditional jumps and offsets:
    • while
    • numbers[index]

Summing numbers - v2

fn sum(numbers: &[i32]) -> i64 {
    let mut sum = 0;
    for number in numbers {
        sum += number as i64;
    }
    sum
}
  • Language magic:
    • for ... in ...

Summing numbers - v3

fn sum(numbers: &[i32]) -> i64 {
    match numbers {
        [] => 0,
        [front, rest @ ..] => *front as i64 + sum(rest),
    }
}
  • Recursion: Repeatedly splitting of the first element

Different ways to perform iteration

  • Which one is the fastest? The easiest to write?
  • Is there one canonical definition of iteration?
  • Do they all work for different types of collections?

Summing numbers - HashSet

fn sum(numbers: &HashSet<i32>) -> i64 {
    let mut sum = 0;
    let mut index = 0;
    while index < numbers.len() {
        sum += numbers[index] as i64;
        index += 1;
    }
    sum
}

Summing numbers - HashSet

fn sum(numbers: &HashSet<i32>) -> i64 {
    let mut sum = 0;
    let mut index = 0;
    while index < numbers.len() {
        sum += numbers[index] as i64;
        index += 1;
    }
    sum
}

Does not compile! HashSet is not an ordered collection

cannot index into a value of type `&HashSet<i32>`

Iteration and structure

  • Does iteration require that we know the inner structure of the thing that we want to iterate on?
  • Word stem: From latin “itero”: “To do a second time, repeat”
  • What is the action we want to repeat? What do we need to perform this action?

“Give me the next element”

  • A reasonable definition of iteration
  • As we will see, this is quite general and composes nicely!
  • How do we realize it?

“Give me the next element” - cont.

fn sum(numbers: &[i32]) -> i64 {
    match numbers.head() { // Give me the next element...
        None => 0,
        Some(front) => *front as i64 
            + sum(&numbers[1..]), // ...then do it again!
    }
}

“Give me the next element” - cont.

fn sum(numbers: &[i32]) -> i64 {
    match numbers.head() { // Give me the next element...
        None => 0,
        Some(front) => *front as i64 
            + sum(numbers.tail()), // ...then do it again!
    }
}

“Give me the next element” - cont.

struct NumbersIterator<'a> {
    numbers: &'a [i32],
}

impl NumbersIterator<'_> {
    pub fn next(&mut self) -> Option<&i32> {
        match self.numbers {
            [] => None,
            [head, tail @ ..] => {
                self.numbers = tail;
                Some(head)
            },
        }
    }
}

“Give me the next element” - cont.

fn sum(mut numbers: NumbersIterator<'_>) -> i64 {
    let mut sum = 0;
    while let Some(number) = numbers.next() {
        sum += *number as i64;
    }
    sum
}

Our first iterator!

  • Encapsulates iteration behind a single function next
  • No specific container type anymore
    • Though it is hidden inside NumbersIterator!
    • More work needed to support other containers…
  • Unsure if it is as fast as manual iteration

Supporting more container types

struct NumbersIterator<'a> {
    numbers: &'a [i32], // Only one fixed type here
                        // How would we change this to HashSet<i32>?
}
  • Observation: Our iterator is not a specific type but an interface!
trait NumbersIterator {
    fn next(&mut self) -> Option<&i32>;
}

Requires generics!

fn sum(mut numbers: impl NumbersIterator) -> i64 {
    let mut sum = 0;
    while let Some(number) = numbers.next() {
        sum += *number as i64;
    }
    sum
}

// Could also have written:
fn sum<I: NumbersIterator>(mut numbers: I) -> i64;

Supporting more value types!

  • Currently we can only iterate over &i32 values
  • Let’s make it generic:
trait Iterator<Item> {
    fn next(&mut self) -> Option<Item>;
}

struct NumbersIterator<'a> {
    numbers: &'a [i32],
}

impl<'a> Iterator<&'a i32> for NumbersIterator<'a> {
    fn next(&mut self) -> Option<&'a i32> {
        todo!("same code as before")
    }
}

Iterator is stateful

  • Our iterator instance (i.e. NumbersIterator) needs to keep track of its position within the collection
  • We could implement it directly on &'a [i32] and do *self = rest, but that doesn’t work for HashSet<i32>!
  • Therefore we resort to separate iterator types that are stateful
  • Different than in C/C++ where we use cursors (e.g. begin and end pointers)

Problems with the generic iterator

  • Overly verbose! We have to introduce a generic type T everywhere:
fn sum<T, I: Iterator<T>>(iter: I) -> T { ... }

trait PairIterator<T>: Iterator<T> {
    fn next_pair(&mut self) -> Option<(T, T)>;
}
  • We could implement Iterator<T> multiple times on the same type for different T
    • Doesn’t make a lot of sense. Every iterator will yield one specific type of items!

Associated types

  • Generics: The user of a type chooses the generic parameter(s):
let numbers = Vec::<i32>::new();
  • Associated types: The implementor of a trait defines the type:
impl Add for i32 {
  type Output = i32; // <-
  fn add(self, rhs: i32) -> i32 {}
}
  • An associated type can be queried from the outer type:
fn multiply<T: Add>(a: T, b: T) -> T::Output {
                                 //   ^^^^^^ 
                                 //   defined by the implementor of Add
    todo!()
}

The Iterator trait in the Rust standard library

  • Uses associated types because each iterator knows (and defines!) the elements it iterates over:
trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

impl<'a> Iterator for NumbersIterator<'a> {
    type Item = &'a i32;
    fn next(&mut self) -> Option<Self::Item> {
        todo!()
    }
}

Usage

fn sum(mut iter: impl Iterator<Item = i32>) -> i64 {
                            // ^^^^^^^^^^ specify the associated type
    let mut sum = 0;
    while let Some(number) = iter.next() {
        sum += *number as i64;
    }
    sum
}

How can we implement a product function?

Similarities

fn sum(mut iter: impl Iterator<Item = i32>) -> i64 {
    let mut sum = 0;
    while let Some(number) = iter.next() {
        sum += *number as i64;
    }
    sum
}


fn product(mut iter: impl Iterator<Item = i32>) -> i64 {
    let mut sum = 1;
    while let Some(number) = iter.next() {
        sum *= *number as i64;
    }
    sum
}

The fold algorithm

  • Start with some B and combine (“fold”) elements using a function f
  • fold(0, +) == sum
  • fold(1, *) == product

Algorithms on the Iterator trait

trait Iterator {
    // ...

    fn fold<B, F>(self, init: B, f: F) -> B
    where 
        F: FnMut(B, Self::Item) -> B 
    {
        let mut accum = init;
        while let Some(item) = self.next() {
            accum = f(accum, item);
        }
        accum
    }
}

Combining with Option<T>::map

  • Suppose we have a collection over Strings and want to parse them all to integers:
fn parse_all(mut strings: impl Iterator<Item = String>) -> Vec<i32> {
    let mut integers = Vec::default();
    while let Some(string) = strings.next() {
        let integer = string.parse::<i32>()
                        .expect("String is not an i32");
        integers.push(integer);
    }
    integers
}
  • next() returns Option<String> and we want to apply the parsing function only if it is Some

Combining with Option<T>::map

  • Suppose we have a collection over Strings and want to parse them all to integers:
fn parse_all(mut strings: impl Iterator<Item = String>) -> Vec<i32> {
    let mut integers = Vec::default();
    while let Some(integer) = strings
                                .next()
                                .map(|s| s.parse::<i32>().expect("...")) 
    {
        integers.push(integer);
    }
    integers
}
  • Why not have an iterator that calls .map() for us on every next() call?

The map algorithm

struct MapIter<I, F> {
    iter: I,
    map_fn: F,
}

impl<B, I, F> Iterator for MapIter<I, F>
where
    I: Iterator,
    F: FnMut(I::Item) -> B,
{
    type Item = B;
    fn next(&mut self) -> Option<Self::Item> {
        self.iter.next().map(&mut self.map_fn)
    }
}
  • Wrap the iterator in a new iterator that calls Option::map

The map algorithm

trait Iterator {
    // ...

    fn map<B, F>(self, f: F) -> MapIter<Self, F> 
    where
        F: FnMut(Self::Item) -> B 
    {
        MapIter {
            iter: self,
            map_fn: f,
        }
    }
}

Using map

fn parse_all(mut strings: impl Iterator<Item = String>) -> Vec<i32> {
    let mut integers = Vec::default();
    while let Some(integer) = strings
                                .map(|s| s.parse::<i32>().expect("..."))
                                .next()
    {
        integers.push(integer);
    }
    integers
}
  • Doesn’t look much better than before, we just flipped the next and map calls…

The while let Some(next) = iter.next() pattern

  • This is nasty to type and hard to understand
  • Built into the language as for item in iter!
  • No manual call to next() anymore!
fn parse_all(strings: impl Iterator<Item = String>) -> Vec<i32> {
    let mut integers = Vec::default();
    for integer in strings.map(|s| s.parse::<i32>().expect("...")) {
        integers.push(integer);
    }
    integers
}

Collecting all elements in an iterator

  • Pushing every element in an iterator into some collection is a very common operation called collect:
fn parse_all(strings: impl Iterator<Item = String>) -> Vec<i32> {
    strings
        .map(|s| s.parse::<i32>().expect("..."))
        .collect() // Type is inferred from function return type
}
  • Works backwards by having the collection implement a trait called FromIterator!
fn collect<B: FromIterator<Self::Item>>(self) -> B { ... }

FromIterator is complicated

trait FromIterator<A> {
    // Convert `iter` into an iterator first
    // Then convert that iterator into `Self`
    // `Self` will typically be a collection, but doesn't have to be
    fn from_iter<T: IntoIterator<Item = A>>(iter: T) -> Self;
}

pub trait IntoIterator {
    // The type of the elements being iterated over.
    type Item;
    // Which kind of iterator are we turning this into?
    type IntoIter: Iterator<Item = Self::Item>;

    // Convert `self` into an iterator. Consumes `self`!
    // This is a bit like `From` and `Into`!
    fn into_iter(self) -> Self::IntoIter;
}

How do we get an iterator to a collection?

let numbers = vec![1,2,3,4,5];
numbers.iter();         // Iterator<Item = &i32>
numbers.iter_mut();     // Iterator<Item = &mut i32>
numbers.into_iter();    // Iterator<Item = i32>, consumes `numbers`
  • Most collections implement IntoIterator and also provide .iter()/.iter_mut()
  • Some collections have special iterators: .keys(), .values() etc.

The algorithms zoo

  • Many common for-loop patterns can be converted into algorithms with speakable names:
// **Find** the first element matching some predicate
for element in collection { 
    if predicate(element) {
        return element;
    }
}

// **Filter** all elements not matching a predicate
for element in collection { 
    if !predicate(element) {
        continue;
    }
    // Do something with element
}

Categories of algorithms

  • Does the algorithm evaluate the iterator or does it adapt into a new iterator?
    • In other words: Is it eager or lazy?
strings
    .map(|s| s.parse::<i32>().expect("...")) 
    .collect() 
  • map returns a new iterator but doesn’t call .next() -> It is lazy
  • collect evaluates the iterator by calling .next() until it yields None -> It is eager

Processing infinite sequences

  • Lazy iterators can work with infinite sequences:
let all_numbers = 2_u64..; // (almost) infinite sequence: ~2^64 elements
for prime in all_numbers.filter(is_prime) {
    println!("{prime} is prime");
}
  • filter does no work and allocates no memory
  • All evaluation comes from the for ... in ... syntax

Categories of algorithms - cont.

  • Queries: all, any, filter, find etc.
    • Check some property of a sequence of elements
  • Evaluators: collect, count, fold, for_each etc.
    • Evalute the iterator, visiting all elements
  • Transformations: enumerate, filter, flatten, map etc.
    • Changes the way the iterator behaves: Changing item types, step size, nesting etc.
  • Groupings: chain, unzip, zip
    • Grouping multiple iterators together
  • Orderings: cmp, eq, is_partitioned, partition etc.
    • Concerned with the ordering of elements

Where is sort?

Sorting

  • Sorting is an offline algorithm
    • Requires access to all data upfront
  • Rust iterators only support online algorithms
    • Looking at one element at a time
  • sort functions are available on the collections themselves (i.e. Vec::sort)

Advantages of algorithms

  • They have speakable names
    • Makes them easier to understand than for-loops
  • State intent explicitly
  • They combine better than for-loops: iter().filter().map().collect()
  • Performance:
    • Most algorithms use the #[inline] attribute
    • Enables LLVM to optimize heavily

Downsides

  • Harder to debug as the “interesting stuff” often is hidden between a lot of boilerplate
  • It can be tempting to be clever and find a convoluted chain of algorithms for some problem
    • Looking at you fold