I was reading a log file from an actor computation. In particular, I was looking at the outcome of a k-mer counting computation performed with Argonnite, which runs on top of Thorium. Argonnite is an application in the BIOSAL project, and Thorium is the engine of the BIOSAL project (which means that all BIOSAL applications run on top of Thorium).
In BIOSAL, everything is an actor or a message, and both are handled by the Thorium engine. Thorium is a distributed engine: a Thorium computation is distributed across BIOSAL runtime nodes. Each node has 1 pacing thread and a group of worker threads (for example, with 32 threads, you get 1 pacing thread and 31 workers).
Each worker is responsible for a subset of the actors that live inside a given BIOSAL node. Obviously, you want each worker to have its own actors so that every worker stays busy. Each worker has a scheduling queue with 4 priorities: max, high, normal, and low (these are the priorities used by ERTS, the Erlang runtime system also known as BEAM). An actor with max priority always wins. Otherwise, high, normal, and low are served in a ratio of N*N*N to N*N to N. This ratio protects against starvation.
In the current code, every actor is classified as NORMAL by default.
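To make the ratio concrete, here is a minimal, self-contained sketch of such a 4-level dequeue policy. This is not the Thorium implementation: the names (RATIO, pick_level, struct level) and the value N = 4 are assumptions for illustration only.

#include <stdio.h>
#include <stdint.h>

#define RATIO 4 /* hypothetical value of N */

enum { PRIO_MAX, PRIO_HIGH, PRIO_NORMAL, PRIO_LOW, PRIO_COUNT };

struct level {
    int actors; /* schedulable actors waiting at this priority */
};

/* Pick the priority level to serve for one dequeue operation. */
static int pick_level(struct level levels[PRIO_COUNT], uint64_t tick)
{
    const uint64_t n = RATIO;
    const uint64_t high = n * n * n, normal = n * n, low = n;
    uint64_t slot = tick % (high + normal + low);
    int preferred;

    /* An actor with max priority always wins. */
    if (levels[PRIO_MAX].actors > 0)
        return PRIO_MAX;

    /* Out of every N*N*N + N*N + N slots, high owns N*N*N, normal
     * owns N*N, and low owns N. Low always gets its N slots, so no
     * level starves. */
    if (slot < high)
        preferred = PRIO_HIGH;
    else if (slot < high + normal)
        preferred = PRIO_NORMAL;
    else
        preferred = PRIO_LOW;

    if (levels[preferred].actors > 0)
        return preferred;

    /* Fall back so a slot is never wasted when its level is empty. */
    for (int level = PRIO_HIGH; level <= PRIO_LOW; level++)
        if (levels[level].actors > 0)
            return level;

    return -1; /* nothing to run on this worker */
}

int main(void)
{
    struct level levels[PRIO_COUNT] = { {0}, {3}, {3}, {3} };
    int served[PRIO_COUNT] = {0};

    for (uint64_t tick = 0; tick < 84000; tick++) {
        int level = pick_level(levels, tick);
        if (level >= 0)
            served[level]++;
    }

    /* With N = 4, expect a 64000 : 16000 : 4000 split. */
    printf("high=%d normal=%d low=%d\n",
           served[PRIO_HIGH], served[PRIO_NORMAL], served[PRIO_LOW]);
    return 0;
}

Running this prints high=64000 normal=16000 low=4000: low still receives its N slots out of every N*N*N + N*N + N, which is the anti-starvation property.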
If I put every actor in the same priority (BSAL_PRIORITY_NORMAL), I see this (for node 0 and worker 5) when I run an actor computation on a single physical machine (no latency when passing messages around):
node/0 worker/5 SchedulingQueue Levels: 4
node/0 worker/5 scheduling_queue: Priority Queue 1048576 (BSAL_PRIORITY_MAX), actors: 0
node/0 worker/5 scheduling_queue: Priority Queue 128 (BSAL_PRIORITY_HIGH), actors: 0
node/0 worker/5 scheduling_queue: Priority Queue 64 (BSAL_PRIORITY_NORMAL), actors: 4
node/0 worker/5  actor aggregator/1291935834 (1 messages)
node/0 worker/5  actor kmer_store/1477943366 (511 messages)
node/0 worker/5  actor aggregator/443747990 (1 messages)
node/0 worker/5  actor aggregator/710261816 (1 messages)
node/0 worker/5 scheduling_queue: Priority Queue 4 (BSAL_PRIORITY_LOW), actors: 0
node/0 worker/5 SchedulingQueue... completed report !
Arguably, actor 1477943366 should be classified at a higher priority than NORMAL (such as HIGH or MAX). But is it required? I think, at least in this case, that the answer is no. Here is the reason.
The only thing that counts at the end of the day is that you do not want to waste CPU cycles. As long as CPU cycles are not wasted (the fraction of cycles actually used is called efficiency), the order of events (a partial order, really) is unimportant, provided that no worker starves (remember, wasting CPU cycles is like wasting money: it's bad).
Below are the load values across the actor system for an actor computation that lasted 17 minutes and that had an efficiency of 94% (the computation wasted around 6% of the CPU cycles, which is not bad, but not perfect either).
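To put that 94% in absolute terms, here is a quick back-of-the-envelope sketch; it assumes every worker thread was up for the whole 17 minutes.

#include <stdio.h>

int main(void)
{
    double duration_s = 17 * 60; /* 17 minutes, from the run described above */
    double efficiency = 0.94;    /* overall load reported by Thorium */

    /* each worker thread threw away about (1 - efficiency) of its time */
    printf("wasted per worker: %.0f s out of %.0f s\n",
           (1.0 - efficiency) * duration_s, duration_s);
    return 0;
}

It prints: wasted per worker: 61 s out of 1020 s.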
At the beginning, there is some I/O that waits for the magnetic disk, so CPU cycles are wasted.
Then, actors process their messages at full capacity, which shows up as a load between 0.99 and 1.00.
Then, at the end, the load drops a little because of work scarcity.
The first step is the data counting. In this computation, there was only one data file, so only one worker is busy from 0 seconds to 45 seconds.
Data distribution from input_stream actors to the sequence_store actors happens between 50 seconds and 60 seconds.
From 65 seconds to 950 seconds (most of the computation), the load of every worker reported by the Thorium runtime system is 99% or 100%. This is good enough.
Starting at 955 seconds, there is the final phase, which has a variable load across the system.
The overall load reported by Thorium appears below.
This means that 6% of the CPU cycles went into the garbage bin and were not used. Usually, this is caused by unavailable operands. In an actor computation, operands are unavailable when a worker has 0 actors scheduled in its priority scheduling queue, which means that none of its actors have any messages in their inboxes.
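Here is a minimal, self-contained toy that shows how a load of 0.94 falls out of this definition. None of these names come from the Thorium source, and the message pattern is made up.

#include <stdio.h>
#include <stdint.h>

/* Toy model: a worker's load is the fraction of loop iterations in
 * which at least one of its actors had a message to process. */
int main(void)
{
    uint64_t busy = 0, total = 0;

    for (uint64_t tick = 0; tick < 1000; tick++) {
        /* pretend messages are available for the first 940 ticks only */
        int scheduled_actors = tick < 940 ? 3 : 0;

        total++;
        if (scheduled_actors > 0)
            busy++; /* an operand was available: useful work */
        /* else: unavailable operands, this tick is a wasted cycle */
    }

    printf("load: %.2f\n", (double)busy / total); /* prints: load: 0.94 */
    return 0;
}

A wasted tick is exactly the situation described above: the scheduling queue has 0 actors to hand out.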
Obviously, actors are cool. And BIOSAL will bring this coolness to genomics, at scale.