This looks like a really cool architecture, but I'm mildly uncertain about how well it scales given the simplicity of the tasks demonstrated. Delivery tasks are among the most basic tasks in RL research. I'd be interested in seeing it tested on predator/hazard avoidance, as well as some sort of rudimentary reasoning (which might require built-in analogy structure?).
Additionally, the pure-pixel input is neato, but the discrete outputs seem limiting.
That being said, I'm overall very happy with the direction of research!