A naive brute force algorithm would take a lot longer to complete some of these ...

garaetjjte · on May 23, 2019

This approach is similiar: http://tom7.org/mario/

negamax · on May 23, 2019

They have 350 hrs of playing per level. They could brute force on these parameters.

1. Detect to_avoid moving obstacles

2. Detect to_avoid stationary obstacles (gaps)

3. Player moves (jump (left, right)(up, far, farthest), walk)

4. Success (hitting the flag, reaching princess)

Once above information is available on the screen, it becomes easily bruteforcible. I was hoping use of genetic algorithms i.e. they could have taken success from one part (path) of level and crossed it with another and tried that with across levels. But there won't be a generic strategy or learning anyways as there's fair bit of randomness. So this does seem like brute forcing for success path

null000 · on May 23, 2019

The whole point is that the algorithm doesn't know about obstacles or success as a concept baked into the algorithm. Likewise, this is pretty initial research, meant to inform and promote

In other words, this isn't meant to be super useful by itself. It seems tailor made (as many of these things do) to play super-simple 80's video games and literally nothing else, but it's an interesting proof of concept. I'd also be interested in different iterations on this general pattern - for instance, something that didn't translate directly from screen + button -> prediction, and instead had some interstitial systems - translating from screen -> entities, then predicting entity state of entities given button presses. It'd also be interesting to see how this performs with ML algorithms designed to learn on the fly instead of through training from a static set of data (at least, this looked like it learned through back propagation - I skimmed).

But I can see broader practical applications for this in, for instance, recommender systems trying to break users out of the closed feedback loop that people tend to end up in when going down certain rabbit holes (e.g. watch one Flat Earther conspiracy video and suddenly that's all you see for a week because the recommender system knows that people who look at one will look at more). The point being: the real test comes when this strategy is exposed to more diverse problem spaces, it's just that those are harder to model and we need to weed out the pointless stuff first.