Quick Overview: OpenAI Gym
Given what I've found so far, it looks like Unity would be a good way to train reinforcement learning agents, and Gazebo would be used afterwards to see how they work before deploying on actual physical robots. I might end up doing something different, but they are good targets to work towards. But where would I start? That's where OpenAI Gym comes in.
It is a collection of prebuilt environments that are free and open for hobbyists, students, and researchers alike. The available environments range across a wide variety of problem domains - from text-based activities that should in theory be easy for computers, to full-on 3D simulations like what I'd expect to find in Unity and Gazebo. Putting them all under the same umbrella, accessible from Python in a consistent manner, makes it simple to gradually increase the complexity of the problems being solved.
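To make "consistent manner" concrete, here is a minimal sketch of the reset/step loop that Gym environments are built around. It assumes the classic Gym interface, where reset() returns an observation and step() returns an (observation, reward, done, info) tuple; later releases changed those return values. CoinGuessEnv and run_episode are hypothetical names I made up so the sketch runs without Gym installed. They are not part of the library.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment, NOT part of OpenAI Gym.

    It only mimics the classic reset()/step() shape so the loop
    below can run without any third-party packages.
    """

    def __init__(self, episode_length=10, seed=None):
        self.episode_length = episode_length
        self._rng = random.Random(seed)
        self._steps = 0
        self._coin = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self._steps = 0
        self._coin = self._rng.randint(0, 1)
        return self._steps  # observation: number of steps taken so far

    def step(self, action):
        # Reward 1.0 for guessing the hidden coin, 0.0 otherwise.
        reward = 1.0 if action == self._coin else 0.0
        self._steps += 1
        self._coin = self._rng.randint(0, 1)
        done = self._steps >= self.episode_length
        return self._steps, reward, done, {}


def run_episode(env, policy):
    """Run one episode on anything exposing reset() and step()."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    env = CoinGuessEnv(episode_length=10, seed=0)
    # A trivial policy that always guesses 0.
    print(run_episode(env, lambda observation: 0))
```

With Gym installed, and assuming that same classic interface, the loop should work unchanged on an environment created with gym.make("CartPole-v0"), using a policy that samples from env.action_space. Because only the environment changes, the same agent code can in principle move from a simple problem to a harder one.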
Following the Getting Started guide, I was able to install the Python package and run the CartPole-v0 example. I was also able to bring up its Atari subsystem in the form of MsPacman-v4. The 3D simulations use MuJoCo as their physics engine, which has a 30-day trial and costs $500/yr after that for personal non-commercial use. At the moment I don't see enough benefit to justify the cost, so the tentative plan is to learn the basics of reinforcement learning on simple 2D environments. By the time I'm ready to move into 3D, I'll use Unity instead of paying for MuJoCo, bypassing the 3D simulation portion of OpenAI Gym.
I'm happy OpenAI Gym provides a beginner-friendly set of standard reinforcement learning textbook environments. Now I'll need to walk through some corresponding textbook examples on how to create an agent that learns to work in those environments.