Ray B. answered 10/02/20
Adobe Premiere Editor
Your assumption is pretty much spot on! There is a fair amount of manual work involved, but visual effects artists, compositors, and rotoscoping specialists have tools that help automate the process in more or less exactly the way you described.
More precisely, suppose a shot contains a person, a prop, and a building in the background, each at a different distance from the camera. The artist working on that shot would need to "cut out," or outline, the person, the prop, and the building. (This is called "masking.")
Modern software tools usually let the artist draw a mask by hand once and then automatically track the edges of whatever object it outlines. These automated processes aren't flawless, though, and the artist has to do a lot of touching up. For example, if the tracked object passes behind something in the foreground, the automatic tracking can fail, and the artist has to step in manually. (Tracking also only works shot by shot, so every time the camera cuts, the artist has to partly start over.)
Once the person, the prop, and the building are "cut out," they can be separated from the background and shifted to the left or right by an amount that depends on how far each one is from the camera. However, because the film wasn't shot in 3D, shifting the person, the prop, and the building leaves "holes" where those objects used to be.
Imagine printing out a picture of a person walking through a city, physically cutting that person out of the image, and then moving the cut-out person a little to the right. There's now a person-shaped hole in the picture where the person was and where, presumably, the city behind the person should be.
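If it helps to see the offsetting step concretely, here's a toy Python sketch. It isn't code from Premiere or any real compositing package; the function names and the depth-to-shift rule are hypothetical. It treats an image as a grid of pixel labels and a mask as a matching grid of True/False values, shifts the masked object sideways, and marks the pixels it leaves behind as holes. The shift amount uses a simplified assumption that nearer objects get shifted more, roughly in proportion to 1/depth.

```python
HOLE = None  # marker for pixels with no known content after the shift

def disparity_for_depth(depth, scale=10.0):
    """Hypothetical rule: nearer objects (smaller depth) shift more.

    Simplified from the stereo relation disparity ~ 1 / depth; 'scale' is
    an arbitrary illustrative constant, not a production value.
    """
    return round(scale / depth)

def shift_layer(image, mask, dx):
    """Move the masked pixels of a row-major image dx columns to the right.

    Pixels the object vacates become HOLE unless the shifted object covers
    them again. Pixels pushed past the image edge are dropped.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input is left untouched
    # 1. Cut the object out, leaving holes where it used to be.
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                out[y][x] = HOLE
    # 2. Paste the object back, offset horizontally by dx.
    for y in range(height):
        for x in range(width):
            nx = x + dx
            if mask[y][x] and 0 <= nx < width:
                out[y][nx] = image[y][x]
    return out

row = ["B", "B", "P", "P", "B", "B"]  # B = building, P = person
mask = [[c == "P" for c in row]]
print(shift_layer([row], mask, 1))
# → [['B', 'B', None, 'P', 'P', 'B']]
```

The `None` in the output is the "person-shaped hole" from the paper analogy: the background that was behind the person at that spot was never captured, so there's nothing to show there.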
What the artists have to do at this point is "fill in" that hole: they have to recreate the missing parts of whatever was behind the person so the hole disappears. This can be done in a number of ways, but it typically involves taking whatever parts of the background are visible and "cloning" or copy/pasting them into the gap.
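The "fill in" step can be sketched the same way. This toy Python version is again hypothetical and far cruder than the clone and paint tools artists actually use: it fills each hole by copying the nearest known pixel in the same row (checking left before right at each distance), which is a very rough automated stand-in for "cloning" nearby background.

```python
HOLE = None  # marker for pixels with no known content

def fill_holes(image):
    """Fill HOLE pixels by copying the nearest known pixel in the same row.

    Only original (non-hole) pixels are used as sources, so filled pixels
    are never copied again. At equal distances the left neighbor wins.
    A row that is entirely holes is left unchanged.
    """
    out = [row[:] for row in image]
    for row_in, row_out in zip(image, out):
        width = len(row_in)
        for x in range(width):
            if row_in[x] is not HOLE:
                continue
            # Search outward, one step at a time, for a known pixel.
            for d in range(1, width):
                if x - d >= 0 and row_in[x - d] is not HOLE:
                    row_out[x] = row_in[x - d]
                    break
                if x + d < width and row_in[x + d] is not HOLE:
                    row_out[x] = row_in[x + d]
                    break
    return out

print(fill_holes([["B", "B", None, "P", "P", "B"]]))
# → [['B', 'B', 'B', 'P', 'P', 'B']]
```

Notice that if the nearest known pixel belongs to the foreground object, this naive fill will happily copy the person into the hole instead of the city; that's the kind of thing an artist still has to catch and fix by hand.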
So, yes, a lot of automation can help speed the process up, but there is still a lot of manual fine-tuning involved. A 3D conversion of a 2D movie usually takes a very large team with a lot of division of labor; that, along with the automated tools available, helps reduce the overall time it takes to complete.
I hope that helps!