We typically use AR to visualize products in our homes to see how they look, but AR can’t give us a sense of how products feel. Aesthetics matter when shopping for a couch, but it’s just as important to sit down and feel whether the couch is comfortable. Today, only a visit to the store can deliver that tactile experience.
Recently, photogrammetry apps like Polycam and Reality Capture have made it easy for anyone to create a high-fidelity model of their room using their iPhone. We previously explored using Apple’s new RoomPlan API to create high-fidelity room models as well. Neural radiance fields (NeRFs) are an exciting breakthrough that is taking 3D capture and rendering to new levels of fidelity.
We wondered if we could use these technologies to deliver the best of both worlds: look and feel. We explored what it would be like to reverse the role AR typically plays: instead of rendering a virtual thing in a real room, we’d render a virtual room around a real thing.
“We’ve turned traditional augmented reality on its head.”
Reversing AR makes it easy for shoppers to touch a product directly and see it in their space instantly. Before arriving at the store, shoppers take just a few moments to create a high-fidelity model of their room by recording a short video. Still frames from the video are used to build a detailed 3D model of the room in seconds, and that model can be used again and again in any store.
In a store, shoppers use their phone or AR glasses to project their 3D room model into the environment. Computer vision systems segment furniture and people from their camera feed and composite them into the virtual room.
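The compositing step can be pictured as a per-pixel blend. The snippet below is a minimal Python sketch, assuming the segmentation system outputs a mask with values between 0 and 1 and that both images are same-sized grids of RGB tuples; the `composite` function and its inputs are hypothetical and not taken from the prototype.

```python
def composite(camera, virtual, mask):
    """Blend two images per pixel.

    mask value 1.0 keeps the camera pixel (real furniture or person),
    0.0 keeps the rendered virtual room, values in between feather the edge.
    """
    out = []
    for cam_row, virt_row, mask_row in zip(camera, virtual, mask):
        out.append([
            tuple(round(a * c + (1 - a) * v) for c, v in zip(cam_px, virt_px))
            for cam_px, virt_px, a in zip(cam_row, virt_row, mask_row)
        ])
    return out
```

Soft mask values along object edges matter here: a hard 0/1 mask tends to produce a visible cut-out outline.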
“Now shoppers can sit on the real thing and see if it fits their space at the same time.”
Multiplayer capabilities allow two shoppers to view the same scene simultaneously, making it easier to discuss ideas. Gestures can be used to control the orientation of the room, creating some mind-bending moments.
Merchants could offer shoppers multiple scenes to showcase the versatility of a piece—in a Paris flat, a modern mountain cabin, or next to an infinity pool nestled in a desert oasis.
In exploring this concept we had to solve two problems. First, we had to leverage existing APIs to classify and segment a piece of furniture from a scene in real time. Second, we needed to render the live segmented video and the virtual room together with a high degree of realism.
We began by exploring classification and segmentation using YOLOv3 and DeepLabV3, but the quality of the segmentation didn’t pass the bar for our needs.
In parallel, we explored RealityKit’s implementation, trying to figure out why Apple’s furniture segmentation worked so well. It looked to us like they were leveraging meshes constructed from LiDAR data.
We decided to use Unity and ARFoundation to explore meshing. We built a quick demo that generates a mesh for any object classified as a sofa and applies an occlusion material to it. That way, when a piece of furniture is surrounded by a virtual room, its mesh becomes a window into the real world that reveals the real object.
It captured the form well for some objects, but we weren’t happy with the edge quality. Pieces with thin parts or gaps weren’t meshed correctly, and cushions and cats poked holes in the meshes. All these problems broke the illusion.
Along the way, we explored depth-based segmentation on LiDAR-equipped devices, which worked surprisingly well when the user was seated on the sofa, but not when they stood up. This approach was promising, but we still needed a way to isolate specific parts of the scene.
Then we discovered that we could use depth-based segmentation to cut objects out of the scene more cleanly if we had a bounding box in world space to constrain the depth buffer.
Here’s how it works:
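As a rough illustration rather than the prototype's actual code, the idea can be sketched in Python: unproject each depth pixel into world space, then keep only the pixels whose points land inside the box. The pinhole `Intrinsics`, the row-major camera-to-world rotation, the axis-aligned box, and every function name here are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point (x)
    cy: float  # principal point (y)


def unproject(u, v, depth, k):
    """Pixel (u, v) with metric depth -> point in camera space."""
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


def camera_to_world(p, rotation, translation):
    """Apply a row-major 3x3 rotation and a translation to a point."""
    return tuple(
        sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def segment_by_box(depth_map, k, rotation, translation, box_min, box_max):
    """Return a mask: 1 where the pixel's world point is inside the box."""
    mask = []
    for v, row in enumerate(depth_map):
        mask_row = []
        for u, d in enumerate(row):
            if d <= 0:  # treat non-positive depth as "no reading"
                mask_row.append(0)
                continue
            w = camera_to_world(unproject(u, v, d, k), rotation, translation)
            inside = all(box_min[i] <= w[i] <= box_max[i] for i in range(3))
            mask_row.append(1 if inside else 0)
        mask.append(mask_row)
    return mask
```

Because the test is purely geometric, anything inside the box is kept whether or not a classifier recognizes it, and anything outside is dropped even if it sits at a similar depth.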
In parallel, we needed to generate grounding shadows for real objects in the virtual scene. A top-down orthographic projection of the chair’s computed mesh onto the floor plane creates a sharp-looking shadow. Applying a Gaussian blur softens the edges and makes the shadow much more believable.
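The two shadow steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the prototype's renderer: the mesh is a list of triangles with y as the up axis, the floor is a square grid of cells on the x/z plane, and the function names, grid size, and blur parameters are all hypothetical.

```python
import math


def point_in_triangle(px, pz, a, b, c):
    """2D inside test using the signs of three edge functions."""
    def edge(p, q):
        return (px - p[0]) * (q[1] - p[1]) - (pz - p[1]) * (q[0] - p[0])
    d1, d2, d3 = edge(a, b), edge(b, c), edge(c, a)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def project_shadow(triangles, size, cell):
    """Top-down orthographic projection: drop y, rasterize the footprint."""
    grid = [[0.0] * size for _ in range(size)]
    for tri in triangles:
        flat = [(x, z) for (x, _y, z) in tri]
        for row in range(size):
            for col in range(size):
                px, pz = (col + 0.5) * cell, (row + 0.5) * cell
                if point_in_triangle(px, pz, *flat):
                    grid[row][col] = 1.0
    return grid


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column, clamping indices at the borders."""
    radius = len(kernel) // 2
    n = len(values)
    return [
        sum(w * values[min(max(i + k - radius, 0), n - 1)]
            for k, w in enumerate(kernel))
        for i in range(n)
    ]


def gaussian_blur(grid, sigma=1.0, radius=2):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma, radius)
    rows = [blur_1d(r, kernel) for r in grid]
    cols = [blur_1d(list(c), kernel) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

The hard footprint from `project_shadow` corresponds to the sharp shadow, and passing it through `gaussian_blur` gives the softened version; a larger `sigma` spreads the shadow further from the object.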
When you bring it all together, it’s a magical experience we can build with today’s technology. This prototype is running live in a real furniture store on an iPad Pro.
The depth-based segmentation approach constrained by the bounding box produces incredible results, and it isn’t adversely affected by unrecognized objects like cushions and pets. In fact, with this approach, users can bring their pets into beautiful virtual worlds, even if they don’t want to come.