I am trying to do a project where I can localise and estimate the sizes of ML-detected bounding boxes in a video stream. My plan is to run visual odometry on the images, fuse it with the IMU using the robot_localization package to generate pose changes between frames, and then use OpenCV triangulation methods to triangulate the 3D coordinates of the bounding box vertices. Has anyone done something similar to this? How should I go about performing visual odometry with ROS so that the output is compatible with the robot_localization package?
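For the triangulation step, the idea I have in mind is roughly the following. This is just a sketch of linear (DLT) two-view triangulation in plain NumPy, equivalent to what `cv2.triangulatePoints` does; the camera intrinsics and poses here are made-up example values, not from my actual setup:

```python
import numpy as np

def triangulate_point(P1, P2, pt1, pt2):
    """Linear (DLT) triangulation of one point seen in two views.

    P1, P2: 3x4 camera projection matrices (K @ [R|t]) for the two frames,
            where [R|t] comes from the estimated pose change between frames.
    pt1, pt2: (u, v) pixel coordinates of the same point in each view.
    Returns the 3D point in the first camera's frame.
    """
    # Each observation contributes two rows to the homogeneous system A X = 0
    A = np.array([
        pt1[0] * P1[2] - P1[0],
        pt1[1] * P1[2] - P1[1],
        pt2[0] * P2[2] - P2[0],
        pt2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Example with invented numbers: same intrinsics, 0.5 m baseline along x
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

X_true = np.array([0.2, -0.1, 3.0])           # a hypothetical bounding-box corner
X_est = triangulate_point(P1, P2, project(P1, X_true), project(P2, X_true))
```

In practice I would feed `cv2.triangulatePoints` the projection matrices built from the fused pose change, and run this on each bounding box vertex.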
If I understood correctly, you are trying to do odometry based on camera images? I did a small experiment that got something marginally working: RGBSLAM
If you get this improved and working, it would be great if you could share the ROSject or a git repository here for the rest of the community, because it's something that is in high demand.
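On the robot_localization compatibility question: the main requirement is that your visual odometry node publishes a `nav_msgs/Odometry` message with proper `frame_id`/`child_frame_id` and non-zero covariances, which the EKF then fuses with the IMU. A rough sketch of what the EKF config could look like (the topic names `/vo/odom` and `/imu/data` are placeholders, and which fields you fuse depends on your VO's quality):

```yaml
frequency: 30
two_d_mode: false

odom0: /vo/odom                      # nav_msgs/Odometry from your VO node
odom0_config: [true,  true,  true,   # x, y, z
               false, false, false,  # roll, pitch, yaw (let the IMU handle these)
               false, false, false,  # vx, vy, vz
               false, false, false,  # vroll, vpitch, vyaw
               false, false, false]  # ax, ay, az
odom0_differential: true             # fuse pose deltas between frames, not absolute pose

imu0: /imu/data                      # sensor_msgs/Imu
imu0_config: [false, false, false,
              true,  true,  true,    # orientation
              false, false, false,
              true,  true,  true,    # angular velocity
              false, false, false]   # skip linear acceleration here
```

Setting `odom0_differential: true` is the usual way to handle VO, since its absolute pose drifts while its frame-to-frame deltas are informative.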