2D and 3D object detection questions

To get 3D detection, do we need to do both 2D and 3D image training?

When we have many objects and many images of each of them, let's say
10 coke can images
20 beer can images
is there a way to tell the detector that I have 2 different objects, not 30?

Is there a way to access the coordinates of the detected objects (like we have with the table_array)?
That way we could choose to pick up the coke can or the beer can (as an example).

The object detection in this course is really simple, so what you are describing wouldn't be easy to do.
I would recommend having a look at the manipulation course, which delves deeper into this matter. We also have a course that uses deep learning with random environments to find the position of a certain object.

Manipulation Course

Random Env Object detection

Thanks for the advice about these other modules, but I would like to be able to use the detection results obtained by this module. I have problems interpreting the find_object_2d/ObjectsStamped message. From rosmsg info I get a description that is not clear to me, see below:

rosmsg info find_object_2d/ObjectsStamped
std_msgs/Header header
  uint32 seq
  time stamp
  string frame_id
std_msgs/Float32MultiArray objects
  std_msgs/MultiArrayLayout layout
    std_msgs/MultiArrayDimension[] dim
      string label
      uint32 size
      uint32 stride
    uint32 data_offset
  float32[] data

Here is an example of the topic echo when I detect the table in front of the robot (and, below it, my understanding of it):

ser:~$ rostopic echo /objectsStamped
header:
  seq: 894
  stamp:
    secs: 1172
    nsecs: 134000000
  frame_id: "head_camera_rgb_optical_frame"
objects:
  layout:
    dim: []
    data_offset: 0
  data: [6.0, 382.0, 146.0, 0.9914313554763794, -0.008840509690344334, -3.84211161872372e-05, 0.018584784120321274, 1.0304908752441406, 9.121886250795797e-05, 68.8837890625, 265.6440734863281, 1.0]

Interpretation
6 Object label (it is the number of the image file in the saved_pictures2d directory, e.g. 6.png)
382 Object x size in pixels?
146 Object y size in pixels?
0.9914313554763794, -0.008840509690344334, -3.84211161872372e-05 (looks like x1, y1, z1?)
0.018584784120321274, 1.0304908752441406, 9.121886250795797e-05 (looks like x2, y2, z2?)
68.8837890625, 265.6440734863281, 1.0 (looks like x3, y3, z3?)

I would have expected the object pose at its center, but it looks like something else?
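
For anyone reading the same message, here is a minimal Python sketch of one possible reading of the data array. It assumes (this is not confirmed anywhere in this thread) that each detection takes 12 floats: the object id, the width and height of the reference image in pixels, and then the 9 values of a 3x3 homography stored in Qt QTransform order (h11, h12, h13, h21, h22, h23, h31, h32, h33). Under that assumption the nine numbers are not three points; they map a pixel of the reference image into the camera image, so the center of the object can be obtained by mapping (width/2, height/2). The names parse_objects, map_point, center_in_scene and ID_TO_CLASS are made up for this illustration, and the id-to-class table is only a hypothetical way to group several reference images under one object name.

```python
# Sketch only: the 12-float layout is an assumption, not confirmed in this thread.
FIELDS_PER_OBJECT = 12  # id, width, height, then 9 homography values

# Hypothetical table grouping reference-image ids into object classes,
# e.g. several coke can images could all map to "coke_can".
ID_TO_CLASS = {6: "table"}


def parse_objects(data):
    """Split the flat data array into one dict per detected object."""
    detections = []
    for i in range(0, len(data) - FIELDS_PER_OBJECT + 1, FIELDS_PER_OBJECT):
        obj_id = int(data[i])
        detections.append({
            "id": obj_id,
            "class": ID_TO_CLASS.get(obj_id, "unknown"),
            "width": data[i + 1],
            "height": data[i + 2],
            "h": list(data[i + 3:i + 12]),
        })
    return detections


def map_point(h, x, y):
    """Map a reference-image pixel into the camera image (assumed QTransform order)."""
    h11, h12, h13, h21, h22, h23, h31, h32, h33 = h
    w = h13 * x + h23 * y + h33  # projective scale factor
    return ((h11 * x + h21 * y + h31) / w, (h12 * x + h22 * y + h32) / w)


def center_in_scene(det):
    """Pixel position of the object's center in the camera image."""
    return map_point(det["h"], det["width"] / 2.0, det["height"] / 2.0)


# The data array from the rostopic echo above.
data = [6.0, 382.0, 146.0,
        0.9914313554763794, -0.008840509690344334, -3.84211161872372e-05,
        0.018584784120321274, 1.0304908752441406, 9.121886250795797e-05,
        68.8837890625, 265.6440734863281, 1.0]

for det in parse_objects(data):
    cx, cy = center_in_scene(det)
    print(det["class"], det["id"], round(cx, 1), round(cy, 1))
```

If this reading is right, the result is a pixel position in the camera image, not a 3D pose; getting a pose the robot can reach for would additionally need depth information. In a ROS node, the same functions could be called on msg.objects.data inside the subscriber callback.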