Chapter 8: How long must robot training take?

With just one move every few minutes, how long does it take before getting any interesting results?

Is it normal to have no points on the rqt_multiplot during this training?

With a move every few minutes, there must be something wrong with your code; one step should not take that long. I'm not 100% sure about this implementation in the chapter, but usually you need several hundred episodes (each consisting of many steps).

Perhaps check your code, nodes, and topics to see what exactly is going on and where the bottleneck is.
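One way to locate the bottleneck is to time a single environment step. This is only a sketch: the environment id below is made up, and it assumes a standard Gym interface.

import time
import gym

# Hypothetical environment id; use whatever id your course code registers.
env = gym.make('FetchReach-v0')
env.reset()

# Time a handful of steps: if nearly all the time is spent inside
# env.step(), the bottleneck is the simulation / motion-planning side,
# not the learning algorithm itself.
for i in range(5):
    action = env.action_space.sample()
    t0 = time.time()
    obs, reward, done, info = env.step(action)
    print("step %d took %.2f s" % (i, time.time() - t0))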

Here is the situation.

Something that is not clear in the lesson is which MoveIt package must be used. To avoid copying the package from the ROS Manipulation in 5 Days course, I modified the launch file to use the default fetch_moveit package with the joint_names.yaml from ROS Manipulation in 5 Days:

   <rosparam command="load" file="$(find fetch_train)/config/joint_names.yaml" />

At the very beginning, I get a few moves where the robot hits the table and resets to its original position. After that, I get a very long log of failed robot motion plannings in RViz, with a successful robot move once in a while.

I ran it for an hour in the studio and, in parallel, left it running on my local PC through the night.

In the morning the simulation had stopped with this message:

ValueError: shape mismatch: value array of shape (2,1001,7) could not be broadcast to indexing result of shape (2,1000,7)
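If I understand the shapes correctly, this is the kind of error numpy raises when an episode produces 1001 entries (the initial observation plus 1000 steps) while the buffer was allocated for only 1000. A minimal illustration of the mechanism (this is not the actual course code):

import numpy as np

buffer = np.zeros((2, 1000, 7))   # room for 1000 timesteps of 7-dim data
episode = np.zeros((2, 1001, 7))  # 1000 steps plus the initial entry = 1001

idx = np.arange(2)
buffer[idx] = episode
# ValueError: shape mismatch: value array of shape (2,1001,7) could not
# be broadcast to indexing result of shape (2,1000,7)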

When I looked at the output of train.py, it was just a very long repeating list of:

Action:
[-0.822648 -0.19145986 -0.25391966 0.01888367]
Get Obs
Is done
Entered step
Unpause sim
Set action
Action:

This sounds like a problem with the inverse kinematics solver (motion planning). It is trying to plan according to the action you are sending and cannot find a solution. Try to investigate what exact goal positions it is trying to solve for and failing on. Perhaps you have some wrong values, where instead of a millimeter it wants to move 10 cm or something, always colliding. Most likely it has something to do with your random number generation: check this link:
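In the meantime, you could log and clip every sampled action before it becomes a goal for the planner, just to see the scale of the motions being requested. The bounds below are invented for illustration; adapt them to your task:

import numpy as np

# Invented per-step bounds for the end-effector displacement (meters).
# If the real code samples on the wrong scale (cm instead of mm, say),
# almost every requested pose will be in collision or unreachable.
ACTION_LOW = np.array([-0.05, -0.05, -0.05, -1.0])
ACTION_HIGH = np.array([0.05, 0.05, 0.05, 1.0])

def sanitize_action(action):
    # Log the raw action and clip it into a plausible range before
    # handing it to the motion planner.
    action = np.asarray(action, dtype=np.float64)
    clipped = np.clip(action, ACTION_LOW, ACTION_HIGH)
    if not np.allclose(action, clipped):
        print("action out of range, clipping:", action, "->", clipped)
    return clipped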

I worked very hard to make it work but did not succeed. Here is my analysis.

I have checked all the files of the lesson for potential errors and have not found any (they are identical to the ones in the IPython notebook).

I placed a few debugging prints in the environment files and found that none of them were printing.

After a detailed analysis, I found that the first program we execute (train.py) does not use the files we worked on in the course.

train.py imports: from openai_ros.task_envs.fetch_reach import fetch_reach

Obviously, after that, the other files of the training environment are also imported from the openai_ros package.
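A quick way to confirm which copy Python actually loads is to print the module path:

from openai_ros.task_envs.fetch_reach import fetch_reach

# If this prints a path inside the installed openai_ros package rather
# than inside my_fetch_train, the files edited in the course are never
# executed, which is consistent with the debug prints never appearing.
print(fetch_reach.__file__)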

To a certain point it does not matter, since all the files are exactly the same except line 17 of fetch_reach.py, which differs between the two packages:

	In openai_ros we have max_episode_steps=1000,
	   and this 1000-step maximum could generate the error message from my first post.
	In my_fetch_train we have timestep_limit=1000,
	   but it generates an error at execution if we try to use it.
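This keyword difference looks like a Gym API change: newer Gym versions accept max_episode_steps in register(), while the old timestep_limit keyword was removed, which would explain the execution error. A sketch of the registration (the id and entry_point here are made up):

from gym.envs.registration import register

register(
    id='MyFetchReach-v0',                                   # made-up id
    entry_point='my_fetch_train.fetch_reach:FetchReachEnv', # made-up path
    max_episode_steps=1000,  # current keyword; passing the removed
                             # 'timestep_limit' keyword raises an error
)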

So it looks like the problem is in the lesson itself rather than anything I did.

Here is more info on what is happening.

After a few moves (4) with the robot jumping at startup, the robot keeps planning in RViz while standing still in Gazebo for long minutes.

Typical console output (constantly repeating pattern):

*********** roslaunch fetch_train myrobot_planning_execution.launch console *****************
[ERROR] [1584053129.801063497, 314.431000000]: RRTConnect: Unable to sample any valid states for goal tree
[ INFO] [1584053129.801447761, 314.431000000]: RRTConnect: Created 1 states (1 start + 0 goal)
[ INFO] [1584053129.801491822, 314.431000000]: No solution found after 5.008066 seconds
[ INFO] [1584053129.825875822, 314.444000000]: Unable to solve the planning problem
[ INFO] [1584053129.867042987, 314.465000000]: Combined planning and execution request received for MoveGroup action. Forwarding to planning and execution pipeline.
[ INFO] [1584053129.867475032, 314.465000000]: Planning attempt 1 of at most 1
[ INFO] [1584053129.868442792, 314.465000000]: Planner configuration 'arm' will use planner 'geometric::RRTConnect'. Additional configuration parameters will be set when the planner is constructed.
[ INFO] [1584053129.869913789, 314.465000000]: RRTConnect: Starting planning with 1 states already in datastructure

*********** rosrun fetch_train execute_trajectories.py console ***********************************
[ INFO] [1584053206.717277655, 353.587000000]: ABORTED: No motion plan found. No execution attempted.
[ WARN] [1584053215.913367407, 358.437000000]: Fail: ABORTED: No motion plan found. No execution attempted.

************* python train.py console *************************************
Action:
[-0.23174383 0.00595765 0.13028064 0.09770413]
[0.5834501 0.05778984 0.13608912 0.85119325]
[ 0.19025649 -0.2030494 0.20117997 0.01077664]
[-0.11125261 0.05778081 0.31276262 0.00094743]
[0.22419144 0.23386799 0.8874962 0.3636406 ]

A new line appears every 10 seconds.

Did you fix this issue? Are your problems persisting even after it is using your files? The issue seems to be in the motion planning: the inverse kinematics is not able to get a valid goal state.
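Since your log shows a single planning attempt that gives up after about 5 seconds, you could also give the planner more room while you investigate. A sketch, assuming the MoveIt group is called "arm" as in your log:

import sys
import rospy
import moveit_commander

moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node('planner_tuning', anonymous=True)  # made-up node name

group = moveit_commander.MoveGroupCommander("arm")  # group name from your log
group.set_planning_time(10.0)           # log shows ~5 s before giving up
group.set_num_planning_attempts(5)      # log shows "attempt 1 of at most 1"
group.set_goal_position_tolerance(0.01) # loosen slightly if goals are borderline

Note that this only helps with borderline goals; truly unreachable or colliding goals will still fail, so check the sampled targets first.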

I have tried it, but got a lot of new errors.

The code of the lesson is made to systematically use the files in the openai_ros package, not the code of the my_fetch_train package.

More than that, even if I put in a lot of effort to make it run without errors, it should give the same result since, as I explained in my previous post, the code of the corresponding files in the two packages is mostly identical.

As a reminder, the only difference in the my_fetch_train package is line 17 of fetch_reach.py: timestep_limit=1000 vs. max_episode_steps=1000, and the my_fetch_train version generates an error message.

In fetch.env.v2.py we are asked to modify the import as follows:

from my_fetch_train import robot_gazebo_env_goal

but both robot_gazebo_env_goal files are absolutely identical.
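They can be compared byte for byte with a few lines (the paths below are placeholders for wherever the two packages live in your workspace):

import filecmp

# Placeholder paths; point these at the two copies on your system.
a = 'my_fetch_train/src/robot_gazebo_env_goal.py'
b = 'openai_ros/src/openai_ros/robot_gazebo_env_goal.py'

print(filecmp.cmp(a, b, shallow=False))  # True means byte-for-byte identical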

Trying to use the my_fetch_train package is not the way to solve the problem.

Did you recently try to go through the lesson 7 and 8 steps and run the train program?