mesos - Integrate Apache Aurora with DC/OS


There are two Mesos frameworks that support GPU resources: Marathon and Aurora. I need to launch batch jobs on Mesos agents with GPU resources, and Aurora supports this kind of job. Aurora is not officially supported on DC/OS at the moment. I've tried to integrate it, but without success: the DC/OS Mesos masters don't register the Aurora framework, although Exhibitor creates records for Aurora. I haven't managed to find any records about Aurora in the Mesos masters' logs. Here is my aurora-scheduler config:

    #!/bin/bash
    GLOG_v=0
    LIBPROCESS_PORT=8083
    #LIBPROCESS_IP=127.0.0.1

    JAVA_HOME=/opt/mesosphere/active/java/usr/java
    JAVA_OPTS="-server -Djava.library.path='/opt/mesosphere/lib;/usr/lib;/usr/lib64'"

    PATH=$PATH:/opt/mesosphere/bin
    MESOS_NATIVE_JAVA_LIBRARY=/opt/mesosphere/lib/libmesos.so
    LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/mesosphere/lib
    JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/mesosphere/lib

    # Flags that control the behavior of the Aurora scheduler.
    # For a full list of available flags, run /usr/lib/aurora/bin/aurora-scheduler -help
    AURORA_FLAGS=(
      # The name of this cluster.
      -cluster_name='my cluster'

      # The HTTP port upon which Aurora will listen.
      -http_port=8088

      # The ZooKeeper URL of the znode where the Mesos master has registered.
      -mesos_master_address=zk://master_ip1:2181,master_ip2:2181,master_ip3:2181/mesos

      # The ZooKeeper quorum to which Aurora will register itself.
      -zk_endpoints=master_ip1:2181,master_ip1:2181,master_ip1:2181

      # The ZooKeeper znode within the specified quorum to which Aurora will register its
      # ServerSet, which keeps track of all live Aurora schedulers.
      -serverset_path='/aurora/scheduler'

      # Allows the scheduling of containers of the provided type.
      -allowed_container_types='DOCKER,MESOS'

      -allow_docker_parameters=true
      -allow_gpu_resource=true
      -executor_user=root

      ### Native log settings ###

      # The native log serves as a replicated database which stores the state of the
      # scheduler, allowing for multi-master operation.

      # Size of the quorum of Aurora schedulers which possess a native log. If running in
      # multi-master mode, consult the following document to determine appropriate values:
      #
      # https://aurora.apache.org/documentation/latest/deploying-aurora-scheduler/#replicated-log-configuration
      -native_log_quorum_size=2
      # The ZooKeeper znode to which Aurora will register the locations of its replicated log.
      -native_log_zk_group_path='/aurora/replicated-log'
      # The local directory in which an Aurora scheduler can find Aurora's replicated log.
      -native_log_file_path='/var/lib/aurora/scheduler/db'
      # The local directory in which Aurora schedulers will place state backups.
      -backup_dir='/var/lib/aurora/scheduler/backups'

      ### Thermos settings ###

      # The local path of the Thermos executor binary.
      -thermos_executor_path='/usr/bin/thermos_executor'
      # Flags to pass to the Thermos executor.
      -thermos_executor_flags='--announcer-ensemble 127.0.0.1:2181'
    )
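
Besides reading the master logs, one way to check whether a framework actually registered is to ask the Mesos master for its framework list. The snippet below is only a sketch under assumptions: it uses the master's `/frameworks` HTTP endpoint, assumes the leading master is reachable at `http://leader.mesos:5050` (a typical DC/OS address, not something taken from the config above), and matches on a name fragment because the name Aurora registers under depends on the scheduler's settings.

```python
import json
import urllib.request


def find_frameworks(frameworks_response, name_fragment):
    """Return (name, id, active) tuples for frameworks whose name contains
    name_fragment (case-insensitive) in a Mesos /frameworks response."""
    fragment = name_fragment.lower()
    return [
        (fw.get("name", ""), fw.get("id", ""), fw.get("active", False))
        for fw in frameworks_response.get("frameworks", [])
        if fragment in fw.get("name", "").lower()
    ]


def fetch_frameworks(master_url):
    """Fetch the framework list from a Mesos master.

    master_url is an assumption, e.g. "http://leader.mesos:5050".
    """
    url = master_url.rstrip("/") + "/frameworks"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling `find_frameworks(fetch_frameworks("http://leader.mesos:5050"), "aurora")` from a node inside the cluster returns an empty list if the master has no registered framework with a matching name.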

I've managed to start the Aurora framework on DC/OS 1.8. Because the Mesos and Java builds embedded in DC/OS have custom configuration and paths, I had to isolate Aurora in Docker. You can find Docker images for the Aurora components at the Docker repo: aurora scheduler, aurora executor. This also allows me or someone else to create a Universe package.

Steps for deploying the Aurora scheduler on DC/OS:

  1. Create the folder /var/lib/aurora on each of the DC/OS agents.

  2. Start the Aurora executor on the DC/OS agents using the following JSON:

    {
      "id": "/aurora/aurora-executor",
      "env": {
        "MESOS_ROOT": "/var/lib/mesos/slave"
      },
      "instances": 20,
      "cpus": 1,
      "mem": 128,
      "disk": 0,
      "gpus": 0,
      "constraints": [
        ["hostname", "UNIQUE"]
      ],
      "container": {
        "docker": {
          "image": "krot/aurora-executor",
          "forcePullImage": true,
          "privileged": false,
          "network": "HOST"
        },
        "type": "DOCKER",
        "volumes": [
          {
            "containerPath": "/var/lib/mesos/slave",
            "hostPath": "/var/lib/mesos/slave",
            "mode": "RW"
          },
          {
            "containerPath": "/var/lib/aurora",
            "hostPath": "/var/lib/aurora",
            "mode": "RW"
          }
        ]
      }
    }

    Note: set "instances" to the number of agents.

    2a. An alternative way to deploy the Aurora executor (this should be done on each of the DC/OS agents):

        sudo yum install -y python2 wget
        wget -c https://apache.bintray.com/aurora/centos-7/aurora-executor-0.16.0-1.el7.centos.aurora.x86_64.rpm
        rpm -Uhv --nodeps aurora-executor-0.16.0-1.el7.centos.aurora.x86_64.rpm

    Then edit /etc/sysconfig/thermos and add the --mesos-root flag, resulting in something like:

        grep -A5 OBSERVER_ARGS /etc/sysconfig/thermos
        OBSERVER_ARGS=(
          --port=1338
          --mesos-root=/var/lib/mesos/slave
          --log_to_disk=NONE
          --log_to_stderr=google:INFO
        )

  3. Start the Aurora scheduler using the following JSON (3 or more instances are recommended for fault tolerance):

    {
      "id": "/aurora/aurora-scheduler",
      "env": {
        "CLUSTER_NAME": "yourcluster",
        "ZK_ENDPOINTS": "master.mesos:2181",
        "MESOS_MASTER": "zk://master.mesos:2181/mesos",
        "QUORUM_SIZE": "2",
        "EXTRA_SCHEDULER_ARGS": "-allow_gpu_resource=true"
      },
      "instances": 3,
      "cpus": 1,
      "mem": 1024,
      "disk": 0,
      "gpus": 0,
      "constraints": [
        ["hostname", "UNIQUE"]
      ],
      "container": {
        "docker": {
          "image": "krot/aurora-scheduler",
          "forcePullImage": true,
          "privileged": false,
          "network": "HOST"
        },
        "type": "DOCKER",
        "volumes": [
          {
            "containerPath": "/var/lib/aurora",
            "hostPath": "/var/lib/aurora",
            "mode": "RW"
          }
        ]
      }
    }

    Note: -allow_gpu_resource=true enables GPU support. The Aurora scheduler can be configured using environment variables. Please refer to the documentation for details.
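
For the note in step 2 about matching "instances" to the number of agents, the count does not have to be filled in by hand. The following is a rough sketch, not part of the original setup: it assumes the Mesos master's `/slaves` endpoint is reachable (for example at `http://leader.mesos:5050/slaves`) and that the executor definition from step 2 has been saved to a file; the helper names are made up for illustration. It counts every active agent the master reports, so on clusters with agents that Marathon cannot place tasks on, the number may need adjusting.

```python
import json
import urllib.request


def count_active_agents(slaves_response):
    """Count agents marked active in a Mesos master /slaves response."""
    return sum(
        1 for agent in slaves_response.get("slaves", []) if agent.get("active")
    )


def with_instances(app_definition, instances):
    """Return a copy of a Marathon app definition with "instances" replaced."""
    if instances < 1:
        raise ValueError("expected at least one active agent")
    patched = dict(app_definition)
    patched["instances"] = instances
    return patched


def fetch_json(url):
    """Fetch and decode a JSON document, e.g. http://leader.mesos:5050/slaves
    (the address is an assumption about the cluster)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

A possible use: load the step 2 JSON with `json.load`, call `with_instances(app, count_active_agents(fetch_json("http://leader.mesos:5050/slaves")))`, write the result back to a file, and submit that file to Marathon, for instance with `dcos marathon app add`.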

