mesos - Integrate Apache Aurora with DC/OS
There are 2 Mesos frameworks that support GPU resources: Marathon and Aurora. I need to launch batch jobs on Mesos agents with GPU resources, and Aurora supports that kind of job. Aurora is not officially supported by DC/OS at the moment. I've tried to integrate it, but without success. The DC/OS Mesos masters don't register the Aurora framework, although Exhibitor creates records for Aurora. I haven't managed to find any records about Aurora in the Mesos masters' logs. Here is my aurora-scheduler config:
#!/bin/bash
GLOG_v=0
LIBPROCESS_PORT=8083
#LIBPROCESS_IP=127.0.0.1
JAVA_HOME=/opt/mesosphere/active/java/usr/java
JAVA_OPTS="-server -Djava.library.path='/opt/mesosphere/lib;/usr/lib;/usr/lib64'"
PATH=$PATH:/opt/mesosphere/bin
MESOS_NATIVE_JAVA_LIBRARY=/opt/mesosphere/lib/libmesos.so
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/mesosphere/lib
JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/mesosphere/lib

# Flags that control the behavior of the Aurora scheduler.
# For a full list of available flags, run /usr/lib/aurora/bin/aurora-scheduler -help
AURORA_FLAGS=(
  # The name of this cluster.
  -cluster_name='my cluster'

  # The HTTP port upon which Aurora will listen.
  -http_port=8088

  # The ZooKeeper URL of the ZNode where the Mesos master has registered.
  -mesos_master_address=zk://master_ip1:2181,master_ip2:2181,master_ip3:2181/mesos

  # The ZooKeeper quorum to which Aurora will register itself.
  -zk_endpoints=master_ip1:2181,master_ip1:2181,master_ip1:2181

  # The ZooKeeper ZNode within the specified quorum to which Aurora will register its
  # ServerSet, which keeps track of all live Aurora schedulers.
  -serverset_path='/aurora/scheduler'

  # Allows the scheduling of containers of the provided type.
  -allowed_container_types='DOCKER,MESOS'
  -allow_docker_parameters=true
  -allow_gpu_resource=true
  -executor_user=root

  ### Native Log Settings ###

  # The native log serves as a replicated database which stores the state of the
  # scheduler, allowing for multi-master operation.

  # Size of the quorum of Aurora schedulers which possess a native log. If running in
  # multi-master mode, consult the following document to determine appropriate values:
  #
  # https://aurora.apache.org/documentation/latest/deploying-aurora-scheduler/#replicated-log-configuration
  -native_log_quorum_size=2

  # The ZooKeeper ZNode to which Aurora will register the locations of its replicated log.
  -native_log_zk_group_path='/aurora/replicated-log'

  # The local directory in which an Aurora scheduler can find Aurora's replicated log.
  -native_log_file_path='/var/lib/aurora/scheduler/db'

  # The local directory in which Aurora schedulers will place state backups.
  -backup_dir='/var/lib/aurora/scheduler/backups'

  ### Thermos Settings ###

  # The local path of the Thermos executor binary.
  -thermos_executor_path='/usr/bin/thermos_executor'

  # Flags to pass to the Thermos executor.
  -thermos_executor_flags='--announcer-ensemble 127.0.0.1:2181'
)
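Besides the master logs, the master's state endpoint is another place to see whether a framework has registered. A rough sketch, assuming the leading master answers at leader.mesos:5050 and that the scheduler registers under whatever name is passed in (both are assumptions, not taken from the config above). has_framework is an illustrative helper, and the text match is crude: any object in the state with that name would also match.

```shell
#!/usr/bin/env bash
# Assumption: the leading Mesos master serves its state at this URL.
MESOS_STATE_URL="${MESOS_STATE_URL:-http://leader.mesos:5050/master/state}"

# Read master state JSON on stdin and succeed if some object in it
# carries the given "name" value. This is a plain text match, not a
# JSON parse, so it cannot tell a framework from a task of that name.
has_framework() {
  local name="$1"
  grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"${name}\""
}

# Usage (needs network access to the master; "Aurora" is an assumed name):
#   curl -s "$MESOS_STATE_URL" | has_framework "Aurora" && echo "registered"
```

If the name never shows up while the ZooKeeper records exist, the scheduler is likely reaching ZooKeeper but not completing registration with the master.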
I've managed to start the Aurora framework on DC/OS 1.8. Because the Mesos and Java builds embedded in DC/OS have a custom configuration and custom paths, I had to isolate Aurora in Docker. You can find Docker images for the Aurora components at the Docker repo: aurora scheduler (krot/aurora-scheduler) and aurora executor (krot/aurora-executor). This allows me or someone else to create a Universe package.
Steps for deploying the Aurora scheduler on DC/OS:
1. Create the folder /var/lib/aurora on each of the DC/OS agents.

2. Start the Aurora executor on the DC/OS agents using the following JSON:
{
  "id": "/aurora/aurora-executor",
  "env": {
    "MESOS_ROOT": "/var/lib/mesos/slave"
  },
  "instances": 20,
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "gpus": 0,
  "constraints": [
    [ "hostname", "UNIQUE" ]
  ],
  "container": {
    "docker": {
      "image": "krot/aurora-executor",
      "forcePullImage": true,
      "privileged": false,
      "network": "HOST"
    },
    "type": "DOCKER",
    "volumes": [
      {
        "containerPath": "/var/lib/mesos/slave",
        "hostPath": "/var/lib/mesos/slave",
        "mode": "RW"
      },
      {
        "containerPath": "/var/lib/aurora",
        "hostPath": "/var/lib/aurora",
        "mode": "RW"
      }
    ]
  }
}
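A minimal sketch of adjusting and submitting the definition above, assuming it is saved as aurora-executor.json (a hypothetical file name) and that Marathon is reachable at http://marathon.mesos:8080 (an assumption about the cluster). POST /v2/apps is Marathon's app-creation endpoint; set_instances and submit_app are illustrative helper names, not part of any tool.

```shell
#!/usr/bin/env bash
# Assumption: Marathon is reachable at this URL from where the script runs.
MARATHON_URL="${MARATHON_URL:-http://marathon.mesos:8080}"

# Rewrite the numeric "instances" value in an app definition file.
set_instances() {
  local file="$1" count="$2" tmp
  tmp="$(mktemp)"
  sed -E "s/(\"instances\"[[:space:]]*:[[:space:]]*)[0-9]+/\1${count}/" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Submit an app definition file to Marathon (POST /v2/apps).
submit_app() {
  local file="$1"
  curl -sS -X POST -H 'Content-Type: application/json' \
       --data @"$file" "${MARATHON_URL}/v2/apps"
}

# Usage (7 is a placeholder for the real agent count):
#   set_instances aurora-executor.json 7
#   submit_app aurora-executor.json
```

With the DC/OS CLI installed, dcos marathon app add aurora-executor.json should do the same as the submit step.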
Note: set "instances" to the number of agents.

2a. Alternative way of deploying the Aurora executor (should be done on each of the DC/OS agents):
sudo yum install -y python2 wget
wget -c https://apache.bintray.com/aurora/centos-7/aurora-executor-0.16.0-1.el7.centos.aurora.x86_64.rpm
rpm -Uhv --nodeps aurora-executor-0.16.0-1.el7.centos.aurora.x86_64.rpm
Make an edit to add the --mesos-root flag, resulting in something like:

grep -A5 OBSERVER_ARGS /etc/sysconfig/thermos
OBSERVER_ARGS=(
  --port=1338
  --mesos-root=/var/lib/mesos/slave
  --log_to_disk=NONE
  --log_to_stderr=google:INFO
)
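When this has to be repeated on many agents, the edit can be scripted. A small sketch, assuming the file declares the array as OBSERVER_ARGS=( on a line of its own; add_mesos_root is an illustrative helper name, and the function does nothing if the flag is already present.

```shell
#!/usr/bin/env bash
# Insert --mesos-root=<root> right after the "OBSERVER_ARGS=(" line,
# unless the file already mentions --mesos-root.
add_mesos_root() {
  local file="$1" root="$2" tmp
  if grep -q -- '--mesos-root' "$file"; then
    return 0
  fi
  tmp="$(mktemp)"
  awk -v flag="  --mesos-root=${root}" \
      '{ print } /^OBSERVER_ARGS=\(/ { print flag }' "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (run as root on each agent):
#   add_mesos_root /etc/sysconfig/thermos /var/lib/mesos/slave
```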
3. Start the Aurora scheduler using the following JSON (3 or more instances are recommended for fault tolerance):
{
  "id": "/aurora/aurora-scheduler",
  "env": {
    "CLUSTER_NAME": "YourCluster",
    "ZK_ENDPOINTS": "master.mesos:2181",
    "MESOS_MASTER": "zk://master.mesos:2181/mesos",
    "QUORUM_SIZE": "2",
    "EXTRA_SCHEDULER_ARGS": "-allow_gpu_resource=true"
  },
  "instances": 3,
  "cpus": 1,
  "mem": 1024,
  "disk": 0,
  "gpus": 0,
  "constraints": [
    [ "hostname", "UNIQUE" ]
  ],
  "container": {
    "docker": {
      "image": "krot/aurora-scheduler",
      "forcePullImage": true,
      "privileged": false,
      "network": "HOST"
    },
    "type": "DOCKER",
    "volumes": [
      {
        "containerPath": "/var/lib/aurora",
        "hostPath": "/var/lib/aurora",
        "mode": "RW"
      }
    ]
  }
}
Note: -allow_gpu_resource=true enables GPU support. The Aurora scheduler can be configured using environment variables. Please refer to the documentation for details.
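On the pairing of 3 scheduler instances with a quorum size of 2: Aurora's replicated log documentation describes the quorum as a majority of the schedulers, floor(N/2) + 1. If the instance count is changed, the quorum size presumably needs to follow. A tiny sketch of that arithmetic (quorum_size is an illustrative helper name):

```shell
#!/usr/bin/env bash
# Majority quorum for N scheduler instances: floor(N/2) + 1.
# Shell integer division already rounds down.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

quorum_size 3   # prints 2
quorum_size 5   # prints 3
```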