flambe.cluster.instance.instance
This module includes base Instance classes to represent machines.
All Instance objects will be managed by Cluster objects (flambe.cluster.cluster.Cluster).
This base implementation is independent of the type of instance used.
Any new instance that flambe should support should inherit from the classes that are defined in this module.
Module Contents¶
-
class
flambe.cluster.instance.instance.
Instance
(host: str, private_host: str, username: str, key: str, config: ConfigParser, debug: bool, use_public: bool = True)[source]¶ Bases:
object
Encapsulates remote instances.
In this context, the instance is a running computer.
All instances used by flambe's remote mode inherit from Instance. This class provides high-level methods to deal with remote instances (for example, sending a shell command over SSH).
Important: Instance objects should be picklable. Make sure that all child classes can be pickled.
The local flambe process communicates with the remote instances over SSH, authenticating with private keys.
Parameters: - host (str) – The public DNS host of the remote machine.
- private_host (str) – The private DNS host of the remote machine.
- username (str) – The machine’s username.
- key (str) – The path to the SSH key used to communicate with the instance.
- config (ConfigParser) – The config object that contains useful information for the instance. For example, config['SSH']['SSH_KEY'] should contain the path of the SSH key used to log in to the remote instance.
- debug (bool) – True in case flambe was installed in dev mode, False otherwise.
- use_public (bool) – Whether this instance should use the public or the private IP. By default, the public IP is used. The private host is used when inside a private LAN.
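The pickling requirement can be checked with a round trip through `pickle`. The sketch below uses a hypothetical stand-in class (`FakeInstance` is not part of flambe) that drops a cached, non-picklable client in `__getstate__`. How flambe itself handles this is not described on this page, so treat it as one possible approach.

```python
import pickle
import threading


class FakeInstance:
    """Hypothetical stand-in for an Instance subclass (not flambe code)."""

    def __init__(self, host: str, username: str) -> None:
        self.host = host
        self.username = username
        # Stand-in for a cached SSH client; lock objects cannot be pickled.
        self._cli = threading.Lock()

    def __getstate__(self) -> dict:
        # Drop the non-picklable cached client before pickling.
        state = self.__dict__.copy()
        state["_cli"] = None
        return state


original = FakeInstance("ec2-host.example.com", "ubuntu")
restored = pickle.loads(pickle.dumps(original))
print(restored.host, restored._cli)
```

Without `__getstate__`, the `pickle.dumps` call would raise a `TypeError` because of the lock attribute.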
-
fix_relpaths_in_config
(self)[source]¶ Updates all paths to be absolute. For example, if it contains “~/a/b/c” it will be changed to /home/user/a/b/c (the appropriate $HOME value).
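As a rough illustration of this kind of path expansion, the sketch below rewrites every value starting with "~" in a `ConfigParser`. The helper name `expand_paths` is hypothetical, and expanding against the local `$HOME` with `os.path.expanduser` is an assumption; which sections and keys flambe actually touches is not stated here.

```python
import os
from configparser import ConfigParser


def expand_paths(config: ConfigParser) -> None:
    """Expand a leading "~" in every config value, in place."""
    for section in config.sections():
        # items() returns a list, so setting values while looping is safe.
        for key, value in config.items(section, raw=True):
            if value.startswith("~"):
                config.set(section, key, os.path.expanduser(value))


# Interpolation is disabled so that "%" characters in paths are left alone.
config = ConfigParser(interpolation=None)
config.read_dict({"SSH": {"SSH_KEY": "~/a/b/c"}})
expand_paths(config)
print(config["SSH"]["SSH_KEY"])
```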
-
__enter__
(self)[source]¶ Method to use Instance instances with context managers
Returns: The current instance Return type: Instance
-
__exit__
(self, exc_type: Optional[Type[BaseException]], exc_value: Optional[BaseException], traceback: Optional[TracebackType])[source]¶ Exit method for the context manager.
This method catches any exception raised inside the context and re-raises it.
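The context-manager protocol described by `__enter__` and `__exit__` can be sketched with a hypothetical class (`ManagedInstance` is illustrative only). Returning `False` from `__exit__` is the standard way to let an exception from the `with` body propagate to the caller; whether flambe does exactly this is an assumption.

```python
from types import TracebackType
from typing import Optional, Type


class ManagedInstance:
    """Hypothetical class mirroring the documented context-manager methods."""

    def __init__(self) -> None:
        self.closed = False

    def __enter__(self) -> "ManagedInstance":
        # The object itself is bound to the `as` target.
        return self

    def __exit__(self,
                 exc_type: Optional[Type[BaseException]],
                 exc_value: Optional[BaseException],
                 traceback: Optional[TracebackType]) -> bool:
        self.closed = True
        # Returning False lets any exception from the block propagate.
        return False


caught = None
try:
    with ManagedInstance() as inst:
        raise RuntimeError("remote step failed")
except RuntimeError as exc:
    caught = exc

print(inst.closed, caught)
```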
-
prepare
(self)[source]¶ Runs all necessary processes to prepare the instances.
The child classes should implement this method according to the type of instance.
-
wait_until_accessible
(self)[source]¶ Waits until the instance is accessible through SSHClient
It attempts const.RETRIES times to ping the SSH port to see if it is listening for incoming connections. In each attempt, it waits const.RETRY_DELAY.
Raises: ConnectionError
– If the instance is inaccessible through SSH
-
is_up
(self)[source]¶ Tests whether port 22 is open to incoming SSH connections
Returns: True if the instance is listening on port 22. False otherwise. Return type: bool
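The port check in is_up and the polling in wait_until_accessible can be approximated with the standard library. This is a minimal sketch, not flambe's implementation: `port_is_open` and `wait_until` are hypothetical names, and the `retries` and `delay` arguments stand in for const.RETRIES and const.RETRY_DELAY, whose values are not given here.

```python
import socket
import time
from typing import Callable


def port_is_open(host: str, port: int = 22, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(probe: Callable[[], bool], retries: int, delay: float) -> None:
    """Call `probe` up to `retries` times, sleeping `delay` seconds after each failure."""
    for _ in range(retries):
        if probe():
            return
        time.sleep(delay)
    raise ConnectionError(f"not accessible after {retries} attempts")


# Fake probe that only succeeds on its third call, so no network is needed.
calls = []


def flaky_probe() -> bool:
    calls.append(1)
    return len(calls) >= 3


wait_until(flaky_probe, retries=5, delay=0.01)
print(len(calls))
```

Against a real machine, the probe would be something like `lambda: port_is_open(host)`, with `host` being the instance's public or private address.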
-
_get_cli
(self)[source]¶ Get an SSHClient in order to execute commands.
This will cache an existing SSHClient to optimize resources. This is a private method and should only be used in this module.
Returns: The client for later use. Return type: paramiko.SSHClient Raises: SSHConnectingError
– In case opening an SSH connection fails.
-
_run_cmd
(self, cmd: str, retries: int = 1, wd: str = None)[source]¶ Runs a single shell command in the instance through SSH.
The command will be executed in one SSH connection. Don’t expect state to be kept between several calls to _run_cmd. To run multiple commands, use _run_script.
Important: when running docker containers, don’t use -it flag!
This is a private method and should only be used in this module.
Parameters: - cmd (str) – The command to execute.
- retries (int) – The number of attempts to run the command if it fails. Defaults to 1.
- wd (str) – The working directory to ‘cd’ before running the command
Returns: A RemoteCommand instance with success boolean and message.
Return type: RemoteCommand
Examples
To get the $HOME environment variable:
>>> instance._run_cmd("echo $HOME")
RemoteCommand(True, "/home/ubuntu")
This will not work:
>>> instance._run_cmd("export var=10")
>>> instance._run_cmd("echo $var")
RemoteCommand(False, "")
This will work:
>>> instance._run_cmd("export var=10; echo $var")
RemoteCommand(True, "10")
Raises: RemoteCommandError
– In case the cmd fails after retries attempts.
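The lack of shared state between calls is not specific to SSH: any command run in its own shell process behaves the same way. The local analogy below uses `subprocess` and `sh` instead of a remote connection; `run_cmd` and `FLAMBE_DEMO_VAR` are hypothetical names chosen for the illustration.

```python
import subprocess


def run_cmd(cmd: str) -> str:
    """Run `cmd` in a fresh local shell and return its stripped stdout."""
    result = subprocess.run(["sh", "-c", cmd],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


# The export happens in a shell that exits immediately afterwards...
run_cmd("export FLAMBE_DEMO_VAR=10")
# ...so a second, separate shell does not see the variable.
separate = run_cmd("echo $FLAMBE_DEMO_VAR")
# Chaining both steps in a single command keeps them in the same shell.
combined = run_cmd("export FLAMBE_DEMO_VAR=10; echo $FLAMBE_DEMO_VAR")

print(repr(separate), repr(combined))
```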
-
_run_script
(self, fname: str, desc: str)[source]¶ Runs a script by copying the script to the instance and executing it.
This is a private method and should only be used in this module.
Parameters: - fname (str) – The script filename
- desc (str) – A description for the script purpose. This will be used for the copied filename
Returns: A RemoteCommand instance with success boolean and message.
Return type: RemoteCommand
Raises: RemoteCommandError
– In case the script fails.
-
_remote_script
(self, host_fname: str, desc: str)[source]¶ Sends a local file containing a script to the instance using Paramiko SFTP.
It should be used as a context manager for later execution of the script. See _run_script on how to use it.
After the context manager exits, the file is removed from the instance.
This is a private method and should only be used in this module.
Parameters: - host_fname (str) – The local script filename
- desc (str) – A description for the script purpose. This will be used for the copied filename
Yields: str – The remote filename of the copied local file.
Raises: RemoteCommandError
– In case sending the script fails.
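The copy, yield, then clean-up pattern described for _remote_script maps naturally onto `contextlib.contextmanager`. The sketch below is a local stand-in only: it copies into a temporary directory instead of using Paramiko SFTP, and `staged_script` and the `.sh` naming scheme are assumptions made for the example.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def staged_script(host_fname: str, desc: str) -> Iterator[str]:
    """Copy a script to a staging directory, yield the copy's path, then delete it."""
    staging_dir = tempfile.mkdtemp()
    staged_fname = os.path.join(staging_dir, f"{desc}.sh")
    try:
        shutil.copy(host_fname, staged_fname)
        yield staged_fname
    finally:
        # Runs even if the body of the `with` block raises.
        shutil.rmtree(staging_dir, ignore_errors=True)


# Create a throwaway local script to stage.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write("echo hello\n")
local_script = f.name

with staged_script(local_script, "say_hello") as staged:
    existed_inside = os.path.exists(staged)
exists_after = os.path.exists(staged)
os.remove(local_script)

print(existed_inside, exists_after)
```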
-
run_cmds
(self, setup_cmds: List[str])[source]¶ Execute a list of sequential commands
Parameters: setup_cmds (List[str]) – The list of commands Raises: RemoteCommandError
– In case at least one command is not successful
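One way to picture sequential execution with error reporting is a loop that stops at the first failing command. The sketch runs commands locally through `sh`; `CommandError` is a hypothetical stand-in for RemoteCommandError, and both the fail-fast behavior and the returned outputs are assumptions rather than documented flambe behavior.

```python
import subprocess
from typing import List


class CommandError(Exception):
    """Hypothetical stand-in for flambe's RemoteCommandError."""


def run_cmds(cmds: List[str]) -> List[str]:
    """Run each command in order and stop at the first failure."""
    outputs = []
    for cmd in cmds:
        result = subprocess.run(["sh", "-c", cmd],
                                capture_output=True, text=True)
        if result.returncode != 0:
            raise CommandError(f"command failed: {cmd!r}")
        outputs.append(result.stdout.strip())
    return outputs


outputs = run_cmds(["echo one", "echo two"])
print(outputs)
```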
-
send_rsync
(self, host_path: str, remote_path: str, params: List[str] = None)[source]¶ Send a local file or folder to a remote instance with rsync.
Parameters: - host_path (str) – The local filename or folder
- remote_path (str) – The remote filename or folder to use
- params (List[str], optional) – Extra parameters to be passed to rsync. For example, ["--filter=':- .gitignore'"]
Raises: RemoteFileTransferError
– In case sending the file fails.
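To show where the extra parameters could fit, the sketch below assembles an rsync argument list without executing it. The function name, the `-a` and `-e` flags, and the placement of `params` before the paths are all assumptions; the command flambe actually builds is not shown on this page. With an argument list, the filter rule needs no additional shell quoting.

```python
import shlex
from typing import List, Optional


def build_rsync_cmd(host_path: str, remote_path: str, username: str, host: str,
                    key: str, params: Optional[List[str]] = None) -> List[str]:
    """Assemble an rsync argument list; nothing is executed here."""
    # -a: archive mode, -e: remote shell to use (SSH with a private key).
    cmd = ["rsync", "-a", "-e", f"ssh -i {shlex.quote(key)}"]
    cmd.extend(params or [])  # extra flags go before the paths
    cmd.extend([host_path, f"{username}@{host}:{remote_path}"])
    return cmd


cmd = build_rsync_cmd("./project", "/home/ubuntu/project", "ubuntu",
                      "203.0.113.10", "/home/me/.ssh/id_rsa",
                      params=["--filter=:- .gitignore"])
# shlex.join renders the list as a single, safely quoted shell string.
print(shlex.join(cmd))
```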
-
get_home_path
(self)[source]¶ Return the $HOME value of the instance.
Returns: The $HOME env value. Return type: str Raises: RemoteCommandError
– If after 3 retries it is not able to get $HOME.
-
clean_containers
(self)[source]¶ Stop and remove all containers running
Raises: RemoteCommandError
– If command fails
-
clean_container_by_image
(self, image_name: str)[source]¶ Stop and remove all containers given an image name.
Parameters: image_name (str) – The name of the image for which all containers should be stopped and removed. Raises: RemoteCommandError
– If command fails
-
clean_container_by_command
(self, command: str)[source]¶ Stop and remove all containers with the given command.
Parameters: command (str) – The command used to stop and remove the containers Raises: RemoteCommandError
– If command fails
-
install_docker
(self)[source]¶ Install docker in an Ubuntu 18.04 distribution.
Raises: RemoteCommandError
– If it’s not able to install docker, i.e. when the installation script fails
-
install_extensions
(self, extensions: Dict[str, str])[source]¶ Install local + pypi extensions.
Parameters: extensions (Dict[str, str]) – The extensions, as a dict from module_name to location Raises: errors.RemoteCommandError
– If an extension could not be installed
-
install_flambe
(self)[source]¶ Pip install Flambe.
If dev mode is activated, it rsyncs the local flambe folder and installs that version. Otherwise, it installs flambe from PyPI.
Raises: RemoteCommandError
– If it’s not able to install flambe.
-
is_docker_installed
(self)[source]¶ Check if docker is installed in the instance.
Executes the command “docker --version” and expects it not to fail.
Returns: True if docker is installed. False otherwise. Return type: bool
-
is_flambe_installed
(self, version: bool = True)[source]¶ Check if flambe is installed and if it matches version.
Parameters: version (bool) – If True, the version is also checked: the method returns True only if the remote flambe version matches the local flambe version, and False otherwise. If False, the method returns True if ANY flambe version is installed in the host. Returns: True if flambe is installed (and, when version is True, the versions match). False otherwise. Return type: bool
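The decision logic described for the version flag can be written as a small pure function. This is only a sketch of the documented behavior: `is_installed` is a hypothetical helper, and representing "flambe not found on the host" as `None` is an assumption.

```python
from typing import Optional


def is_installed(remote_version: Optional[str], local_version: str,
                 version: bool = True) -> bool:
    """Mirror the documented check; `remote_version` is None if flambe is absent."""
    if remote_version is None:
        return False
    if not version:
        # Any installed version is acceptable.
        return True
    return remote_version == local_version


print(is_installed("0.4.0", "0.4.1"), is_installed("0.4.0", "0.4.1", version=False))
```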
-
is_docker_running
(self)[source]¶ Check if docker is running in the instance.
Executes the command “docker ps” and expects it not to fail.
Returns: True if docker is running. False otherwise. Return type: bool
-
start_docker
(self)[source]¶ Restart docker.
Raises: RemoteCommandError
– If it’s not able to restart docker.
-
existing_dir
(self, _dir: str)[source]¶ Return whether a directory exists in the host
Parameters: _dir (str) – The name of the directory. It needs to be relative to $HOME Returns: True if exists. Otherwise, False. Return type: bool
-
shutdown_node
(self)[source]¶ Shut down the ray node in the host.
If the node is also the main node, then the entire cluster will shut down
-
create_dirs
(self, relative_dirs: List[str])[source]¶ Create the necessary folders in the host.
Parameters: relative_dirs (List[str]) – The directories to create. They should be relative paths and $HOME of each host will be used to add the prefix.
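Prefixing relative directories with the host's $HOME can be sketched as a helper that builds a single `mkdir -p` command. The name `mkdir_command` and the use of `mkdir -p` are assumptions for illustration; `posixpath` is used because the remote hosts are assumed to be POSIX machines.

```python
import posixpath
import shlex
from typing import List


def mkdir_command(home: str, relative_dirs: List[str]) -> str:
    """Build one `mkdir -p` command covering every directory under `home`."""
    absolute = [posixpath.join(home, d) for d in relative_dirs]
    # Quote each path so that spaces survive the remote shell.
    return "mkdir -p " + " ".join(shlex.quote(d) for d in absolute)


command = mkdir_command("/home/ubuntu", ["extensions", "logs/run 1"])
print(command)
```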
-
class
flambe.cluster.instance.instance.
CPUFactoryInstance
[source]¶ Bases:
flambe.cluster.instance.instance.Instance
This class represents a CPU Instance in the Ray cluster.
CPU Factories are instances that can run only one worker (no GPUs available). This class is mostly useful for debugging.
Factory instances will not keep any important information. All information is going to be sent to an orchestrator machine.
-
prepare
(self)[source]¶ Prepare a CPU machine to be a worker node.
Checks if flambe is installed, and if not, installs it.
Raises: RemoteCommandError
– In case any step of the preparing process fails.
-
-
class
flambe.cluster.instance.instance.
GPUFactoryInstance
[source]¶ Bases:
flambe.cluster.instance.instance.CPUFactoryInstance
This class represents an Nvidia GPU Factory Instance.
Factory instances will not keep any important information. All information is going to be sent to an Orchestrator machine.
-
prepare
(self)[source]¶ Prepare a GPU instance to run a ray worker node. For this, it installs CUDA and flambe if not installed.
Raises: RemoteCommandError
– In case any step of the preparing process fails.
-
-
class
flambe.cluster.instance.instance.
OrchestratorInstance
[source]¶ Bases:
flambe.cluster.instance.instance.Instance
The orchestrator instance will be the main machine in a cluster.
It is going to be the main node in the ray cluster and it will also host other services. TODO: complete
All services besides ray will run in docker containers.
This instance does not need to be a GPU machine.
-
prepare
(self)[source]¶ Install docker and flambe
Raises: RemoteCommandError
– In case any step of the preparing process fails.
-
launch_report_site
(self, progress_file: str, port: int, output_log: str, output_dir: str, tensorboard_port: int)[source]¶ Launch the report site.
The report site is a Flask web app.
Raises: RemoteCommandError
– In case the launch process fails
-
is_tensorboard_running
(self)[source]¶ Return whether tensorboard is running in the host as a docker container.
Returns: True if Tensorboard is running, False otherwise. Return type: bool
-
is_report_site_running
(self)[source]¶ Return whether the report site is running in the host
Returns: True if the report site is running, False otherwise. Return type: bool
-
launch_tensorboard
(self, logs_dir: str, tensorboard_port: int)[source]¶ Launch tensorboard.
Parameters: - logs_dir (str) – Tensorboard logs directory
- tensorboard_port (int) – The port where tensorboard will be available
Raises: RemoteCommandError
– In case the launch process fails
-
existing_tmux_session
(self, session_name: str)[source]¶ Return whether there is an existing tmux session with the same name
Parameters: session_name (str) – The exact name of the searched tmux session Returns: True if the session exists, False otherwise. Return type: bool
-
kill_tmux_session
(self, session_name: str)[source]¶ Kill an existing tmux session
Parameters: session_name (str) – The exact name of the tmux session to be removed
-
launch_flambe
(self, config_file: str, secrets_file: str, force: bool)[source]¶ Launch flambe execution in the remote host
Parameters: - config_file (str) – The config filename relative to the orchestrator
- secrets_file (str) – The filepath containing the secrets for the orchestrator
- force (bool) – The force parameter that was originally passed to flambe
-
launch_node
(self, port: int)[source]¶ Launch the main ray node in given sftp server in port 49559.
Parameters: port (int) – Available port to launch the redis DB of the main ray node Raises: RemoteCommandError
– In case the launch process fails
-