Missing Semester Lecture 5 - Command-line Environment
MIT, The Missing Semester of Your CS Education, Lecture 5 - Command-line Environment
Job Control
In some cases you will need to interrupt a job while it is executing, for instance if a command is taking too long to complete (such as a find with a very large directory structure to search through). Most of the time, you can do Ctrl-C and the command will stop. But how does this actually work and why does it sometimes fail to stop the process?
Killing a process
Your shell is using a UNIX communication mechanism called a signal to communicate information to the process. When a process receives a signal it stops its execution, deals with the signal and potentially changes the flow of execution based on the information that the signal delivered. For this reason, signals are software interrupts.
In our case, when typing Ctrl-C this prompts the shell to deliver a SIGINT signal to the process.
Here’s a minimal example of a Python program that captures SIGINT and ignores it, no longer stopping. To kill this program we can now use the SIGQUIT signal instead, by typing Ctrl-\.
Here’s what happens if we send SIGINT twice to this program, followed by SIGQUIT. Note that ^ is how Ctrl is displayed when typed in the terminal.
$ python sigint.py
While SIGINT and SIGQUIT are both usually associated with terminal related requests, a more generic signal for asking a process to exit gracefully is the SIGTERM signal. To send this signal we can use the kill command, with the syntax kill -TERM <PID>.
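The handling and delivery of signals described above can also be sketched in shell. This is an illustrative sketch, not the lecture's sigint.py: it assumes a POSIX sh, uses the trap builtin as the handler mechanism, and sends the signals with kill rather than from the keyboard. The 30-second sleep is an arbitrary stand-in for a long-running job.

```shell
# A shell that handles SIGINT: trap registers a handler that replaces
# the default action (terminate), so the script keeps going.
out=$(sh -c 'trap "echo got SIGINT, not stopping" INT
kill -INT $$
echo still running')
printf '%s\n' "$out"

# SIGTERM asks a process to exit. sleep installs no handler for it,
# so the default action (terminate) applies.
sleep 30 &
pid=$!
kill -TERM "$pid"

# wait reports how the job ended; keep the status without aborting
# the script if it is nonzero.
status=0
wait "$pid" || status=$?
echo "sleep ended with status $status"   # 128 + 15, SIGTERM is signal 15
```

By shell convention, a status above 128 means the process was terminated by the signal whose number is the status minus 128.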
Pausing and backgrounding processes
Signals can do other things beyond killing a process. For instance, SIGSTOP pauses a process. In the terminal, typing Ctrl-Z will prompt the shell to send a SIGTSTP signal, short for Terminal Stop (i.e. the terminal’s version of SIGSTOP).
We can then continue the paused job in the foreground or in the background using fg or bg, respectively.
The jobs command lists the unfinished jobs associated with the current terminal session. You can refer to those jobs using their pid (you can use pgrep to find that out). More intuitively, you can also refer to a process using the percent symbol followed by its job number (displayed by jobs). To refer to the last backgrounded job you can use the $! special parameter.
One more thing to know is that the & suffix in a command will run the command in the background, giving you the prompt back, although it will still use the shell’s STDOUT which can be annoying (use shell redirections in that case).
To background an already running program you can do Ctrl-Z followed by bg. Note that backgrounded processes are still child processes of your terminal and will die if you close the terminal (this will send yet another signal, SIGHUP). To prevent that from happening you can run the program with nohup (a wrapper to ignore SIGHUP), or use disown if the process has already been started. Alternatively, you can use a terminal multiplexer as we will see in the next section.
Below is a sample session to showcase some of these concepts.
$ sleep 1000
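A scripted sketch of the same concepts follows. Ctrl-Z, fg and bg need an interactive terminal, so this version sends SIGSTOP and SIGCONT explicitly with kill instead. The sleep durations and the variable names pid and nohup_pid are arbitrary choices for illustration.

```shell
# Start a job in the background, redirecting its output so that it
# does not clutter the terminal.
sleep 30 > /dev/null 2>&1 &
pid=$!                 # $! holds the PID of the last background job

kill -STOP "$pid"      # pause it (SIGSTOP cannot be caught, unlike Ctrl-Z's SIGTSTP)
kill -CONT "$pid"      # resume it; SIGCONT is the signal fg and bg send
jobs                   # list this shell's unfinished jobs

# nohup makes the command ignore SIGHUP, so it would survive the
# terminal being closed.
nohup sleep 30 > /dev/null 2>&1 &
nohup_pid=$!

# Clean up: nohup only protects against SIGHUP, so SIGTERM still works.
kill -TERM "$pid" "$nohup_pid"
wait "$pid" "$nohup_pid" || true
```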
SIGKILL is a special signal: it cannot be captured by the process and it always terminates it immediately. However, it can have bad side effects, such as leaving orphaned child processes.
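The difference between a signal that can be ignored and SIGKILL can be sketched as follows. This assumes a POSIX sh; the one-second pauses are a crude way to let the child install its trap and to let the signal arrive, not a robust synchronization method.

```shell
# The child ignores SIGTERM (trap with an empty action) and then
# becomes sleep; ignored signals stay ignored across exec.
sh -c 'trap "" TERM; exec sleep 30' &
pid=$!
sleep 1                # give the child time to install the trap

kill -TERM "$pid"
sleep 1
if kill -0 "$pid" 2>/dev/null; then
    echo "SIGTERM was ignored, process $pid is still running"
fi

kill -KILL "$pid"      # SIGKILL cannot be caught or ignored
status=0
wait "$pid" || status=$?
echo "exit status: $status"   # 128 + 9, SIGKILL is signal 9
```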
Aliases
It can become tiresome typing long commands that involve many flags or verbose options. For this reason, most shells support aliasing. A shell alias is a short form for another command that your shell will replace automatically for you. For instance, an alias in bash has the following structure:
alias alias_name="command_to_alias arg1 arg2"
Note that there is no space around the equal sign =, because alias is a shell command that takes a single argument.
Aliases have many convenient features:
# Make shorthands for common flags
Note that aliases do not persist across shell sessions by default. To make an alias persistent you need to include it in shell startup files, like .bashrc or .zshrc.
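A few alias operations, as a small sketch. The alias names ll, gs and sl are illustrative choices, not a required convention. Note that bash only expands aliases in interactive shells by default, so this is meant to be typed at a prompt or placed in a startup file.

```shell
alias ll="ls -lh"        # shorthand for common flags
alias gs="git status"    # save typing on a frequent command
alias sl="ls"            # forgive a common typo

alias ll                 # print the current definition of ll
unalias gs               # remove an alias for this session

# To bypass an alias and run the underlying command, prefix it with a
# backslash (\ls) or use: command ls
# To make an alias permanent, add its definition to ~/.bashrc or ~/.zshrc.
```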
Remote Machines
It has become more and more common for programmers to use remote servers in their everyday work. If you need to use remote servers in order to deploy backend software or you need a server with higher computational capabilities, you will end up using a Secure Shell (SSH). As with most tools covered, SSH is highly configurable so it is worth learning about it.
To ssh into a server you execute a command as follows:
ssh foo@bar.mit.edu
Here we are trying to ssh as user foo into server bar.mit.edu. The server can be specified with a hostname (like bar.mit.edu) or an IP address (something like foobar@192.168.1.42). Later we will see that if we modify the ssh config file we can log in using just something like ssh bar.
Executing commands
An often overlooked feature of ssh is the ability to run commands directly. ssh foobar@server ls will execute ls in the home folder of foobar. It works with pipes, so ssh foobar@server ls | grep PATTERN will grep locally the remote output of ls and ls | ssh foobar@server grep PATTERN will grep remotely the local output of ls.
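Where each half of the pipeline runs depends on quoting, which the following sketch tries to make explicit. The function names are hypothetical, the first argument is a user@host placeholder, and the second is assumed to be a simple pattern with no spaces or shell metacharacters, because ssh joins its trailing arguments into one string for the remote shell to parse.

```shell
# ls runs on the server; its output travels back and is filtered locally.
remote_ls_local_grep() {
    ssh "$1" ls | grep "$2"
}

# ls runs locally; its output is sent to a grep running on the server.
local_ls_remote_grep() {
    ls | ssh "$1" grep "$2"
}

# Quoting the whole pipeline hands it to the remote shell, so both
# ls and grep run on the server.
all_remote() {
    ssh "$1" "ls | grep $2"
}
```

For example, remote_ls_local_grep foobar@server PATTERN is equivalent to the first command in the paragraph above.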
Port Forwarding
In many scenarios you will run into software that listens on specific ports on the machine. When this happens on your local machine you can type localhost:PORT or 127.0.0.1:PORT, but what do you do with a remote server that does not have its ports directly available through the network/internet?
The answer is port forwarding, which comes in two flavors: Local Port Forwarding and Remote Port Forwarding.
Local Port Forwarding
Remote Port Forwarding
The most common scenario is local port forwarding, where a service on the remote machine listens on a port and you want to link a port on your local machine so that it forwards to the remote port. For example, suppose we run jupyter notebook on the remote server and it listens on port 8888. To forward that to local port 9999, we would run ssh -L 9999:localhost:8888 foobar@remote_server and then navigate to localhost:9999 on our local machine.
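Both flavors can be sketched as small wrapper functions. The function names are hypothetical, the -N flag (do not run a remote command, just forward) is an addition not mentioned above, and ports 8080 and 3000 in the remote example are made up for illustration.

```shell
# Local forwarding (-L): connections to local port 9999 are carried over
# the SSH connection to port 8888 as seen from the remote machine.
forward_jupyter() {
    ssh -N -L 9999:localhost:8888 foobar@remote_server
}

# Remote forwarding (-R): the opposite direction. Connections to port
# 8080 on the remote machine are carried back to port 3000 on the
# local machine.
expose_local_service() {
    ssh -N -R 8080:localhost:3000 foobar@remote_server
}
```

Either function blocks while the tunnel is open; pressing Ctrl-C closes it.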
Exercises
Job control
- From what we have seen, we can use some ps aux | grep commands to get our jobs’ pids and then kill them, but there are better ways to do it. Start a sleep 10000 job in a terminal, background it with Ctrl-Z and continue its execution with bg. Now use pgrep to find its pid and pkill to kill it without ever typing the pid itself. (Hint: use the -af flags).
sleep 10000
- Say you don’t want to start a process until another completes. How would you go about it? In this exercise, our limiting process will always be sleep 60 &. One way to achieve this is to use the wait command. Try launching the sleep command and having an ls wait until the background process finishes.
sleep 60 &
However, this strategy will fail if we start in a different bash session, since wait only works for child processes. One feature we did not discuss in the notes is that the kill command’s exit status will be zero on success and nonzero otherwise. kill -0 does not send a signal but will give a nonzero exit status if the process does not exist. Write a bash function called pidwait that takes a pid and waits until the given process completes. You should use sleep to avoid wasting CPU unnecessarily.
pidwait () {
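One possible way to write pidwait is sketched below. It assumes the target process belongs to the current user: kill -0 also fails when the process exists but we lack permission to signal it, in which case this version would return immediately. The one-second polling interval is an arbitrary choice.

```shell
# Block until the process with the given PID no longer exists.
pidwait() {
    # kill -0 sends no signal; its exit status only reports whether
    # the PID can be signalled.
    while kill -0 "$1" 2>/dev/null; do
        sleep 1        # poll once per second instead of busy-waiting
    done
}
```

Unlike wait, this works for processes started from another shell session, for example pidwait 12345; ls, where 12345 stands for the PID found with pgrep.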
Aliases
- Create an alias dc that resolves to cd for when you type it wrongly.
alias dc="cd"
- Run history | awk '{$1="";print substr($0,2)}' | sort | uniq -c | sort -n | tail -n 10 to get your top 10 most used commands and consider writing shorter aliases for them. Note: this works for Bash; if you’re using ZSH, use history 1 instead of just history.
alias recent="history | awk '{\$1=\"\";print substr(\$0,2)}' | sort | uniq -c | sort -n | tail -n 10"