Missing Semester Lecture 5 - Command-line Environment

Posted at 2024-09-01 Missing Semester

MIT The Missing semester Lecture of Your CS Education Lecture 5 - Command-line Environment

Job Control

In some cases you will need to interrupt a job while it is executing, for instance if a command is taking too long to complete (such as a find with a very large directory structure to search through). Most of the time, you can do Ctrl-C and the command will stop. But how does this actually work and why does it sometimes fail to stop the process?

Killing a process

Your shell is using a UNIX communication mechanism called a signal to communicate information to the process. When a process receives a signal it stops its execution, deals with the signal and potentially changes the flow of execution based on the information that the signal delivered. For this reason, signals are software interrupts.

In our case, when typing Ctrl-C this prompts the shell to deliver a SIGINT signal to the process.

Here’s a minimal example of a Python program that captures SIGINT and ignores it, no longer stopping. To kill this program we can now use the SIGQUIT signal instead, by typing Ctrl-\.

#!/usr/bin/env python
import signal, time

def handler(signum, time):
    print("\nI got a SIGINT, but I am not stopping")

signal.signal(signal.SIGINT, handler)
i = 0
while True:
    time.sleep(.1)
    print("\r{}".format(i), end="")
    i += 1

Here’s what happens if we send SIGINT twice to this program, followed by SIGQUIT. Note that ^ is how Ctrl is displayed when typed in the terminal.

$ python sigint.py
24^C
I got a SIGINT, but I am not stopping
26^C
I got a SIGINT, but I am not stopping
30^\[1]    39913 quit       python sigint.py

While SIGINT and SIGQUIT are both usually associated with terminal related requests, a more generic signal for asking a process to exit gracefully is the SIGTERM signal. To send this signal we can use the kill command, with the syntax kill -TERM <PID>.

Pausing and backgrounding processes

Signals can do other things beyond killing a process. For instance, SIGSTOP pauses a process. In the terminal, typing Ctrl-Z will prompt the shell to send a SIGTSTP signal, short for Terminal Stop (i.e. the terminal’s version of SIGSTOP).

We can then continue the paused job in the foreground or in the background using fg or bg, respectively.

The jobs command lists the unfinished jobs associated with the current terminal session. You can refer to those jobs using their pid (you can use pgrep to find that out). More intuitively, you can also refer to a process using the percent symbol followed by its job number (displayed by jobs). To refer to the last backgrounded job you can use the $! special parameter.

One more thing to know is that the & suffix in a command will run the command in the background, giving you the prompt back, although it will still use the shell’s STDOUT which can be annoying (use shell redirections in that case).

To background an already running program you can do Ctrl-Z followed by bg. Note that backgrounded processes are still children processes of your terminal and will die if you close the terminal (this will send yet another signal, SIGHUP). To prevent that from happening you can run the program with nohup (a wrapper to ignore SIGHUP), or use disown if the process has already been started. Alternatively, you can use a terminal multiplexer as we will see in the next section.

Below is a sample session to showcase some of these concepts.

$ sleep 1000
^Z
[1]  + 18653 suspended  sleep 1000

$ nohup sleep 2000 &
[2] 18745
appending output to nohup.out

$ jobs
[1]  + suspended  sleep 1000
[2]  - running    nohup sleep 2000

$ bg %1
[1]  - 18653 continued  sleep 1000

$ jobs
[1]  - running    sleep 1000
[2]  + running    nohup sleep 2000

$ kill -STOP %1
[1]  + 18653 suspended (signal)  sleep 1000

$ jobs
[1]  + suspended (signal)  sleep 1000
[2]  - running    nohup sleep 2000

$ kill -SIGHUP %1
[1]  + 18653 hangup     sleep 1000

$ jobs
[2]  + running    nohup sleep 2000

$ kill -SIGHUP %2

$ jobs
[2]  + running    nohup sleep 2000

$ kill %2
[2]  + 18745 terminated  nohup sleep 2000

$ jobs

A special signal is SIGKILL since it cannot be captured by the process and it will always terminate it immediately. However, it can have bad side effects such as leaving orphaned children processes.

Aliases

It can become tiresome typing long commands that involve many flags or verbose options. For this reason, most shells support aliasing. A shell alias is a short form for another command that your shell will replace automatically for you. For instance, an alias in bash has the following structure:

1	alias alias_name="command_to_alias arg1 arg2"

Note that there is no space around the equal sign =, because alias is a shell command that takes a single argument.

Aliases have many convenient features:

# Make shorthands for common flags
alias ll="ls -lh"

# Save a lot of typing for common commands
alias gs="git status"
alias gc="git commit"
alias v="vim"

# Save you from mistyping
alias sl=ls

# Overwrite existing commands for better defaults
alias mv="mv -i"           # -i prompts before overwrite
alias mkdir="mkdir -p"     # -p make parent dirs as needed
alias df="df -h"           # -h prints human readable format

# Alias can be composed
alias la="ls -A"
alias lla="la -l"

# To ignore an alias run it prepended with \
\ls
# Or disable an alias altogether with unalias
unalias la

# To get an alias definition just call it with alias
alias ll
# Will print ll='ls -lh'

Note that aliases do not persist shell sessions by default. To make an alias persistent you need to include it in shell startup files, like .bashrc or .zshrc.

Remote Machines

It has become more and more common for programmers to use remote servers in their everyday work. If you need to use remote servers in order to deploy backend software or you need a server with higher computational capabilities, you will end up using a Secure Shell (SSH). As with most tools covered, SSH is highly configurable so it is worth learning about it.

To ssh into a server you execute a command as follows

1	ssh foo@bar.mit.edu

Here we are trying to ssh as user foo in server bar.mit.edu. The server can be specified with a URL (like bar.mit.edu) or an IP (something like foobar@192.168.1.42). Later we will see that if we modify ssh config file you can access just using something like ssh bar.

Executing commands

An often overlooked feature of ssh is the ability to run commands directly. ssh foobar@server ls will execute ls in the home folder of foobar. It works with pipes, so ssh foobar@server ls | grep PATTERN will grep locally the remote output of ls and ls | ssh foobar@server grep PATTERN will grep remotely the local output of ls.

Port Forwarding

In many scenarios you will run into software that listens to specific ports in the machine. When this happens in your local machine you can type localhost:PORT or 127.0.0.1:PORT, but what do you do with a remote server that does not have its ports directly available through the network/internet?.

This is called port forwarding and it comes in two flavors: Local Port Forwarding and Remote Port Forwarding.

Local Port Forwarding

Remote Port Forwarding

The most common scenario is local port forwarding, where a service in the remote machine listens in a port and you want to link a port in your local machine to forward to the remote port. For example, if we execute jupyter notebook in the remote server that listens to the port 8888. Thus, to forward that to the local port 9999, we would do ssh -L 9999:localhost:8888 foobar@remote_server and then navigate to localhost:9999 in our local machine.

Exercises

Job control

From what we have seen, we can use some ps aux | grep commands to get our jobs’ pids and then kill them, but there are better ways to do it. Start a sleep 10000 job in a terminal, background it with Ctrl-Z and continue its execution with bg. Now use pgrep to find its pid and pkill to kill it without ever typing the pid itself. (Hint: use the -af flags).

sleep 10000
^Z
[1]+  已停止               sleep 10000

jobs
[1]+  已停止               sleep 10000

bg %1
[1]+ sleep 10000 &

pgrep -af "sleep 10000"
1611 sleep 10000

pkill -f "sleep 10000"
[1]+  已终止               sleep 10000

jobs

Say you don’t want to start a process until another completes. How would you go about it? In this exercise, our limiting process will always be sleep 60 &. One way to achieve this is to use the wait command. Try launching the sleep command and having an ls wait until the background process finishes.

sleep 60 &
[1] 1617

jobs
[1]+  运行中               sleep 60 &

wait 1617
[1]+  完成                  sleep 60

ls
anaconda-ks.cfg  file_system_hw  HW4  last-modified.txt  nohup.out  test

However, this strategy will fail if we start in a different bash session, since wait only works for child processes. One feature we did not discuss in the notes is that the kill command’s exit status will be zero on success and nonzero otherwise. kill -0 does not send a signal but will give a nonzero exit status if the process does not exist. Write a bash function called pidwait that takes a pid and waits until the given process completes. You should use sleep to avoid wasting CPU unnecessarily.

pidwait () {
    while kill -0 $1 2>/dev/null
    do
    sleep 1
    done
    ls
}

Aliases

Create an alias dc that resolves to cd for when you type it wrongly.

1	alias dc="cd"

Run history | awk '{$1="";print substr($0,2)}' | sort | uniq -c | sort -n | tail -n 10 to get your top 10 most used commands and consider writing shorter aliases for them. Note: this works for Bash; if you’re using ZSH, use history 1 instead of just history.

1
2
3

alias recent="history | awk '{\$1=\"\";print substr(\$0,2)}' | sort | uniq -c | sort -n | tail -n 10"

alias recent='history | awk '\''{$1="";print substr($0,2)}'\'' | sort | uniq -c | sort -n | tail -n 10'

Previous post: Missing Semester Lecture 6 - Version Control (Git) Next post: Missing Semester Lecture 4 - Data Wrangling

Cheyne's Blog

Job Control

Killing a process

Pausing and backgrounding processes

Aliases

Remote Machines

Executing commands

Port Forwarding

Exercises

Job control

Aliases