
Introduction to UNIX/LINUX Terminal/Command-Line Computing
Understanding the basics of UNIX/Linux computing on command line interfaces is a core foundation for performing bioinformatic tasks and designing pipelines. Tasks such as directory navigation, file manipulation, and compression are a fundamental starting point for beginners learning bioinformatics. Large alignment or fastq read files, for example, are often compressed to save disk space or to allow compatibility with other software. Being able to check your working directory and the contents of files or directories is important not just for navigating and using files in a command line interface, but also as a key starting point for debugging scripts that use files across nested directory structures. Piping and redirection are ubiquitous when performing several sequential tasks on a given file and generating new output files. In this post we will go over basic functions and commands to navigate and manipulate directories and data files.
Table of Contents
1. Navigating Directories and Networking
2. Checking and Manipulating Files
3. Compression
4. Piping and Redirection
5. Command Line Shortcuts
1. Navigating Directories and Networking
pwd: Print Working Directory
Description: Shows the current directory you are in. This basic command can be used to orient yourself during navigation across different directories and is also very useful for debugging code to ensure the script is in the appropriate working directory required for executing downstream code.
Example Usage:
input:
[user@cluster directory]$ pwd
output:
/path/to/your/current/directory
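As a minimal sketch of the debugging use described above, a script can compare the output of pwd against the directory it expects before doing anything else. The directory here is a throwaway one created with mktemp, and the variable names are hypothetical; both are assumptions made only so the example runs anywhere.

```shell
# Create a scratch directory to stand in for a project folder (hypothetical).
expected_dir=$(mktemp -d)
cd "$expected_dir"

# Capture the working directory and check it before doing any real work.
current_dir=$(pwd)
if [ "$current_dir" = "$expected_dir" ]; then
    status="ok"
    echo "Working directory is correct: $current_dir"
else
    status="wrong directory"
    echo "Expected $expected_dir but found $current_dir" >&2
fi
```

In a real pipeline the expected directory would be a fixed project path rather than a temporary one.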
ls: List Directory Contents
Description: Lists files and directories in the current directory. ls -l gives detailed information, and ls -a includes hidden files (names beginning with a dot). The detailed listing returns the following columns for each file: permissions, number of hard links, owner, group, file size in bytes, date last modified, and file name. Adding the '*' wildcard followed by an extension (e.g. ls *.txt) lists only files with that suffix. This command is also useful for debugging code to ensure required files are present in the current directory prior to executing downstream code.
Example Usage:
input:
[user@cluster ~]$ ls
output:
stderror.txt
stdout.txt
logfile.log
folder
Detailed Example Usage:
input:
[user@cluster ~]$ ls -l *.txt
output:
-rw-r--r-- 1 user clusterusers 47849 Oct 18 2024 stdout.txt
-rw------- 1 user clusterusers 1073 Oct 18 2024 stderror.txt
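The options above can be tried safely in a scratch directory. The following is only a sketch: the file names mirror the example output, and .hidden_config is a hypothetical hidden file added to show what ls -a reveals.

```shell
# Build a scratch directory with a few hypothetical files.
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch stdout.txt stderror.txt logfile.log .hidden_config
mkdir folder

ls          # visible files and folders only
ls -a       # also shows .hidden_config (plus . and ..)
ls -l       # long format: permissions, links, owner, group, size, date, name

# The wildcard is expanded by the shell, so ls only receives the .txt names.
txt_files=$(ls *.txt)
echo "$txt_files"
```

Because the shell expands the wildcard, the same pattern works with other commands in this post, not just ls.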
cd: Change Directory
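Description: Moves you to the specified directory. cd .. goes up one directory level (i.e. to the parent directory), and cd ~ takes you to your home directory. To move through several nested directories in one step, give the whole path: a relative path (e.g. path/to/directory_name) is resolved from your current directory, so the target must be located below it, while an absolute path (starting with /) works from any location.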
Example Usage:
input:
[user@cluster ~]$ cd directory_name
output:
[user@cluster directory_name]$ (Now in folder named 'directory_name')
Detailed Example Usage:
input:
[user@cluster ~]$ cd /path/to/directory_name
output:
[user@cluster directory_name]$
(Now in nested directory: '/path/to/directory_name')
Testing directory path:
[user@cluster directory_name]$ pwd
/path/to/directory_name
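A short sketch of relative and absolute navigation, using a hypothetical directory tree (project/data/raw) built inside a temporary folder. The cd - shortcut, which returns to the previous directory, is an addition not covered above.

```shell
# Scratch directory tree used only for this demonstration (hypothetical names).
top=$(mktemp -d)
mkdir -p "$top/project/data/raw"

cd "$top/project"        # absolute path: works from anywhere
cd data/raw              # relative path: resolved from the current directory
after_relative=$(pwd)

cd ..                    # up one level, now in project/data
after_up=$(pwd)

cd ../..                 # up two levels, back in the scratch directory
after_two_up=$(pwd)

cd -                     # jump back to the previous directory (project/data)
after_dash=$(pwd)
```

Running pwd after each step, as done here, is a quick way to confirm where a path actually leads.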
ssh: Secure Shell
Description: Connects to a remote machine securely. This command is used to connect to computing clusters (i.e. remote hosts) from your local machine.
Example Usage:
input:
Local-Machine:~ local_user$ ssh user@cluster.org
output:
(Will prompt to enter user's password)
user@cluster.org's password: *************
(Connects to remote host)
scp: Secure Copy
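Description: Copies files between hosts over a secure connection. Allows you to copy files from your local computer to a remote host/cluster, or from the remote host back to your local computer by swapping the order of the source and destination. Use -r to copy directories.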
Example Usage:
Local-Machine:~ local_user$ scp file_name user@cluster.org:/path/to/destination
(Will begin copying file 'file_name' to cluster.org host, in the directory: /path/to/destination/ found within the remote host)
2. Checking and Manipulating Files
cp: Copy Files and Directories
Description: Copies files or directories to another location, leaving the original in place. Use -r to copy a directory and its contents.
Example Usage:
input:
[user@cluster ~]$ cp file_name /path/to/copy/destination
(file_name will now be copied from your current home directory ('~') to the directory: /path/to/copy/destination/)
Checking file was properly copied:
[user@cluster ~]$ cd /path/to/copy/destination/
[user@cluster destination]$ ls
file_name
(file was properly copied)
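The sketch below shows a file copy and a directory copy side by side. The names results, backup, and counts.txt are hypothetical and exist only inside a temporary folder.

```shell
# Scratch area with a hypothetical results folder.
work=$(mktemp -d)
cd "$work"
mkdir -p results backup
echo "sample data" > results/counts.txt

# Copy a single file and give the copy a new name.
cp results/counts.txt counts_copy.txt

# Copy a whole directory; -r is required for directories.
cp -r results backup/

# The originals are still in place after copying.
ls results backup/results
```

Giving a file name instead of a directory as the destination, as in the first cp, copies and renames in one step.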
mv: Move (or Rename) Files and Directories
Description: Moves or renames files or directories rather than copying them. By default mv overwrites a file with the same name in the destination directory; the -i flag asks for confirmation before overwriting, and the -f flag overwrites without asking. No additional flag is required to move directories.
Example Usage:
input:
[user@cluster ~]$ mv file_name.txt /path/to/copy/destination
(file 'file_name.txt' has been moved to /path/to/copy/destination)
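Because mv can replace an existing file at the destination, a script may want to check first. This is a sketch with hypothetical file names in a temporary folder, not the only way to guard against overwriting.

```shell
# Scratch area with hypothetical files.
work=$(mktemp -d)
cd "$work"
mkdir archive
echo "old" > notes.txt
echo "keep me" > archive/report.txt
echo "new" > report.txt

# Rename a file in place.
mv notes.txt notes_2024.txt

# Check for an existing file of the same name before moving.
if [ -e archive/report.txt ]; then
    echo "archive/report.txt already exists, not moving" >&2
else
    mv report.txt archive/
fi

# Moving (here: renaming) a directory needs no extra flag.
mv archive old_archive
```

When typing commands by hand, mv -i gives a similar safeguard by asking before it overwrites.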
rm: Remove Files or Directories
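Description: Deletes files. Use -r to delete a directory and everything inside it. Deletion is permanent: there is no recycle bin on the command line, so double-check the name before running the command.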
Example Usage:
input:
[user@cluster ~]$ rm file_name
(file 'file_name' has been removed)
[user@cluster ~]$ rm -r directory_name
(directory 'directory_name' has been removed)
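One cautious habit, sketched below with hypothetical file names, is to list what a wildcard matches before passing the same pattern to rm. The rmdir command, which only removes empty directories, is an addition not covered above.

```shell
# Scratch area with hypothetical files and folders.
work=$(mktemp -d)
cd "$work"
mkdir -p old_run/logs empty_dir
touch old_run/logs/run1.log keep.txt scratch.tmp

# List exactly what the wildcard will match before handing it to rm.
ls *.tmp
rm *.tmp

# rmdir only removes empty directories, a safer choice when that is the intent.
rmdir empty_dir

# rm -r removes a directory and everything inside it.
rm -r old_run
```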
head/tail: Display Leading or Ending Lines of a File
Description: head will display the first 10 lines of a file and tail will display the last 10 lines of a file. Using the -n flag allows you to designate the specific number of lines to display. Both can be used on several files by listing multiple filenames after the command.
Example Usage:
input:
[user@cluster ~]$ head -n 3 file_name.txt
output:
line1
line2
line3
(Displays first 3 lines in file_name.txt, use tail to display last 3 lines)
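A small sketch of both commands on a hypothetical 10-line file, including the multi-file form mentioned in the description.

```shell
work=$(mktemp -d)
cd "$work"
# Hypothetical 10-line file: line1 ... line10.
for i in 1 2 3 4 5 6 7 8 9 10; do echo "line$i"; done > file_name.txt
cp file_name.txt second_file.txt

first_three=$(head -n 3 file_name.txt)
last_three=$(tail -n 3 file_name.txt)
echo "$first_three"
echo "$last_three"

# With several files, each block of output is preceded by a "==> name <==" header.
head -n 2 file_name.txt second_file.txt
```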
wc: Word Count
Description: The wc command is used to display the number of lines, words, and bytes in a file or standard input. It is a versatile utility that can be used to get various statistics about file content based on different available flags. It can be used on several files by listing multiple filenames after the command.
Common Options:
-l: Count the number of lines.
Example Usage:
input:
[user@cluster ~]$ wc -l file.txt
output:
20 file.txt
(displays the number of lines in file.txt)
-w: Count the number of words.
Example Usage:
input:
[user@cluster ~]$ wc -w file.txt
output:
48 file.txt
(displays the number of words in file.txt)
-c: Count the number of bytes.
Example Usage:
input:
[user@cluster ~]$ wc -c file.txt
output:
512 file.txt
(displays the number of bytes in file.txt)
-m: Count the number of characters.
Example Usage:
input:
[user@cluster ~]$ wc -m file.txt
output:
478 file.txt
(displays the number of characters in file.txt)
-L: Display the length of the longest line.
Example Usage:
input:
[user@cluster ~]$ wc -L file.txt
output:
140 file.txt
(displays the length of the longest line in file.txt)
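When wc reads from standard input instead of a named file, it prints only the number, which is convenient for storing a count in a variable. The sketch below assumes a hypothetical two-line file.

```shell
work=$(mktemp -d)
cd "$work"
# Hypothetical file with 2 lines and 5 words.
printf 'one two three\nfour five\n' > file.txt

wc file.txt               # no flag: lines, words, bytes, then the file name
wc -l file.txt            # line count followed by the file name

# Reading from standard input drops the file name, which is handy in scripts.
line_count=$(wc -l < file.txt)
word_count=$(wc -w < file.txt)
echo "lines: $line_count, words: $word_count"
```

For an uncompressed fastq file, dividing the line count by four gives the number of reads, assuming the standard four-line record layout.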
nano: Command Line Text Editor
Description: nano is a text editor that is commonly pre-installed on many UNIX/Linux systems and is useful to check and navigate the content of text files while using the command line. Below are commonly used shortcuts, which are also listed at the bottom of the window once nano has been opened.
Example Usage:
input:
[user@cluster ~]$ nano textfile.txt
(Opens textfile.txt in nano)
Commonly used shortcuts:
Ctrl + X: Exit
Description: Exits nano. If there are unsaved changes, nano will ask if you want to save them.
Ctrl + W: Where Is (Search)
Description: Searches for a string or text within the file.
Ctrl + K: Cut Text
Description: Cuts the entire line where the cursor is located and stores it in the cutbuffer.
Ctrl + U: Uncut Text (Paste)
Description: Pastes the text from the cutbuffer (where cut text is stored) at the cursor position.
Ctrl + _: Go to Line and Column Number
Description: Prompts you to enter the line and column number to move the cursor directly to that position.
3. Compression
tar: Tape Archive (used for creating and extracting archive files)
Description: Creates (-c) or extracts (-x) an archive that bundles files or directories into a single file, here a gzip-compressed .tar.gz file. -z specifies gzip compression, -v is for verbose output, and -f specifies the archive filename.
Example Usage:
input:
[user@cluster ~]$ tar -czf archive_name.tar.gz directory/name/path/
(creates a gzip compressed archive named archive_name.tar.gz containing the contents of directory/name/path/)
input:
[user@cluster ~]$ tar -xzvf archive_name.tar.gz
(extracts files from gzip compressed file named archive_name.tar.gz)
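The full round trip can be sketched as follows, using a hypothetical project folder. The -t flag (list contents) and -C flag (extract into a chosen directory) are additions not covered above.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p project/data restored
echo "ACGT" > project/data/seq.txt

# -c create, -z gzip, -f archive file name; the directory to pack comes last.
tar -czf project.tar.gz project/

# -t lists the archive contents without extracting anything.
tar -tzf project.tar.gz

# -x extract; -C chooses the directory to extract into.
tar -xzf project.tar.gz -C restored
```

Listing an archive before extracting it is a quick way to see which files and folders it will create.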
gzip/gunzip: GNU zip/unzip (compresses/uncompresses individual files)
Description: gzip compresses the specified file into a .gz file, replacing the original; gunzip restores the original file from the .gz file.
Example Usage:
input:
[user@cluster ~]$ gzip file_name.fastq
(file has been compressed)
checking output:
[user@cluster ~]$ ls
file_name.fastq.gz
input:
[user@cluster ~]$ gunzip file_name.fastq.gz
(equivalent to: gzip -d file_name.fastq.gz)
(file has been uncompressed)
checking output:
[user@cluster ~]$ ls
file_name.fastq
Common gzip Flags:
-c: Write output to standard output and keep original files unchanged.
-d: Decompress the file (same as gunzip).
-k: Keep the input files (do not delete the original file after compression).
-r: Recursively compress every file in a directory and its subdirectories (each file is compressed individually).
-v: Verbose mode; displays the name and percentage reduction for each file compressed.
-1 to -9: Compression levels; -1 is the fastest (least compression), -9 is the slowest (most compression). Default is -6. Choosing the appropriate compression level with gzip depends on your specific needs regarding speed and file size. Lower levels prioritize speed, while higher levels prioritize the reduction of file size.
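The -c and -d flags combine well with other commands. The sketch below uses a tiny hypothetical fastq record to show how to keep the original while compressing, and how to peek inside a .gz file without writing an uncompressed copy to disk.

```shell
work=$(mktemp -d)
cd "$work"
# Hypothetical one-read fastq file.
printf '@read1\nACGT\n+\nFFFF\n' > reads.fastq

# -c writes the compressed data to standard output, so the original is kept.
gzip -c reads.fastq > reads.fastq.gz

# -dc decompresses to standard output without creating a file on disk.
first_line=$(gzip -dc reads.fastq.gz | head -n 1)
echo "$first_line"

# Plain gzip replaces the file with its .gz version (shown here on a copy).
cp reads.fastq copy.fastq
gzip copy.fastq
ls
```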
4. Piping and Redirection
|: Pipe (e.g. command1 | command2)
Description: Piping in UNIX/Linux takes the output of one command (command1) and uses it as the input for another command (command2). This is especially useful when you need to use different software or commands in sequence, starting from an initial input file. This allows you to perform multiple tasks in a single line, which saves time and storage space and eliminates the need to create and remove intermediate files.
Example Usage:
input:
[user@cluster ~]$ head -n 1000 logfile.log | tail -n 20
(Takes the first 1000 lines of logfile.log and then displays the last 20 lines of that output, effectively showing lines 981-1000 of the log file)
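The order of commands in a pipeline matters. The sketch below, which assumes a hypothetical 2000-line log file made with seq so that every line contains its own line number, checks which lines each ordering of head and tail selects.

```shell
work=$(mktemp -d)
cd "$work"
# Hypothetical log with 2000 numbered lines, so the selection is easy to check.
seq 1 2000 > logfile.log

# Lines 981-1000: keep the first 1000 lines, then the last 20 of those.
selected=$(head -n 1000 logfile.log | tail -n 20)
first_selected=$(echo "$selected" | head -n 1)
last_selected=$(echo "$selected" | tail -n 1)
echo "selected lines $first_selected to $last_selected"

# Reversing the order gives a different result: the first of the last 1000 lines.
reversed_first=$(tail -n 1000 logfile.log | head -n 1)
echo "tail-then-head starts at line $reversed_first"
```

Testing a pipeline on a numbered file like this is a simple way to confirm it selects the intended lines before running it on real data.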
>: Redirect Output
Description: Redirects the output of a command to a file, overwriting the file if it exists. This can be combined with pipes to perform several commands prior to outputting the final file.
Example Usage:
input:
[user@cluster ~]$ head -n 1000 logfile.log | tail -n 20 > lines981_1000.txt
(Takes lines 981-1000 of logfile.log from the piping command and writes them into a new file named 'lines981_1000.txt')
>>: Append Output
Description: Appends the output of a command to the end of an existing file (or creates a new file if it does not exist). This is similar to '>', but is useful when you want to add data to a file that has already been created, without overwriting its contents.
Example Usage:
[user@cluster ~]$ ls >> file_name.txt
(Adds the list of files in the current directory from 'ls' to the end of an already existing file named 'file_name.txt' in the same directory)
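The difference between '>' and '>>' is easiest to see side by side. This sketch writes to a hypothetical file named run_log.txt in a temporary folder.

```shell
work=$(mktemp -d)
cd "$work"

echo "first run"  > run_log.txt    # creates the file
echo "second run" > run_log.txt    # overwrites it: only "second run" remains
echo "third run" >> run_log.txt    # appends: the file now has two lines

head -n 5 run_log.txt
log_lines=$(wc -l < run_log.txt)
first_entry=$(head -n 1 run_log.txt)
```

Since '>' silently replaces existing contents, '>>' is the safer choice for log files that accumulate over several runs.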
cat: Concatenate Files
Description: Concatenates contents of several files, usually used in concert with the '*' wildcard to convert several files into one single file, or can be used to concatenate specific files by denoting specific filenames sequentially after the command.
Example Usage:
[user@cluster ~]$ cat *.fasta > allfastafiles.fasta
(Concatenates all files ending with .fasta into one file named 'allfastafiles.fasta')
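A sketch of concatenating specific files in a chosen order, using two hypothetical single-sequence fasta files.

```shell
work=$(mktemp -d)
cd "$work"
# Two hypothetical fasta files, one sequence each.
printf '>seq1\nACGT\n' > sample_a.fasta
printf '>seq2\nGGCC\n' > sample_b.fasta

# Name the files explicitly to control the order in the combined file.
cat sample_a.fasta sample_b.fasta > combined.fa

# With no redirection, cat prints the contents to the terminal.
cat combined.fa
combined_lines=$(wc -l < combined.fa)
first_header=$(head -n 1 combined.fa)
```

The output here uses a .fa extension, an arbitrary choice, so that it would not itself be matched by a *.fasta wildcard if the command were run again in the same directory.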
5. Command Line Shortcuts
The following are useful keyboard shortcuts for editing commands faster and ending processes at the command line. Most of them are the defaults in the Bash shell and may differ in other shells.
Ctrl + C: Terminate Current Process
Description: Stops the currently running command or process.
Ctrl + Z: Suspend Current Process
Description: Pauses the currently running command or process, allowing you to resume it later with fg (foreground) or bg (background).
Ctrl + D: Logout or End Input
Description: Closes the current shell session or signals the end of input when typing into a terminal.
Ctrl + L: Clear Terminal
Description: Clears the terminal screen.
Ctrl + A: Move to Beginning of Line
Description: Moves the cursor to the beginning of the current line.
Ctrl + E: Move to End of Line
Description: Moves the cursor to the end of the current line.
Ctrl + K: Cut Text to End of Line
Description: Deletes (cuts) the text from the cursor's current position to the end of the line.
Ctrl + U: Cut Text to Beginning of Line
Description: Deletes (cuts) the text from the cursor's current position to the beginning of the line.
Ctrl + W: Cut Previous Word
Description: Deletes (cuts) the word before the cursor.
Ctrl + Y: Paste Last Cut Text
Description: Pastes the text that was most recently cut with Ctrl + K, Ctrl + U, or Ctrl + W.
Ctrl + R: Search Command History
Description: Allows you to search through previously executed commands in the shell history.
Ctrl + P: Previous Command
Description: Scrolls up through the command history to the previous command.
Ctrl + N: Next Command
Description: Scrolls down through the command history to the next command.
Ctrl + T: Transpose Characters
Description: Swaps the position of the character at the cursor with the character before it.
Esc + B: Move Backward One Word
Description: Moves the cursor one word back.
Esc + F: Move Forward One Word
Description: Moves the cursor one word forward.
Alt + .: Insert Last Argument
Description: Inserts the last argument of the previous command at the cursor's position.
Tab: Autocomplete
Description: Autocompletes file and directory names so you don't have to type the full name manually. Pressing Tab twice will show all possible completions.
Ctrl + X Ctrl + E: Edit Command in Text Editor
Description: Opens the current command in the default text editor for more complex editing.
Ctrl + G: Cancel Command
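Description: Aborts the current editing operation, such as a Ctrl + R history search, without running a command. It does not stop a program that is already running (use Ctrl + C for that).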