Bash - 10 - File Viewing and Editing Commands

Filters are commands that accept data from standard input, manipulate it, and write the result to standard output.
Each filter performs one simple function, and filters can be combined with other tools using redirection and piping.


1. File Creation and Editing

Creating a Text File Using the Nano Editor

To create a text file with the nano editor:

nano draft.txt  
# Opens nano for editing

Editing in Nano:

  • Type your content, then save using Ctrl + O, followed by Enter, and exit with Ctrl + X.
  • Note: nano is a simple text editor suitable for plain text files. For more complex editing, consider Emacs, Vim, or graphical editors like Gedit or VS Code.

Alternatives: On Windows, you can use editors like Notepad++ or Notepad.


Creating a Blank File with touch

touch my_file.txt  
# Creates a blank text file

touch primarily updates the access and modification times of a file (to the current time unless told otherwise). If the file doesn’t exist, it creates an empty one.

  • -a: Modifies only the access time.
  • -m: Modifies only the modification time.
  • -t: Allows you to specify a custom timestamp in the format [[CC]YY]MMDDhhmm[.ss].
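
As a minimal sketch of these options, the following creates an empty file and a second file with a backdated timestamp. The file names and the scratch directory are made up for illustration.

```shell
# Work in a scratch directory so nothing real is touched (hypothetical file names).
cd "$(mktemp -d)"

touch new.txt                   # creates an empty file stamped with the current time
touch -t 200001011230 old.txt   # creates old.txt stamped 2000-01-01 12:30

ls -l new.txt old.txt           # the date column shows the two timestamps

# [ a -ot b ] is true when a is older than b
if [ old.txt -ot new.txt ]; then
    echo "old.txt is older than new.txt"
fi
```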

2. Text File Viewing and Analysis

Word Count with wc

The wc command prints the number of lines (newline characters), words (runs of characters separated by whitespace), and bytes in a text file, in that order.

  • -c: Byte count.
  • -m: Character count (differs from -c only when the file contains multibyte characters).
  • -l: Line count.
  • -w: Word count.
  • -L: Displays the length of the longest line.

wc -l *.pdb  
# Displays the line count for each .pdb file in the current directory (plus a total if there are several)

Piping the result of a command to wc: Counts the number of lines in the output of ls -l

ls -l | wc -l

If wc -l is run without a filename, it reads from standard input and waits for you to type. Press Ctrl + D on an empty line to end the input and see the count, or Ctrl + C to cancel.
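
The counts can be checked on a small file whose contents are known. This is only a sketch; notes.txt and the variable names are invented for the example.

```shell
cd "$(mktemp -d)"
printf 'one two\nthree\n' > notes.txt   # 2 lines, 3 words, 14 bytes

wc notes.txt          # lines, words, and bytes, followed by the file name

# Reading from standard input (<) prints only the number, which is handy
# when the count is stored in a variable.
lines=$(wc -l < notes.txt)
words=$(wc -w < notes.txt)
bytes=$(wc -c < notes.txt)
echo "lines=$lines words=$words bytes=$bytes"
```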


Finding Printable Characters with strings

strings -n 10 sujith.jpeg
  • strings extracts sequences of printable characters from a file, even from binary files like .jpeg.
  • -n number sets the minimum length of the sequences to report (the default is 4).
6*&&*6>424>LDDL_Z_||
6*&&*6>424>LDDL_Z_||
Bm&EmvX{2;
!1 0AQ@Paq

3. Text File Navigation Commands

Viewing Files with more, less, and cat

  • cat: Displays the entire content of a file. Best for small files.

  • more: Displays a screen of content at a time. Press <space> to scroll forward, and q to quit.

  • more -n number (or more -number): Sets the number of lines to show per screen.

  • more +linenumber: Starts viewing from the given line number.

  • less: Similar to more, but allows both forward and backward navigation with arrow keys. To quit, press q.

  • Bonus: less works like a vi browser, allowing vi-style navigation.


Viewing Specific Parts of a File with head and tail

head: By default, shows the first 10 lines of a file.
With GNU head, a minus sign before the count prints everything except that many lines or bytes at the end of the file.

  • head -n 5 file.txt: Displays the first 5 lines.
  • head -n -3 file.txt: Shows all but the last 3 lines.
  • head -c -20 file.txt: Shows all but the last 20 bytes.

tail: By default, shows the last 10 lines of a file.
Precede the integer with a plus sign to start the output at that line of the file instead of counting back from the end.

  • tail -n 5 file.txt: Displays the last 5 lines.
  • tail -n +12 file.txt: Displays the file starting from line 12.
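
A numbered file makes it easy to see which lines each form selects. The sketch below assumes GNU head for the negative count; numbers.txt and middle.txt are illustrative names.

```shell
cd "$(mktemp -d)"
seq 1 20 > numbers.txt        # one number per line, 1 to 20

head -n 3 numbers.txt         # lines 1-3
tail -n 3 numbers.txt         # lines 18-20
tail -n +18 numbers.txt       # the same three lines, selected by starting line
head -n -17 numbers.txt       # GNU head only: everything except the last 17 lines

# Lines 10-12: keep the first 12 lines, then the last 3 of those
head -n 12 numbers.txt | tail -n 3 > middle.txt
cat middle.txt
```

Chaining head into tail like this is a common way to pull a range of lines out of the middle of a file.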

4. Sorting and Manipulating Files

Sorting Files with sort

The sort command sorts the lines of a text file or of piped input:

sort lengths.txt     # Sorts alphanumerically by default
sort -n lengths.txt  # Sorts numerically
sort -r lengths.txt  # Sorts in reverse order
  • -r: Reverse order.
  • -f: Ignore case differences.
  • -n: Numeric sorting.

sort doesn’t change the file; it writes the sorted lines to standard output (the screen by default).
To sort a file and redirect the output to a new file:

sort -n lengths.txt > sorted-lengths.txt
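
The difference between the default order and -n shows up as soon as the numbers have different widths. A small sketch with made-up values:

```shell
cd "$(mktemp -d)"
printf '10\n9\n100\n' > lengths.txt

sort lengths.txt         # character by character: 10, 100, 9
sort -n lengths.txt      # by numeric value: 9, 10, 100
sort -n -r lengths.txt   # largest first: 100, 10, 9

sort -n lengths.txt > sorted-lengths.txt   # lengths.txt itself is unchanged
```

Redirecting to the input file itself (sort lengths.txt > lengths.txt) would empty it before sort reads it; sort -o lengths.txt lengths.txt is the safe way to sort in place.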

5. File Comparison and Difference Commands

Comparing Files with cmp, comm, and diff

cmp : Comparing two files

cmp: Compares two files byte by byte. It stops when the first difference is found, showing the byte and line number.

cmp -i 100:150 -n 1024 file1 file2

Compares file1 and file2, skipping the first 100 bytes of file1 and the first 150 bytes of file2 (so the comparison starts at byte 101 and byte 151), and compares at most 1024 bytes.

cmp can be told to skip a specified number of bytes at the start of each file (-i) and to stop after comparing a specified number of bytes (-n). The values given to -i (--ignore-initial) are counted in bytes. If there is no mismatch, cmp prints nothing.

When the two files are identical, cmp prints no message and simply returns to the prompt. This matters in shell scripts: cmp exits with status 0 (true) when the files match and with a non-zero status otherwise, which can be used to control the flow of the program.
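
The exit status can drive an if statement directly. The following is a sketch with invented file names; -s keeps cmp silent so that only the status is used.

```shell
cd "$(mktemp -d)"
printf 'alpha\nbeta\n'  > a.txt
printf 'alpha\nbeta\n'  > same.txt
printf 'alpha\ngamma\n' > b.txt

# -s (silent) suppresses all output, so only the exit status matters.
if cmp -s a.txt same.txt; then
    echo "a.txt and same.txt are identical"
fi

if ! cmp -s a.txt b.txt; then
    echo "a.txt and b.txt differ"
fi
```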


comm : What is Common?

comm: Compares two sorted files and outputs three columns:

  • Column 1: Lines only in the first file.
  • Column 2: Lines only in the second file.
  • Column 3: Lines common to both files.

Options can be used to drop a particular column, and they can be combined (for example, -12 prints only the common lines).

-1: Suppress column 1 (lines unique to the first file).
-2: Suppress column 2 (lines unique to the second file).
-3: Suppress column 3 (lines common to both files).
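
Combining the suppress options is the usual way to get exactly one kind of line. A short sketch with two small, already sorted files (the names and contents are made up):

```shell
cd "$(mktemp -d)"
printf 'apple\nbanana\ncherry\n' > first.txt    # already sorted
printf 'banana\ncherry\ndate\n'  > second.txt   # already sorted

comm first.txt second.txt        # all three columns, indented by tabs
comm -12 first.txt second.txt    # only lines common to both: banana, cherry
comm -23 first.txt second.txt    # only lines unique to first.txt: apple
comm -13 first.txt second.txt    # only lines unique to second.txt: date
```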


diff : Converting one file to another.

diff: Compares two files or directories and outputs differences. Useful for checking changes between files.
Its output describes which lines must be added (a), deleted (d), or changed (c) to turn the first file into the second. Lines from the first file are prefixed with < and lines from the second file with >.

-i: Case-insensitive comparison.
-b: Ignore changes in the amount of whitespace.
-y: Output in columns for side-by-side comparison.
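
Reading the default output is easier with a tiny example. This sketch uses two hypothetical three-line files that differ only in their second line.

```shell
cd "$(mktemp -d)"
printf 'apple\nbanana\ncherry\n'    > old.txt
printf 'apple\nblueberry\ncherry\n' > new.txt

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
diff old.txt new.txt || true
# 2c2            <- line 2 of old.txt must change to become line 2 of new.txt
# < banana
# ---
# > blueberry

diff -y old.txt new.txt || true   # the same comparison, side by side
```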


Removing Duplicate Lines with uniq

uniq reads a single file (or standard input) and collapses consecutive duplicate lines into one. Because only adjacent lines are compared, the input is usually sorted first.
It does not overwrite the file, but the output can be redirected to a new file.

uniq file.txt > file_without_duplicates.txt

-c: Prefix each line with the number of times it occurred.
-d: Display only the lines that are repeated (one copy of each).
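
Because uniq only looks at neighbouring lines, unsorted input can give surprising results. A minimal sketch, with pets.txt as an invented file:

```shell
cd "$(mktemp -d)"
printf 'cat\ndog\ncat\ncat\n' > pets.txt

uniq pets.txt                 # cat, dog, cat: only the adjacent pair is collapsed
sort pets.txt | uniq          # cat, dog: sorting makes all duplicates adjacent
sort pets.txt | uniq -c       # counts per line: 3 for cat, 1 for dog
sort pets.txt | uniq -d       # only repeated lines: cat
```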


6. File Manipulation Commands

Joining Files with join

Joins two sorted files based on a common field (default is field 1).

join file1.txt file2.txt

When a line in each file has the same value in the join field, the two lines are joined into one output line. Lines without a matching join field are dropped by default.
(This is the same idea as joining database tables on a matching key.)

-1 NUM: Specifies which field to join on in the first file.
-2 NUM: Specifies which field to join on in the second file.
-i: Ignore case differences.
-e STRING: Uses STRING in place of missing input fields.
-a 1 or -a 2: Also outputs the lines from the first or second file that have no match in the other file.
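
A small keyed example shows both the default behaviour and -a. The two files below are hypothetical and are already sorted on their first field.

```shell
cd "$(mktemp -d)"
printf '1 alice\n2 bob\n3 carol\n' > names.txt    # id name
printf '1 90\n3 75\n'              > scores.txt   # id score

join names.txt scores.txt        # 1 alice 90 / 3 carol 75 (id 2 has no match)
join -a 1 names.txt scores.txt   # also prints the unpaired line: 2 bob
```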


Merging Files with paste

paste file1.txt file2.txt

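paste merges files line by line without requiring a common field: line 1 of the first file is followed by line 1 of the second file on the same output line, separated by a tab, and so on for every line. The -d option sets a different separator.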
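
For instance, with two three-line files (names chosen only for this sketch):

```shell
cd "$(mktemp -d)"
printf 'a\nb\nc\n' > letters.txt
printf '1\n2\n3\n' > numbers.txt

paste letters.txt numbers.txt        # a<TAB>1, b<TAB>2, c<TAB>3
paste -d , letters.txt numbers.txt   # a,1  b,2  c,3
```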


Splitting Files with split

split -b 1000 file.txt prefix

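split divides a large file into smaller files. By default, each output file holds 1000 lines; -b sets a size in bytes instead.
We specify the file to split and a prefix, which is used as the start of each new file name (the default prefix is x, giving xaa, xab, and so on).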

-b value: Specifies the byte size per file.
-d: Use numeric suffixes (e.g., 00, 01).
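
The pieces can be glued back together with cat, which is a quick way to confirm that nothing was lost. This sketch splits by lines with -l; big.txt and the part_ prefix are illustrative.

```shell
cd "$(mktemp -d)"
seq 1 2500 > big.txt

split -l 1000 big.txt part_     # creates part_aa, part_ab, part_ac
wc -l part_*                    # 1000, 1000, and 500 lines

cat part_* > rebuilt.txt        # the shell expands part_* in alphabetical order
cmp big.txt rebuilt.txt && echo "rebuilt.txt matches big.txt"
```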


Extracting Data with cut

Splitting a file vertically.
The cut command extracts specific sections (bytes, characters, or fields) from each line of a file:

cut -d , -f 2 animals.csv
# Extracts the second field from a comma-delimited file

-d: Specifies the delimiter (e.g., comma, space). The default is a tab.
-f: Specifies the field(s) to extract.
--complement: Returns everything except the specified fields.

To get three columns (fields) from a tab-delimited table:

cut -f 3,4,6 file

If the delimiter were a space, it would have to be specified using -d ' '.

We can also pipe the output of another command into cut to trim it down.

To remove duplicates from the output, you can pipe cut into sort and uniq:

cut -d , -f 2 animals.csv | sort | uniq

Using uniq -c instead of uniq also gives the number of occurrences of each line.

ls -l | cut -c 2-10 outputs only the permissions of each file, which occupy the 2nd to 10th characters of each line.

sujith@sujith-Latitude-7490:~/Desktop$ ls -l | cut -c 2-10
otal 56
rw-rw-r--
rwxrwxr-x
rw-rw-r--
rwxr-xr-x
rwxr-xr-x
rwxrwxr-x
rwxrwxr-x
rwxrwxr-x

To get both the permissions and the file name, it is tempting to select field 1 and field 9 (where the names are) with a space as the delimiter. However, ls -l | cut -f 1,9 -d ' ' won’t work properly: ls pads its columns with a varying number of spaces, and cut treats every single space as a field separator.

(awk offers a better solution than cut for selecting fields.)
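
The difference can be seen on one fixed line in ls -l format. The line below is a made-up sample rather than real output, used so that the result does not depend on the current directory.

```shell
# A fixed sample line in ls -l format (hypothetical file), with padded columns.
line='-rw-r--r--  1 user group   120 Jan  5 10:00 notes.txt'

# cut counts every single space as a delimiter, so the padding creates
# empty fields and field 9 is no longer the file name.
printf '%s\n' "$line" | cut -d ' ' -f 1,9

# awk treats any run of spaces or tabs as one separator by default.
printf '%s\n' "$line" | awk '{ print $1, $9 }'
# prints: -rw-r--r-- notes.txt

# Squeezing repeated spaces with tr -s first also makes cut usable.
printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 1,9
```

File names that contain spaces would still break both the awk and the tr approach, since the name itself would then span several fields.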


Example Workflow

cd nart-pacific-gyre
wc -l *.txt           # Get the line count for all .txt files
wc -l *.txt | sort -n | head -n 5  # Display the five smallest line counts
wc -l *.txt | sort -n | tail -n 5  # Display the five largest counts (the last line is the total)

Text Editors: vi and vim

vi (or its improved version vim) is the default text editor found in most Linux distributions. It is used for editing text files from the command line. vim (Vi IMproved) offers additional features like syntax highlighting, better search functionalities, and more. While vi is still widely used, vim is recommended due to its enhanced capabilities.