Master the Linux ‘uniq’ Command: A Comprehensive Guide

Peter Hou
May 20, 2023


This guide provides an in-depth exploration of the Linux ‘uniq’ command, used for identifying and eliminating duplicate lines in a text file. Readers will learn how to use ‘uniq’, along with its most common and useful parameters. The guide also provides real-world use cases, advanced techniques, and critical points to be aware of when using ‘uniq’. With this information, readers can start leveraging ‘uniq’ to more efficiently process text data.

Instructions

In this guide, we will explore the ‘uniq’ command in Linux, which is used to report or filter out repeated lines in a file. This utility can be extremely useful when you need to process large amounts of data.

History

The ‘uniq’ command dates back to the early days of Unix and is today part of the GNU core utilities package shipped with almost every Linux distribution. It was designed to deal with the common problem of having duplicate lines in a file.

When and why to use it

The ‘uniq’ command is best used when you need to identify or eliminate repeated lines in a file. This could be useful in a variety of applications such as sorting logs, data processing, or simplifying lists.

How to use it

To use ‘uniq’, you typically pipe the output of a ‘sort’ command to it, as ‘uniq’ only detects repeated lines if they are adjacent.

$ sort file.txt | uniq

The commonly used parameters

  • -d - Print one copy of each line that appears more than once.

$ sort file.txt | uniq -d

  • -u - Print only the lines that appear exactly once.

$ sort file.txt | uniq -u
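To see the difference between the two options, here is a runnable sketch with a small sample file (the file name and its contents are made up for illustration):

```shell
# Create a sample file with some repeated lines
printf 'apple\nbanana\napple\ncherry\nbanana\n' > fruits.txt

# -d keeps one copy of each line that occurs more than once
sort fruits.txt | uniq -d
# apple
# banana

# -u keeps only the lines that occur exactly once
sort fruits.txt | uniq -u
# cherry
```

Note that -d and -u are complementary: together they partition the sorted input into "lines with duplicates" and "lines without".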

Other supported parameters

  • -i - Ignore differences in case when comparing lines.
  • -c - Prefix lines by the number of occurrences.
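These two options combine well. A quick sketch (the sample messages are hypothetical):

```shell
# Sample input with case variants of the same message
printf 'Error\nerror\nERROR\nwarning\n' > messages.txt

# Without -i, case variants count as distinct lines
sort messages.txt | uniq -c | wc -l    # 4 distinct lines

# With -i, the three case variants collapse into one
sort messages.txt | uniq -ci | wc -l   # 2 distinct lines
```

When using -i, it can help to sort case-insensitively as well (sort -f) so that case variants are guaranteed to end up adjacent.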

Most common use cases

The ‘uniq’ command is often used in log analysis to identify common and unique events. For instance, you can use it to count how many times each IP address appears in an access log.

$ sort access_log | uniq -c
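The command above counts whole lines, so two requests from the same IP to different URLs are counted separately. A common refinement is to extract the IP field first and then rank by count. This sketch assumes an Apache-style log where the client IP is the first whitespace-separated field; the sample log lines are invented for illustration:

```shell
# Hypothetical sample log (only the first field, the IP, matters here)
printf '10.0.0.1 GET /\n10.0.0.2 GET /about\n10.0.0.1 GET /index\n' > access_log

# Extract the IP field, count occurrences, rank most-frequent first
awk '{print $1}' access_log | sort | uniq -c | sort -rn
# 10.0.0.1 appears twice, 10.0.0.2 once
```

The final sort -rn orders the counted output numerically in reverse, putting the busiest IPs at the top.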

The tricky skills

One unusual but useful technique with ‘uniq’ is the ‘-f N’ option, which skips the first N blank-separated fields when comparing lines. This can be used to ignore a leading timestamp when comparing lines in a log file. Note that you should also sort by the fields that ‘uniq’ will actually compare, otherwise identical messages with different timestamps will not be adjacent:

$ sort -k2 log_file | uniq -f 1
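Here is a runnable sketch of the technique (the file name and log contents are hypothetical). Sorting with -k2 orders the lines by the message field, so that lines differing only in their timestamp become adjacent before ‘uniq -f 1’ compares them:

```shell
# Hypothetical log: field 1 is a timestamp, field 2 is the message
printf '12:00:01 disk-full\n12:00:05 disk-full\n12:00:09 net-down\n' > log_file

# Sort by the message field, then compare lines while skipping field 1
sort -k2 log_file | uniq -f 1
# 12:00:01 disk-full
# 12:00:09 net-down
```

The two disk-full lines collapse into one because, after skipping the timestamp field, their remaining text is identical.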

What needs to be noted

The most important thing to note about ‘uniq’ is that it only compares adjacent lines. If repeated lines are not next to each other, ‘uniq’ will not identify them as duplicates, which is why its input is almost always sorted first.
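A two-line demonstration of this pitfall:

```shell
# Without sorting, the non-adjacent duplicate 'a' survives
printf 'a\nb\na\n' | uniq
# a
# b
# a

# Sorting first makes the duplicates adjacent, so uniq removes them
printf 'a\nb\na\n' | sort | uniq
# a
# b
```

When all you want is deduplication, ‘sort -u’ produces the same result as ‘sort | uniq’ in a single command.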

Conclusion

The ‘uniq’ command is a powerful tool for managing and processing data in Linux. While it has a few quirks, such as requiring sorted input, once mastered, it can significantly simplify many tasks involving text files.

Written by Peter Hou

I am a Senior Software Engineer and tech lead in a top tech company.
