Regular Expressions in Linux Programming

An Introduction

Linux administrators and users encounter regular expressions frequently. They are not tied to BASH or any other shell; any scripting language available on Linux can make use of them.

Regular expressions match strings of text. That is all they do. They can match by individual characters, words, or patterns. Regular expressions are also known as regex or regexp.

There are three basic types of regular expressions.

  1. Simple regular expressions
  2. POSIX regular expressions
  3. Perl-based regular expressions

Although there are three categories, the basics are the same across all of them. In addition, Perl-based regular expressions are used not only in Perl but also in many other programming and scripting languages, such as Java, Python, JavaScript, and Ruby.

The History

Regular expressions originated in theoretical computer science, specifically in formal language theory and automata theory. The mathematician Stephen Kleene introduced the concept of regular sets in the 1950s. Early pattern-matching languages such as SNOBOL took a different approach and relied on their own pattern constructs rather than regular expressions.

Later, Ken Thompson built Kleene’s notation into a text editor so that it could be used for matching patterns in text. He then added the same capability to the popular UNIX editor “ed”, which in turn led to the creation of “grep”. Since then, a number of regular expression varieties have become available for UNIX and UNIX-like platforms.

Basic Regular Expressions

Let’s have a look at a few basic regular expressions.

The Dot (.)

The dot is a placeholder for any single character. As an example, “go.” would match “goa”, “got”, etc., and “g..d” would match “good”.
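The dot can be tried out with a short Python sketch. Python's `re` module is used here only as a convenient way to test patterns; the helper name `matches_whole` and the word list are made up for this illustration.

```python
import re

# The dot matches any single character (except a newline, by default).
three_letter = re.compile(r"go.")
four_letter = re.compile(r"g..d")

def matches_whole(pattern, word):
    """Return True when the pattern matches the entire word."""
    return pattern.fullmatch(word) is not None

for word in ["goa", "got", "go", "good", "gold"]:
    print(word, matches_whole(three_letter, word), matches_whole(four_letter, word))
```

Note that “go” alone does not match “go.”, because the dot requires exactly one character to be present.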

The Asterisk (*)

The asterisk denotes zero or more occurrences of the preceding character. It applies only to the item immediately before it, so “go*d” would match “gd”, “god”, and “good”. Combined with the dot, “.*” matches any run of characters. As an example, “go.*” would match “god”, “gone”, and “gondola”.
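A minimal Python sketch of the asterisk follows. The pattern “go*d” and the word lists are assumptions chosen for illustration; whole-word matching with `fullmatch` is used so that the effect of the asterisk is easy to see.

```python
import re

# "o*" means zero or more "o" characters.
zero_or_more_o = re.compile(r"go*d")
# ".*" means zero or more of any character.
go_then_anything = re.compile(r"go.*")

go_d_hits = [w for w in ["gd", "god", "good", "gold"]
             if zero_or_more_o.fullmatch(w)]
go_any_hits = [w for w in ["god", "gone", "gondola", "go", "dog"]
               if go_then_anything.fullmatch(w)]

print(go_d_hits)
print(go_any_hits)
```

Because the asterisk allows zero occurrences, “gd” matches “go*d” and the bare word “go” matches “go.*”.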

The Plus Sign (+)

The plus sign denotes one or more occurrences of the preceding character. As an example, “fre+..” would match “fresh”, “freak”, and “freeze”. The pattern reads as “f” and “r”, followed by one or more “e”, followed by any two characters.
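The difference between “one or more” and “zero or more” can be checked with a small Python sketch. The extra test words (“frost”, “free”) are assumptions added to show what the pattern rejects.

```python
import re

# "e+" requires at least one "e"; ".." then requires exactly two more characters.
pattern = re.compile(r"fre+..")

words = ["fresh", "freak", "freeze", "frost", "free"]
plus_hits = [w for w in words if pattern.fullmatch(w)]

print(plus_hits)
```

Here “frost” is rejected because it has no “e” after “fr”, and “free” is rejected because too few characters remain for the two dots once at least one “e” has been matched.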

The Question Mark (?)

The question mark indicates that the preceding character may appear once or not at all. As an example, “gon?e” will match “goe” and “gone”.

The expressions above are enough to get started. The world of regular expressions is quite wide and deep. Experienced programmers use regular expressions to shorten their source code and to make it as flexible as possible.
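As a quick check of the question mark, the following Python sketch tests the same pattern against a few words. The non-matching words (“gonne”, “ge”) are assumptions added for contrast.

```python
import re

# "n?" allows the "n" to appear once or to be absent.
pattern = re.compile(r"gon?e")

words = ["goe", "gone", "gonne", "ge"]
optional_hits = [w for w in words if pattern.fullmatch(w)]

print(optional_hits)
```

The word “gonne” fails because the question mark permits at most one “n”.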

Let’s have a look at some of the reasons for using regular expressions in your program or script code.

Some Reasons for Using Regular Expressions

  1. Regular expressions are available in almost all programming and scripting languages. They are also supported by many Linux editors, such as vi. On UNIX and Linux, the grep command is one of the most basic tools for working with regular expressions, and much of its usefulness comes from them.
  2. If you look closely, regular expressions can be considered a language within a language. They are powerful enough on their own to constitute a sub-language. A quick web search will turn up many books written on mastering regular expressions.
  3. Regular expression notation can be thought of as a powerful search engine that looks for the patterns you specify in your files.

As an example, financial analysts on Wall Street are among those who benefit the most from regular expressions. Their institutions use regular expressions to scan financial information and extract what they are looking for. These institutions usually have many sources that supply financial information in high volumes. The programmers who work for these institutions use regular expressions to scan and search the incoming data for the financial information they need.

Without regular expressions, Wall Street would not be as efficient as it is now. Regular expressions make life easier for the people who work there.

  4. As long as your search string is simple, you do not have to write complex regular expressions. As an example, if you want to find “tiger”, you just type “tiger”.
  5. Linux is case sensitive by nature, and regular expressions are no exception. If you want to find both “Tiger” and “tiger” in your text, a simple pattern like the one below is enough.

[Tt]iger

  6. When it comes to matching strings, you may want to match one thing or another. If so, it is as easy as typing the pattern below (with grep, use the -E option so that “|” is treated as alternation).

Tiger|Monkey

  7. Regular expressions are a great set of tools that can be used for matching virtually anything. If you know what you are looking for, regular expressions can help you find it. This is especially true of programming languages such as Perl, where the use of regular expressions has been expanded and pushed to its limits.
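The grep-style searching described in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not how grep itself is implemented; the function name `grep_lines` and the sample text are hypothetical.

```python
import re

def grep_lines(pattern, lines):
    """Return the lines that contain a match for pattern, like a tiny grep."""
    compiled = re.compile(pattern)
    return [line for line in lines if compiled.search(line)]

# Hypothetical sample text, made up for this illustration.
sample = [
    "The tiger sleeps.",
    "Tiger tracks were found.",
    "A Monkey stole the fruit.",
    "Nothing to see here.",
]

print(grep_lines(r"tiger", sample))         # literal string, case sensitive
print(grep_lines(r"[Tt]iger", sample))      # either capitalisation
print(grep_lines(r"Tiger|Monkey", sample))  # one word or the other
```

The plain string “tiger” finds only the lowercase line, the bracket expression picks up both spellings, and the alternation returns any line that contains either word.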

Regular expressions are useful in everyday life. Take search engines as an example. When you search for something, pattern matching takes place in the backend and results come back within seconds, but the mechanism is usually hidden from end users. Some search engines, such as Google Code Search, have allowed users to search with regular expressions directly.
