[#]: subject: "Tame your text with Perl"
[#]: via: "https://opensource.com/article/22/2/text-based-code-perl"
[#]: author: "Hunter Coleman https://opensource.com/users/hunterc"
[#]: collector: "lujun9972"
[#]: translator: " "
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "

Tame your text with Perl
======

Use regular expressions to speed up your text-based coding tasks.

![Person using a laptop][1]

Although its popularity has been tempered by languages like Python, Lua, and Go, Perl was one of the primary utilitarian languages on Unix and Linux for 30 years. It remains an important and powerful component in many open source systems today. If you haven't used Perl much, you may be surprised by how helpful it can be for many tasks. This is especially true if you deal with large amounts of text in your day-to-day work.

If you need a language that allows you to search and manipulate large volumes of text quickly and easily, Perl is tough to beat. In fact, doing exactly that is what Larry Wall originally built the language for.

If you're brand new to Perl, you can read this [quick Perl intro][2] to get a feel for the basics.

### Searching text with regex

To get started, here's an example of a simple regular expression (sometimes shortened to "regex") script. Suppose you have a list of names in a file called `names.txt`:

```
Steve Smith
Jane Murphy
Bobby Jones
Elizabeth Arnold
Michelle Swanson
```

You want to pull out all the people named Elizabeth. Put the regular expression you're looking for (here, "Elizabeth") between forward slashes, and Perl will look at every line read from the file and print only the lines that match.

```
use warnings;
use strict;

# Open names.txt for reading as UTF-8 text
open my $fh, '<:encoding(UTF-8)', "names.txt" or
  die "Could not read file\n";

# Print each line that contains "Elizabeth"
while(<$fh>){
  print if /Elizabeth/;
}
```

A quick note regarding this code: in this short form, the `if` and its regular expression need to come at the end of the line, after the statement they control. So `if /Elizabeth/ print;` will not work.
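To make the note above concrete, here is a minimal sketch of the two ways Perl accepts a conditional statement: the statement-modifier form used in the script, and the block form, which does let the `if` come first as long as the condition is in parentheses and the body is in braces. The in-memory `@names` list and the two helper subs (`matches_modifier`, `matches_block`) are hypothetical stand-ins for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the contents of names.txt.
my @names = (
    'Steve Smith', 'Jane Murphy', 'Bobby Jones',
    'Elizabeth Arnold', 'Michelle Swanson',
);

# Statement-modifier form: the statement comes first, the condition last.
sub matches_modifier {
    my @found;
    for (@_) {
        push @found, $_ if /Elizabeth/;
    }
    return @found;
}

# Block form: the condition comes first, but it needs parentheses
# and the body needs braces.
sub matches_block {
    my @found;
    for (@_) {
        if (/Elizabeth/) {
            push @found, $_;
        }
    }
    return @found;
}

print "$_\n" for matches_modifier(@names);
print "$_\n" for matches_block(@names);
```

Both subs return the same list, so which form to use is largely a matter of style: the modifier form reads well for a single short statement, while the block form is easier to extend when the body grows.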
Putting a bare `if /Elizabeth/` in front of `print` is a common error for new Perl programmers.

### Changing selected words with lookarounds

Sometimes you may not want to do something with every instance of a string, but instead make your selections based on what comes either before or after the string. For example, perhaps you want to change the string "Robert" to "Bob" but only if "Robert" is followed by "Dylan." Otherwise, you don't want to change the name.

For Perl, this is easy. You can apply this condition with a single line of code directly from your terminal:

```
perl -i.bkp -pe 's/Robert (?=Dylan)/Bob /g' names.txt
```

For those new to Perl, this line might seem a bit intimidating at first glance, but it's really quite simple and elegant.

The `-i` flag edits the file in place: the program's output is written back to the file instead of being displayed on the terminal screen. If you provide an extension to `-i`, Perl first saves a copy of the input file under that extension. In other words, I'm creating a backup of the original file with the `.bkp` extension. (Be sure that you do not put a space between `-i` and the extension `.bkp`.)

After that, I use the `-pe` options. The `-e` option lets me pass the Perl code itself on the command line. The `-p` option wraps that code in a loop that reads every line of the file and prints it after my code has run. After all, I want the new file to contain every name in the original file, not just Mr. Dylan's.

Next comes the phrase `s/Robert (?=Dylan)/Bob /g`. Here, I'm substituting (indicated by `s`) what comes between the first two slashes with what comes between the second and third slashes. In this case, I want to substitute "Bob" for "Robert" in a specific circumstance. I want to do this for every instance in the file, not just the first one it finds, so I use the `g` flag for _global_ at the end.

What about that strange-looking `(?=Dylan)`? This is what's known as a _positive lookahead_ in the world of regular expressions.
A lookahead checks what follows without consuming it, so the text it matches is not part of the match and won't be replaced by anything (Bob, in this example); instead, the expression narrows down the results that do get changed. I'm looking for the string "Robert" _if and only if_ it is followed (that's a positive lookahead) by the string "Dylan." Otherwise, ignore it. If the name "Robert Smith" is in my list of names, for example, I want to leave that alone and not change it to "Bob Smith."

These are the lookarounds available to Perl users (each goes inside parentheses, as in `(?=Dylan)`):

* positive lookahead: `?=pattern`
* negative lookahead: `?!pattern`
* positive lookbehind: `?<=pattern`
* negative lookbehind: `?