awk command to process and analyze text files

awk is a powerful command-line tool that can be used to process and analyze text files. It is particularly useful for extracting and manipulating data from large text files.

Here is a basic overview of how awk works:

  1. awk reads a file or input stream one line at a time.
  2. For each line, awk executes a set of commands that are specified in a script.
  3. The script typically consists of one or more patterns and associated actions.
  4. If a line matches a pattern, awk executes the associated action for that pattern.
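  5. If a line matches no pattern, awk does nothing for that line. If a rule has a pattern but no action, awk applies the default action, which is to print the line; a rule with no pattern runs its action for every line.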
  6. awk repeats this process until it has read all of the input.
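
The read, match, act cycle described above can be sketched in a short shell session. The sample data and the variable names tmp and result are made up for illustration; the point is only to show one rule with a pattern and no action next to one rule with both.

```shell
# Hypothetical sample data, created only for this illustration.
tmp=$(mktemp)
printf 'apple 3\nbanana 12\ncherry 7\n' > "$tmp"

# Rule 1 has a pattern and no action, so matching lines are printed as-is.
# Rule 2 has a pattern and an action, run when the second field exceeds 5.
result=$(awk '
/^a/
$2 > 5 { print "big:", $1 }
' "$tmp")

echo "$result"
rm -f "$tmp"
```

With this input, the first line is printed unchanged by the first rule, and the second rule reports banana and cherry. No rule prints the banana or cherry lines themselves, because they do not match /^a/.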

Here is an example of a simple awk script that counts the number of lines in a file:


{ count++ } END { print count }

This script consists of two rules:

  1. The first rule, { count++ }, has no pattern, so its action runs for every line of input and increments the count variable by 1.
  2. The second rule, END { print count }, uses the special END pattern, so it prints the value of the count variable after all input has been read.

To use this script, save it in a file (here script.awk) and pass it to awk with the -f option, along with the input file:

$ awk -f script.awk input.txt
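
As a rough sketch, the same program can also be given inline instead of through a script file. The sample file below is hypothetical, and the second command is an alternative that relies on the built-in NR variable (the number of records read so far) rather than a hand-written counter.

```shell
# Hypothetical three-line input file.
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

# Inline version of the counting script.
count_a=$(awk '{ count++ } END { print count }' "$tmp")

# Alternative: at END, NR equals the number of lines that were read.
count_b=$(awk 'END { print NR }' "$tmp")

echo "$count_a $count_b"
rm -f "$tmp"
```

Both commands report 3 for this input. The inline form is convenient for one-off commands, while a script file is easier to maintain once the program grows.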

awk scripts can also include more complex patterns and actions. For example, you can use awk to extract specific fields from a delimited text file, or to perform calculations on the data.

Here is an example of a more complex awk script that extracts the third field from a tab-delimited file and calculates the sum of all values in that field:

{ sum += $3 } END { print sum }

This script also consists of two rules:

  1. The first rule, { sum += $3 }, adds the value of the third field to the sum variable for each line of input.
  2. The second rule, END { print sum }, prints the value of the sum variable after all input has been read.

To use this script, save it as script.awk and pass it to awk along with the input file, using -F to set the tab character as the field separator:

$ awk -F'\t' -f script.awk input.txt
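
To try the summing script without preparing a file by hand, something like the following should work. The tab-delimited sample data is invented for this example, and the program is passed inline rather than through script.awk.

```shell
# Hypothetical tab-delimited data: name, code, amount.
tmp=$(mktemp)
printf 'a\tx\t10\nb\ty\t2.5\nc\tz\t30\n' > "$tmp"

# -F'\t' makes awk split each line on tab characters.
total=$(awk -F'\t' '{ sum += $3 } END { print sum }' "$tmp")

echo "$total"
rm -f "$tmp"
```

For this input the third fields are 10, 2.5, and 30, so the command prints 42.5.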

Here are a few examples of awk instructions that you can use to process and manipulate text:

  1. Print a specific field: To print a specific field from each line of input, you can use the $ operator followed by the field number. For example, the following awk script would print the second field of each line:
    { print $2 }
  2. Print several fields: awk has no field-range syntax, so to print more than one field you list each field with the $ operator, separated by commas. For example, the following awk script would print the second and third fields of each line:
    { print $2, $3 }
  3. Perform calculations: To perform calculations on the input data, you can use arithmetic operators and built-in functions. For example, the following awk script would calculate the average of the second field across all lines (the count check avoids a division by zero when the input is empty):
    { sum += $2; count++ }
    END { if (count > 0) print sum / count }
  4. Modify the output field separator: By default, awk separates output fields with a single space. You can use the OFS variable to modify the field separator. For example, the following awk script would print the second and third fields separated by a comma:
    BEGIN { OFS = "," }
    { print $2, $3 }
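
The field-printing and averaging instructions above can be tried together on a small file. This is only a sketch: the sample names, numbers, and shell variable names are assumptions made for the example.

```shell
# Hypothetical space-separated data: name, score, colour.
tmp=$(mktemp)
printf 'alice 10 red\nbob 20 blue\ncarol 60 green\n' > "$tmp"

# Print fields 2 and 3, joined by the output field separator set in BEGIN.
pairs=$(awk 'BEGIN { OFS = "," } { print $2, $3 }' "$tmp")

# Average of field 2, skipping the division if no lines were read.
avg=$(awk '{ sum += $2; count++ } END { if (count > 0) print sum / count }' "$tmp")

echo "$pairs"
echo "$avg"
rm -f "$tmp"
```

Here the first command prints 10,red then 20,blue then 60,green on separate lines, and the second prints 30.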

awk command options

The awk command provides a number of flags that can be used to modify its behavior. Here are a few examples of special flags that you can use with awk:

  • -F: This flag specifies the field separator. For example, the following awk command would use a comma as the field separator:
    awk -F',' '{ print $1 }' input.txt
  • -v: This flag defines a variable that can be used in the awk script. For example, the following awk command would define a variable x with the value 10:
    awk -v x=10 '{ print x }' input.txt
  • -f: This flag specifies a file containing the awk script. For example, the following awk command would execute the script in the file script.awk:
    awk -f script.awk input.txt
  • -v OFS=...: awk has no separate flag for the output field separator; instead, use -v to set the built-in OFS variable. For example, the following awk command would use a comma as the output field separator:
    awk -v OFS=',' '{ print $1, $2 }' input.txt
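
These options can be combined in a single command. The sketch below assumes a made-up comma-separated file and a hypothetical variable named min; it is meant to show how -F and -v interact, not to prescribe a particular layout.

```shell
# Hypothetical comma-separated data: name, age.
tmp=$(mktemp)
printf 'alice,30\nbob,45\ncarol,52\n' > "$tmp"

# -F sets the input separator; each -v assigns a variable before the program runs.
# min is a hypothetical threshold; OFS is awk's built-in output field separator.
out=$(awk -F',' -v min=40 -v OFS=' is ' '$2 >= min { print $1, $2 }' "$tmp")

echo "$out"
rm -f "$tmp"
```

Only the lines whose second field is at least 40 are printed, so the output is "bob is 45" followed by "carol is 52".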

awk built-in variables

Here is a list of some of the built-in variables that you can use with the awk command:

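  • ERRNO: A string describing the most recent system error (GNU awk only).
  • FIELDWIDTHS: A string containing a space-separated list of field widths, used to split input into fixed-width fields instead of using FS (GNU awk only).
  • FILENAME: The name of the current input file.
  • FNR: The number of the current record in the current input file. It restarts at 1 for each new file.
  • NR: The total number of input records read so far, counted across all input files.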
  • ORS: The output record separator.
  • RS: The input record separator.
  • NF: The NF variable represents the number of fields in the current input record. Because $NF refers to the field whose number is NF, you can use it to access the last field of each line, like this:
    { print $NF }
    You can use it as a pattern to print only the lines that have exactly three fields, like this:
    NF == 3 { print $0 }
    You can also use it to perform actions on lines with a different number of fields, like this:
    NF != 3 { print "Line has incorrect number of fields" }
  • FS: The FS variable represents the input field separator. You can use it to change the field separator, like this:
    BEGIN { FS = "," }
    { print $1 }
  • OFMT: The OFMT variable represents the output format that print uses for non-integer numbers (the default is "%.6g"). You can use it to change the number of decimal places, like this:
    BEGIN { OFMT = "%.2f" }
    { print $1 / $2 }
  • OFS: The OFS variable represents the output field separator. You can use it to change the field separator, like this:
    BEGIN { OFS = "," }
    { print $1, $2 }
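
To see how several of these variables change as awk moves through its input, a small experiment with two files can help. The file names first.txt and second.txt and their contents are assumptions made for this sketch.

```shell
# Two hypothetical input files in a scratch directory.
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/first.txt"
printf 'f g h i\n' > "$dir/second.txt"

# For every record, print the file name, the overall record number,
# the per-file record number, the field count, and the last field.
report=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF, $NF }' first.txt second.txt)

echo "$report"
rm -rf "$dir"
```

In the output, NR keeps counting from 1 to 3 across both files, while FNR goes back to 1 on the first line of second.txt.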

awk command exit status codes

The awk command returns an exit status code when it finishes execution. The exit status code indicates whether the command was successful or encountered an error.

Here is a more detailed list of exit status codes that you may see when using the awk command. POSIX only requires 0 for success and a value greater than 0 when an error occurs; the meanings of 1 and 2 below are those documented for GNU awk (gawk), and other implementations may differ:

  • 0: The command was successful. This exit status code indicates that the awk command completed its execution without encountering any errors.
  • 1: The command encountered an error. In gawk this is the general failure status; for example, a syntax error in the awk script typically produces it.
  • 2: The command encountered a fatal error. This exit status code indicates that gawk hit an error it could not recover from and stopped before completing its work. Examples of fatal errors include an input file that cannot be opened and an attempt to divide a number by zero. Reaching the end of the input is normal behavior and is not an error.

To check the exit status after awk has finished, run echo $? immediately afterward. The shell variable $? holds the exit status of the most recent command.

$ echo $?
# prints 0 if the awk command succeeded
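
An awk program can also choose its own exit status with the exit statement, which is useful when a script acts as a validation step. The following is a minimal sketch: the rule that every line must have exactly two fields, the status value 3, and the variable names are all hypothetical.

```shell
# Hypothetical validation: every line is expected to have exactly two fields.
# The program exits with 3 if any line breaks that rule, and 0 otherwise.
check='NF != 2 { bad = 1 } END { exit (bad ? 3 : 0) }'

# "|| var=$?" records a non-zero status without aborting a script run with set -e.
good_status=0
printf 'a b\nc d\n' | awk "$check" || good_status=$?

bad_status=0
printf 'a b\nc\n' | awk "$check" || bad_status=$?

echo "good=$good_status bad=$bad_status"
```

The first input passes the check and leaves the status at 0, while the second contains a one-field line and yields 3.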

License

Developers ultimate guide: Linux Bash scripting Copyright © 2022 by Matin Maleki. All Rights Reserved.
