awk command to process and analyze text files
awk is a powerful command-line tool for processing and analyzing text files. It is particularly useful for extracting and manipulating data from large text files.
Here is a basic overview of how awk works:
- awk reads a file or input stream one line at a time. Strictly, it reads one record at a time, and by default a record is a line.
- For each line, awk executes the commands specified in a script. The script consists of one or more rules, each made of a pattern and an associated action.
- If a line matches a pattern, awk executes the action associated with that pattern. A rule with no pattern applies to every line, and a pattern with no action prints the matching line by default. Lines that match no pattern are skipped.
- awk repeats this process until it has read all of the input.
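The rules above can be tried directly from the shell. The sketch below assumes a POSIX shell and any POSIX-compatible awk, and the fruit data is made up for the example. It shows three cases:
- a pattern with no action, where the matching line is printed;
- a pattern with an action;
- an action with no pattern, which runs for every line.

```shell
# Made-up sample input: a fruit name and a quantity on each line.
input='apple 3
banana 7
cherry 12'

# Pattern only: with no action, awk prints each line whose 2nd field exceeds 5.
big=$(printf '%s\n' "$input" | awk '$2 > 5')
printf '%s\n' "$big"

# Pattern plus action: the action runs only for lines containing "an".
matched=$(printf '%s\n' "$input" | awk '/an/ { print "match: " $1 }')
printf '%s\n' "$matched"

# Action only: with no pattern, the action runs for every line.
names=$(printf '%s\n' "$input" | awk '{ print $1 }')
printf '%s\n' "$names"
```

Lines that match no pattern produce no output at all. For example, apple 3 is not printed by the first command.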
Here is an example of a simple awk script that counts the number of lines in a file:
{ count++ } END { print count }
This script consists of two rules:
- The first, { count++ }, has no pattern, so it runs for every line of input and increments the count variable by 1.
- The second, END { print count }, prints the value of the count variable after all input has been read.
To use this script, save it to a file (for example, script.awk) and pass that file to awk along with the input file:
$ awk -f script.awk input.txt
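The following sketch runs the line-counting script end to end. It assumes a POSIX shell with mktemp available. The temporary directory and the three-line sample file are illustrative only. It also prints the built-in NR variable in an END block, which holds the same count.

```shell
# Use a throwaway directory so the example files do not clutter the current one.
workdir=$(mktemp -d)

# Save the line-counting script from the text to script.awk.
cat > "$workdir/script.awk" <<'EOF'
{ count++ }
END { print count }
EOF

# Create a three-line sample input file.
printf 'first line\nsecond line\nthird line\n' > "$workdir/input.txt"

# Run the script with -f and capture its output.
lines=$(awk -f "$workdir/script.awk" "$workdir/input.txt")
echo "$lines"

# In an END block, NR holds the total number of records read, giving the same count.
nr=$(awk 'END { print NR }' "$workdir/input.txt")
echo "$nr"

rm -r "$workdir"
```

On an empty input file, count is never assigned, so print count outputs an empty line. Writing print count + 0 prints 0 instead.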
awk scripts can also include more complex patterns and actions. For example, you can use awk to extract specific fields from a delimited text file, or to perform calculations on the data.
Here is an example of a more complex awk script that extracts the third field from a tab-delimited file and calculates the sum of all values in that field:
{ sum += $3 } END { print sum }
This script consists of two rules:
- The first, { sum += $3 }, adds the value of the third field to the sum variable for each line of input.
- The second, END { print sum }, prints the value of the sum variable after all input has been read.
To use this script, pass it to awk along with the input file, and set the field separator to a tab character with -F'\t':
$ awk -F'\t' -f script.awk input.txt
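To see why the tab separator matters, the sketch below runs the same script twice on a hypothetical tab-separated file whose first field contains spaces. The first run uses the default whitespace splitting and the second uses -F'\t'. It assumes a POSIX shell with mktemp and a POSIX awk.

```shell
workdir=$(mktemp -d)

# The summing script from the text.
cat > "$workdir/script.awk" <<'EOF'
{ sum += $3 }
END { print sum }
EOF

# Made-up data with three tab-separated columns: region, quarter, amount.
printf 'north region\tq1\t10\nsouth region\tq1\t20\neast region\tq2\t30\n' > "$workdir/input.txt"

# Default splitting on blanks: "north region" becomes two fields,
# so $3 is "q1", whose numeric value is 0.
wrong=$(awk -f "$workdir/script.awk" "$workdir/input.txt")
echo "$wrong"

# Splitting on tabs only: $3 is the amount column.
total=$(awk -F'\t' -f "$workdir/script.awk" "$workdir/input.txt")
echo "$total"

rm -r "$workdir"
```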
Here are a few examples of awk instructions that you can use to process and manipulate text:
- Print a specific field: To print a specific field from each line of input, use the $ operator followed by the field number. For example, the following awk script prints the second field of each line:
{ print $2 }
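- Extract several fields: awk has no field-range syntax. To print several fields from each line, list them after print, separated by commas. Each comma becomes the output field separator in the output. For example, the following awk script prints the second and third fields of each line:
{ print $2, $3 }
For a longer range of fields, loop over the field numbers with a for statement.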
- Perform calculations: To perform calculations on the input data, use arithmetic operators and built-in functions. For example, the following awk script calculates the average of the second field of each line:
{ sum += $2; count++ }
END { if (count > 0) print sum / count }
The count > 0 check avoids a division-by-zero error when the input is empty.
- Modify the output field separator: By default, awk separates output fields with a single space. You can use the OFS variable to change the output field separator. For example, the following awk script prints the second and third fields separated by a comma:
BEGIN { OFS = "," }
{ print $2, $3 }
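The sketch below runs each instruction from the list on a small made-up data set (name, score, age, city). It assumes a POSIX shell and awk. Since awk has no range syntax of its own, the for loop shows one way to handle a longer range of fields: it prints fields 2 through NF.

```shell
# Made-up whitespace-separated data: name, score, age, city.
data='ann 90 31 oslo
bob 70 45 rome
cy 80 28 lima'

# Print a specific field.
second=$(printf '%s\n' "$data" | awk '{ print $2 }')

# Print several fields; each comma inserts the output field separator.
pair=$(printf '%s\n' "$data" | awk '{ print $2, $3 }')

# Print a range of fields (2 through the last) with a loop.
range=$(printf '%s\n' "$data" | awk '{
    for (i = 2; i <= NF; i++)
        printf "%s%s", $i, (i < NF ? OFS : ORS)
}')

# Average of the second field, guarded against empty input.
avg=$(printf '%s\n' "$data" | awk '{ sum += $2; count++ } END { if (count > 0) print sum / count }')

# Comma as the output field separator.
csv=$(printf '%s\n' "$data" | awk 'BEGIN { OFS = "," } { print $2, $3 }')

printf '%s\n' "$second" "$pair" "$range" "$avg" "$csv"
```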
awk command-line options
The awk command provides a number of flags that modify its behavior. Here are a few examples of flags that you can use with awk:
-F: This flag specifies the input field separator. For example, the following awk command uses a comma as the field separator:
awk -F',' '{ print $1 }' input.txt
-v: This flag assigns a value to a variable before the awk script starts running. For example, the following awk command defines a variable x with the value 10:
awk -v x=10 '{ print x }' input.txt
-f: This flag specifies a file containing the awk script. For example, the following awk command executes the script in the file script.awk:
awk -f script.awk input.txt
-v OFS=...: There is no -OFS flag. To set the output field separator from the command line, assign the OFS variable with -v. For example, the following awk command uses a comma as the output field separator:
awk -v OFS=',' '{ print $1, $2 }' input.txt
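Here the flags are combined on a hypothetical comma-separated file. This sketch assumes a POSIX shell, mktemp, and a POSIX awk. It also shows -v setting OFS, which is the portable way to change the output separator from the command line.

```shell
workdir=$(mktemp -d)
# Made-up comma-separated data: name, age, city.
printf 'alice,30,paris\nbob,25,berlin\n' > "$workdir/input.txt"

# -F: split input fields on commas.
names=$(awk -F',' '{ print $1 }' "$workdir/input.txt")

# -v: x is set to 10 before the first line is read.
older=$(awk -F',' -v x=10 '{ print $1, $2 + x }' "$workdir/input.txt")

# -f: read the program text from a file.
printf '{ print $3 }\n' > "$workdir/script.awk"
cities=$(awk -F',' -f "$workdir/script.awk" "$workdir/input.txt")

# -v OFS=...: set the output field separator from the command line.
swapped=$(awk -F',' -v OFS=';' '{ print $2, $1 }' "$workdir/input.txt")

printf '%s\n' "$names" "$older" "$cities" "$swapped"
rm -r "$workdir"
```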
awk built-in variables
Here is a list of some of the built-in variables that you can use with the awk command:
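ERRNO: A string describing the most recent system error, for example when getline or close fails (a GNU awk extension).
FIELDWIDTHS: A space-separated list of field widths, used to split input into fixed-width fields instead of splitting on FS (a GNU awk extension).
FILENAME: The name of the current input file.
FNR: The number of the current record in the current input file. It restarts at 1 for each file.
NR: The total number of input records read so far, counted across all input files.
ORS: The output record separator (a newline by default).
RS: The input record separator (a newline by default).
NF: The NF variable holds the number of fields in the current input record. You can use it to access the last field of each line, like this:
{ print $NF }
or to select only the lines that have exactly three fields:
NF == 3 { print $0 }
You can also use it to perform actions on lines with a different number of fields, like this:
NF != 3 { print "Line has incorrect number of fields" }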
FS: The FS variable holds the input field separator. You can use it to change the field separator, like this:
BEGIN { FS = "," }
{ print $1 }
OFMT: The OFMT variable holds the format that print uses for non-integer numbers (%.6g by default). You can use it to change the number of decimal places, like this:
BEGIN { OFMT = "%.2f" }
{ print $1 / $2 }
OFS: The OFS variable holds the output field separator (a single space by default). You can use it to change the output field separator, like this:
BEGIN { OFS = "," }
{ print $1, $2 }
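The sketch below exercises several of these variables on two small made-up files. It assumes a POSIX shell, mktemp, and a POSIX awk, so it avoids the GNU-only ERRNO and FIELDWIDTHS. Note two behaviors:
- NR keeps counting across files, while FNR restarts at 1 for each file.
- OFMT affects only non-integer results.

```shell
workdir=$(mktemp -d)
printf 'a b c\nd e\n' > "$workdir/one.txt"
printf 'f g h\n' > "$workdir/two.txt"

# FILENAME, FNR and NR: run inside the directory so FILENAME is a short name.
counts=$(cd "$workdir" && awk '{ print FILENAME, FNR, NR }' one.txt two.txt)

# NF: print the last field, then report lines without exactly three fields.
last=$(awk '{ print $NF }' "$workdir/one.txt")
odd=$(awk 'NF != 3 { print "Line " FNR " has " NF " fields" }' "$workdir/one.txt")

# FS: split on commas instead of whitespace.
first=$(printf 'x,y\nz,w\n' | awk 'BEGIN { FS = "," } { print $1 }')

# OFMT: 10/3 is printed with two decimals; 9/3 is an integer and printed as 3.
ratio=$(printf '10 3\n9 3\n' | awk 'BEGIN { OFMT = "%.2f" } { print $1 / $2 }')

printf '%s\n' "$counts" "$last" "$odd" "$first" "$ratio"
rm -r "$workdir"
```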
awk command exit status codes
The awk command returns an exit status code when it finishes execution. The exit status code indicates whether the command was successful or encountered an error.
Here is a more detailed list of exit status codes that you may see when using the awk command:
0: The command was successful. The awk command completed its execution without encountering any errors.
1: The command encountered an error. POSIX only requires a value greater than 0 on error; GNU awk (gawk) uses 1 for errors that are not fatal.
2: In gawk, the command encountered a fatal error and stopped immediately. For example, gawk aborts with a fatal error when the script attempts to divide a number by zero.
Syntax errors in the awk script and input files that cannot be opened also produce a non-zero status, but the exact value differs between awk implementations. Scripts should therefore usually test for zero versus non-zero. An awk program can also set its own exit status with the exit statement, for example exit 3.
To check the exit status of awk after it finishes, run echo $? immediately afterward. It prints the status of the most recent command:
$ echo $?
# prints 0 after a successful awk command
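The sketch below checks exit statuses from a script. It assumes a POSIX shell and awk, and the value 3 and the ERROR search are arbitrary examples. It relies only on behavior POSIX specifies: status 0 on success, and the exit statement setting the status.

```shell
# A successful run exits with status 0.
awk 'BEGIN { print "hello" }' > /dev/null
ok=$?

# The exit statement sets an explicit status.
awk 'BEGIN { exit 3 }'
custom=$?

# Exit with 1 when no line contains "ERROR" and 0 otherwise, as grep does.
printf 'info: started\ninfo: done\n' | awk '/ERROR/ { found = 1 } END { exit !found }'
missing=$?

echo "$ok $custom $missing"
```

The last pattern lets an awk script act as a test inside shell conditionals. Because found is never set when nothing matches, !found evaluates to 1.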