awk command to process and analyze text files
awk is a powerful command-line tool that can be used to process and analyze text files. It is particularly useful for extracting and manipulating data from large text files.
Here is a basic overview of how awk works:
- awk reads a file or input stream one line at a time.
- For each line, awk executes a set of commands that are specified in a script.
- The script typically consists of one or more patterns and associated actions.
- If a line matches a pattern, awk executes the associated action for that pattern.
- If a line does not match any pattern, awk does nothing for that line. A pattern with no action prints the matching line by default, and an action with no pattern runs for every line.
- awk repeats this process until it has read all of the input.
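The pattern and action cycle described above can be sketched with a small run from the shell. The sample input and the word "error" used as a pattern are made up for illustration; nothing here comes from a real log.

```shell
# Hypothetical sample input: three lines, one of which contains "error".
input='ok start
error disk full
ok done'

# /error/ is a pattern with an action; lines that do not match produce no output.
# The END action runs once, after all input has been read.
result=$(printf '%s\n' "$input" | awk '
/error/ { print "matched:", $0 }
END     { print "lines read:", NR }
')

echo "$result"
```

With this input, only the second line triggers the first action, and the END action reports that three lines were read.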
Here is an example of a simple awk script that counts the number of lines in a file:
{ count++ } END { print count }
This script consists of two actions:
- The first action, { count++ }, increments the count variable by 1 for each line of input.
- The second action, END { print count }, prints the value of the count variable after all input has been read.
To use this script, you can pass it to awk along with the input file:
$ awk -f script.awk input.txt
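To try the command end to end, the following sketch writes the script and a small input file into a temporary directory and then runs it. The file names match the ones used above; the three-line contents of input.txt are an assumption made for this example.

```shell
# Work in a throwaway directory so that no existing file is overwritten.
tmpdir=$(mktemp -d)

# script.awk holds the line-counting script; input.txt holds made-up sample data.
printf '%s\n' '{ count++ } END { print count }' > "$tmpdir/script.awk"
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/input.txt"

# Run the script file against the input file.
line_count=$(awk -f "$tmpdir/script.awk" "$tmpdir/input.txt")
echo "$line_count"

rm -r "$tmpdir"
```

One detail worth knowing: on an empty input file the count variable is never set, so the script prints an empty line. Writing print count + 0 in the END action would print 0 instead.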
awk scripts can also include more complex patterns and actions. For example, you can use awk to extract specific fields from a delimited text file, or to perform calculations on the data.
Here is an example of a more complex awk script that extracts the third field from a tab-delimited file and calculates the sum of all values in that field:
{ sum += $3 } END { print sum }
This script consists of two actions:
- The first action, { sum += $3 }, adds the value of the third field to the sum variable for each line of input.
- The second action, END { print sum }, prints the value of the sum variable after all input has been read.
To use this script, you can pass it to awk along with the input file:
$ awk -F'\t' -f script.awk input.txt
Here are a few examples of awk instructions that you can use to process and manipulate text:
- Print a specific field: To print a specific field from each line of input, you can use the $ operator followed by the field number. For example, the following awk script would print the second field of each line:
{ print $2 }
- Extract several fields: awk has no field-range syntax, so to extract several fields from each line of input you list each one with the $ operator. For example, the following awk script would print the second and third fields of each line:
{ print $2, $3 }
- Perform calculations: To perform calculations on the input data, you can use arithmetic operators and built-in functions. For example, the following awk script would calculate the average of the second field across all lines, skipping the division when there is no input:
{ sum += $2; count++ }
END { if (count > 0) print sum / count }
- Modify the output field separator: By default, awk separates output fields with a single space. You can use the OFS variable to modify the field separator. For example, the following awk script would print the second and third fields separated by a comma:
BEGIN { OFS = "," }
{ print $2, $3 }
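The field, calculation, and separator examples above can be checked against a small data set. This is a minimal sketch: the tab-delimited rows (name, quantity, price) are invented for the example, and the shell variable names are arbitrary.

```shell
# Made-up tab-delimited data: name, quantity, price.
data=$(printf 'apple\t3\t10\npear\t5\t20\nplum\t4\t30\n')

# Sum of the third field, with tab as the input field separator.
total=$(printf '%s\n' "$data" | awk -F'\t' '{ sum += $3 } END { print sum }')

# Average of the second field; the guard avoids dividing by zero on empty input.
average=$(printf '%s\n' "$data" | awk -F'\t' '{ sum += $2; count++ } END { if (count > 0) print sum / count }')

# Second and third fields joined with a comma through OFS.
pairs=$(printf '%s\n' "$data" | awk -F'\t' 'BEGIN { OFS = "," } { print $2, $3 }')

echo "$total"
echo "$average"
echo "$pairs"
```

For this data the sum is 60, the average is 4, and the last command prints 3,10 then 5,20 then 4,30 on separate lines.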
awk command options
The awk command provides a number of flags that can be used to modify its behavior. Here are a few examples of special flags that you can use with awk:
- -F: This flag specifies the field separator. For example, the following awk command would use a comma as the field separator:
awk -F',' '{ print $1 }' input.txt
- -v: This flag defines a variable that can be used in the awk script. For example, the following awk command would define a variable x with the value 10:
awk -v x=10 '{ print x }' input.txt
- -f: This flag specifies a file containing the awk script. For example, the following awk command would execute the script in the file script.awk:
awk -f script.awk input.txt
- -v OFS: awk has no dedicated flag for the output field separator; set the OFS variable with -v instead. For example, the following awk command would use a comma as the output field separator:
awk -v OFS=',' '{ print $1, $2 }' input.txt
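The options can be combined in a single command. The sketch below assumes a made-up two-line CSV input, and min_age is a hypothetical variable name chosen for this example. The output separator is set with -v OFS=..., which is the portable way to set it from the command line.

```shell
# Made-up comma-separated input: name, age.
csv='alice,30
bob,25'

# -F sets the input separator; -v passes a value from the shell into an awk variable.
adults=$(printf '%s\n' "$csv" | awk -F',' -v min_age=28 '$2 >= min_age { print $1 }')

# -v OFS=';' sets the output field separator before any input is read.
swapped=$(printf '%s\n' "$csv" | awk -F',' -v OFS=';' '{ print $2, $1 }')

echo "$adults"
echo "$swapped"
```

The first command prints only alice, because 25 is below the threshold. The second prints 30;alice and 25;bob.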
awk built-in variables
Here is a list of some of the built-in variables that you can use with the awk command:
- ERRNO: The system error message (gawk only).
- FIELDWIDTHS: A string containing a space-separated list of field widths (gawk only).
- FILENAME: The name of the current input file.
- FNR: The number of the current record in the current input file.
- NR: The number of the current input record, counted across all input files.
- ORS: The output record separator.
- RS: The input record separator.
- NF: The NF variable represents the number of fields in the current input record. You can use it to access the last field of each line, like this:
{ print $NF }
You can use it to perform actions on lines with a specific number of fields, like this:
NF == 3 { print $0 }
You can also use it to perform actions on lines with a different number of fields, like this:
NF != 3 { print "Line has incorrect number of fields" }
- FS: The FS variable represents the input field separator. You can use it to change the field separator, like this:
BEGIN { FS = "," }
{ print $1 }
- OFMT: The OFMT variable represents the output format for non-integer numbers printed with print. You can use it to change the number of decimal places, like this:
BEGIN { OFMT = "%.2f" }
{ print $1 / $2 }
- OFS: The OFS variable represents the output field separator. You can use it to change the field separator, like this:
BEGIN { OFS = "," }
{ print $1, $2 }
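The difference between NR and FNR only shows up when awk reads more than one file. The following sketch creates two small files (their names and contents are made up) and prints FILENAME, NR, FNR, NF, and the last field for every line.

```shell
# Two throwaway input files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmpdir/one.txt"
printf 'f g h i\n' > "$tmpdir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
report=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR, NF, $NF }' one.txt two.txt)
echo "$report"

rm -r "$tmpdir"
```

For the line read from two.txt, NR is 3 while FNR is 1, and NF is 4 because that line has four fields.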
awk command exit status codes
The awk command returns an exit status code when it finishes execution. The exit status code indicates whether the command was successful or encountered an error.
Here is a more detailed list of exit status codes that you may see when using the awk command:
- 0: The command was successful. This exit status code indicates that the awk command read all of its input and completed its execution without encountering any errors.
- 1: The command encountered an error. POSIX only requires that a failed run return a status greater than 0; gawk typically returns 1 for errors such as a syntax error in the awk script.
- 2: The command encountered a fatal error. gawk returns this status when it has to stop partway through, for example after an attempt to divide a number by zero or when an input file cannot be opened. Other awk implementations may report such errors with a different nonzero status.
Reaching the end of the input is not an error. A script can also choose its own status with the exit statement, for example exit 3.
To check the exit status after awk has finished, run echo $?, which prints the status of the most recent command.
$ echo $?
# prints 0 if the awk command ran successfully
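A script can also set the status itself with the exit statement, which makes awk usable as a test in shell scripts. In this sketch the search word "needle" and the status value 3 are arbitrary choices for illustration, not values that awk defines.

```shell
# The awk program exits with 0 if any line matches, and with 3 otherwise.
program='/needle/ { found = 1 } END { exit (found ? 0 : 3) }'

# "|| var=$?" records a nonzero status without aborting a shell that runs with set -e.
status_missing=0
printf 'hay\nstraw\n' | awk "$program" || status_missing=$?

status_found=0
printf 'hay\nneedle\n' | awk "$program" || status_found=$?

echo "$status_missing"
echo "$status_found"
```

The first run prints 3 because no line matches, and the second prints 0.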