DevGuide

The awk Cheatsheet: Process Text in Columns

On this page
  1. Print a column: $1, $3, $NF
  2. Set the field separator: -F
  3. Filter rows by a pattern
  4. Compute sums and averages
  5. BEGIN and END blocks
  6. One-liners for logs and CSV
  7. Where to go from here

You came here to slice a file into columns and pull out the bits you need, probably with awk because someone said it was the tool for the job. They were right. The recipes you actually reach for are below, the most-wanted ones first, so you can copy, swap the field number, and get back to the file in front of you. One thing to know before you scroll: awk reads a file line by line, splits each line into fields on whitespace by default, and numbers them $1, $2, and so on, with $0 being the whole line and $NF being the last field. Almost everything else builds on that.

The short answer

The recipes you reach for, fast: awk '{print $1}' for the first column and $NF for the last, awk -F, '{print $2}' to split a CSV, awk '/error/' file to filter lines like grep, awk '$3 > 100' to keep rows by value, and awk '{sum += $1} END {print sum}' to total a column. BEGIN runs first, END runs last.

$1 ... $NF      column one to the last
-F,             set the field separator
sum += $1       total a column in END

Figure: answer card. Every recipe you reach for, grouped by what you're actually trying to do with the columns.

Print a column: $1, $3, $NF

The one everybody wants first. Wrap the program in single quotes so the shell leaves the $ signs alone, and name the field you want. $1 is the first column, $NF is the last, and $0 is the entire line.

Recipe                        What it does
awk '{print $1}' file         The first column of every line
awk '{print $3}' file         The third column
awk '{print $NF}' file        The last column, whatever its position
awk '{print $(NF-1)}' file    The second to last column
awk '{print $1, $3}' file     Columns 1 and 3, joined by a space

The comma in print $1, $3 matters. With a comma you get the two values separated by a space (the output field separator, OFS). Drop the comma and write print $1 $3 and awk glues them together with nothing in between, which is almost never what you meant.
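A quick way to see the difference is to run both forms on the same line. This is a minimal sketch on a made-up input line (the names and values are invented for illustration), and it also shows that setting OFS with -v changes what the comma inserts.

```shell
# Hypothetical sample line: three whitespace-separated fields.
line='alice admin 42'

# With a comma, awk inserts OFS (a single space by default).
with_comma=$(printf '%s\n' "$line" | awk '{print $1, $3}')

# Without a comma, the two values are concatenated.
without_comma=$(printf '%s\n' "$line" | awk '{print $1 $3}')

# Setting OFS with -v changes what the comma inserts.
with_dash=$(printf '%s\n' "$line" | awk -v OFS='-' '{print $1, $3}')

printf '%s\n' "$with_comma" "$without_comma" "$with_dash"
```

The three lines printed should be alice 42, alice42, and alice-42, in that order.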

Set the field separator: -F

By default awk splits on runs of whitespace, so a CSV or /etc/passwd confuses it. The fix is -F, placed before the program, telling awk what to split on.

Recipe                              What it does
awk -F, '{print $2}' file.csv       Split on commas, print the second field
awk -F: '{print $1}' /etc/passwd    Split on colons, print the usernames
awk -F'\t' '{print $1}' file.tsv    Split on tabs
awk -F'[,;]' '{print $1}' file      Split on a comma or a semicolon (regex)

The separator can be a single character, several characters, or a regular expression. If you also want your output columns joined by something specific, set OFS in a BEGIN block, as in awk 'BEGIN {FS=","; OFS="\t"} {print $1, $2}', which reads a CSV and writes a TSV.
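Here is a small sketch of that CSV-to-TSV conversion on an invented two-line CSV. The second command is an extra, not from the recipes above: it relies on the fact that assigning to any field makes awk rebuild the whole line with OFS, which converts every column at once.

```shell
# Hypothetical two-line CSV used only for illustration.
csv=$(printf 'name,age,city\nana,31,Lisbon\n')

# Pick two columns; the comma in print inserts OFS (a tab here).
two_cols=$(printf '%s\n' "$csv" | awk 'BEGIN {FS = ","; OFS = "\t"} {print $1, $2}')

# Convert every column: assigning to a field makes awk rebuild $0 using OFS.
all_cols=$(printf '%s\n' "$csv" | awk 'BEGIN {FS = ","; OFS = "\t"} {$1 = $1; print}')

printf '%s\n' "$two_cols" "$all_cols"
```

Keep in mind that splitting on a bare comma does not understand quoted CSV fields, so a value like "Lisbon, Portugal" would be cut in two.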

Filter rows by a pattern

This is where awk starts to feel like grep with a brain. A pattern in front of the action decides which lines the action runs on. Leave the action off and a matching line is printed whole.

Recipe                          What it does
awk '/error/' file              Print lines containing error, like grep
awk '!/debug/' file             Print lines not containing debug
awk '$3 > 100' file             Lines where the third field is over 100
awk '$1 == "GET"' access.log    Lines where the first field equals GET
awk 'NR > 1' file               Skip the header, print from line 2 on
awk 'NF == 0' file              Find the blank lines (zero fields)

The split between pattern and action is the whole model: pattern { action }. Either part is optional. A bare pattern means "print the line", and a bare action with no pattern means "run on every line". So awk '$3 > 100 {print $1}' reads as "on lines where field three beats 100, print field one".
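To make the pattern { action } model concrete, this sketch runs all three shapes over the same invented data (product, region, units). The last command is an addition to the recipes above and shows that patterns can be combined with &&.

```shell
# Hypothetical sales data: product, region, units.
data=$(printf '%s\n' 'widget east 250' 'gadget west 80' 'gizmo east 120')

# Pattern and action: print field one where field three is over 100.
both=$(printf '%s\n' "$data" | awk '$3 > 100 {print $1}')

# Pattern only: the matching lines are printed whole.
pattern_only=$(printf '%s\n' "$data" | awk '$3 > 100')

# Action only: the action runs on every line.
action_only=$(printf '%s\n' "$data" | awk '{print $1}')

# Patterns combine with && (and) and || (or).
combined=$(printf '%s\n' "$data" | awk '$2 == "east" && $3 < 200 {print $1}')

printf '%s\n' "$both" "$pattern_only" "$action_only" "$combined"
```

With this data, the first command prints widget and gizmo, the second prints those two lines in full, the third prints all three product names, and the last prints only gizmo.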

Compute sums and averages

Numbers are where awk earns its keep over a pipe of cut and paste. Variables need no declaration: an unset variable acts as zero in arithmetic, keeps its value across every line, and can be printed at the end. NR is the running count of records (lines) awk has read.

Recipe                                                      What it does
awk '{sum += $1} END {print sum}' file                      Total the first column
awk '{sum += $1} END {print sum / NR}' file                 The average of the first column
awk 'END {print NR}' file                                   Count the lines (like wc -l)
awk 'NR == 1 || $1 > max {max = $1} END {print max}' file   The largest value in column 1
awk '{sum += $3} $3 > 0 {n++} END {print sum/n}' file       Average of column 3, ignoring zeros (assumes no negative values)
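Two edge cases are worth a guard once these recipes leave the command line: empty input, where dividing by NR is a division by zero, and all-negative input, where a max that starts at zero never gets replaced. The sketch below uses invented numbers and one possible way to handle both. Treat it as an illustration, not the only approach.

```shell
# Hypothetical column of numbers, all negative on purpose.
nums=$(printf '%s\n' -6 -2 -10)

# Total: sum starts unset, which counts as zero in arithmetic.
total=$(printf '%s\n' "$nums" | awk '{sum += $1} END {print sum}')

# Average: only divide when at least one record was read.
average=$(printf '%s\n' "$nums" | awk '{sum += $1} END {if (NR > 0) print sum / NR}')

# Largest: seed max from the first line so negative-only input still works.
largest=$(printf '%s\n' "$nums" | awk 'NR == 1 || $1 > max {max = $1} END {print max}')

# Empty input: the guard prints nothing instead of dividing by zero.
empty_avg=$(awk '{sum += $1} END {if (NR > 0) print sum / NR}' < /dev/null)

printf 'total=%s average=%s largest=%s empty=[%s]\n' "$total" "$average" "$largest" "$empty_avg"
```

For these numbers the total is -18, the average is -6, and the largest value is -2. The empty-input run prints nothing.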

If you want a per-group total instead of one grand total, use an array keyed by a column. This sums the bytes in field 10 of an access log, grouped by the status code in field 9, and prints each group once at the end:

awk '{bytes[$9] += $10} END {for (code in bytes) print code, bytes[code]}' access.log

Arrays in awk are associative, so the key can be any string. That one pattern, accumulate into array[key] then loop over it in END, covers a surprising amount of real reporting.
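Here is the accumulate-then-loop pattern run against three invented log lines. The sketch assumes the common log format, where the status code lands in field 9 and the byte count in field 10. The addresses and paths are made up.

```shell
# Hypothetical access log lines in the common log format:
# status is field 9 and bytes is field 10.
log='192.0.2.1 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 1000
192.0.2.2 - - [10/Oct/2024:13:55:37 +0000] "GET /missing HTTP/1.1" 404 200
192.0.2.1 - - [10/Oct/2024:13:55:38 +0000] "GET /about.html HTTP/1.1" 200 500'

# Accumulate bytes per status code, then print each group in END.
# for (key in array) visits keys in no guaranteed order, so sort the output.
per_status=$(printf '%s\n' "$log" |
  awk '{bytes[$9] += $10} END {for (code in bytes) print code, bytes[code]}' |
  sort)

printf '%s\n' "$per_status"
```

The sort at the end is there because awk does not promise any particular order when looping over an array. Without it, the groups can come out in a different order from one awk implementation to the next.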

BEGIN and END blocks

A full awk program has three parts, and you can use any subset of them. BEGIN runs once before the first line, the middle rules run on each line, and END runs once after the last line. Use BEGIN for headers and setup, END for totals and summaries.

Recipe                                                    What it does
awk 'BEGIN {print "start"} {print} END {print "done"}'    Header, body, footer
awk 'BEGIN {FS=":"} {print $1}' /etc/passwd               Set the separator in BEGIN
awk 'END {print NR " lines"}' file                        A one-line summary at the end
awk 'BEGIN {OFS="\t"} {print $1, $2}' file                Set the output separator first

Here is the shape of a small report that uses all three, printing a header, the rows it cares about, and a total. Save it or paste it inline:

awk 'BEGIN { print "client  bytes" }
     $9 == 200 { print $1, $10; total += $10 }
     END { print "total", total }' access.log

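The rules can be laid out this way because awk ignores indentation and accepts a newline between rules, so each rule can sit on its own line. That beats cramming a real report onto one line and squinting at it later. The output itself is not aligned: print separates values with a single space (OFS), so reach for printf if you need fixed-width columns.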
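If you want the report columns to line up in the output, printf with fixed widths is one way to get there. This is a sketch on invented log lines in the common log format (status in field 9, bytes in field 10). The column widths of 15 and 8 are arbitrary choices.

```shell
# Hypothetical access log lines in the common log format.
log='192.0.2.1 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 1000
192.0.2.2 - - [10/Oct/2024:13:55:37 +0000] "GET /missing HTTP/1.1" 404 200
192.0.2.1 - - [10/Oct/2024:13:55:38 +0000] "GET /about.html HTTP/1.1" 200 500'

# %-15s left-aligns text in 15 characters; %8s and %8d right-align in 8.
report=$(printf '%s\n' "$log" |
  awk 'BEGIN { printf "%-15s %8s\n", "client", "bytes" }
       $9 == 200 { printf "%-15s %8d\n", $1, $10; total += $10 }
       END { printf "%-15s %8d\n", "total", total }')

printf '%s\n' "$report"
```

Unlike print, printf does not add a newline on its own, which is why every format string ends in \n.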

awk -F: '{print $1}' /etc/passwd | sort

That last one is the kind of thing you type without thinking once the model sticks: split /etc/passwd on colons, print the usernames in field one, and pipe them to sort. No temporary file, no editor, just the column you wanted.

One-liners for logs and CSV

The working set, the lines that actually pay rent. These assume an Apache or Nginx access log in the common format (IP first, status in field 9, bytes in field 10) and a comma-separated CSV. Swap the field numbers to match your own layout.

Recipe                                                    What it does
awk '{print $1}' access.log | sort | uniq -c | sort -rn   Top client IPs by hit count
awk '$9 == 404 {print $7}' access.log                     Every URL that 404'd
awk '$9 >= 500' access.log                                All the server errors (5xx)
awk -F, 'NR > 1 {sum += $3} END {print sum}' data.csv     Sum a CSV column, skipping the header
`awk -F, 'NR == 1
awk -F, '{print NF; exit}' data.csv                       How many columns this CSV has
awk -F, '!seen[$1]++' data.csv                            Drop duplicate rows by the first field

That !seen[$1]++ trick is worth a second look, because it shows up everywhere. It uses field one as a key into an array. The first time a value appears, seen[$1] is zero, !0 is true, so the line prints, and ++ then bumps the count to one. Every later copy finds a non-zero count, !1 is false, and the line is skipped. The result is a stable dedupe that keeps the first occurrence and the original order, which sort -u cannot give you.
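To watch the dedupe behave as described, here is a small sketch on a made-up CSV (the ids and names are invented). It runs the one-liner next to a spelled-out equivalent, and both should keep the first row for each id in the original order.

```shell
# Hypothetical CSV with repeated ids in the first field.
csv='id,name
1,ana
2,ben
1,ana-again
3,cy
2,ben-again'

# The idiom: print a line only the first time its key is seen.
deduped=$(printf '%s\n' "$csv" | awk -F, '!seen[$1]++')

# The same logic written out step by step.
spelled_out=$(printf '%s\n' "$csv" | awk -F, '{ if (seen[$1] == 0) print; seen[$1]++ }')

printf '%s\n' "$deduped"
```

The rows 1,ana-again and 2,ben-again are dropped, and the header survives because its key, id, is also seen only once.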

Figure: terminal showing common awk recipes: print the first column of passwd, split a CSV on commas, sum a column in an END block, and find the top client IPs in an access log. The most-wanted recipes, first. Copy, swap the field number, and you're done.

Where to go from here

That's the working set. Print a column, set the separator, filter rows, total a column, and the BEGIN/END scaffolding that turns those into a real report. Most of everyday awk is some mix of those, and the deep language features (functions, multidimensional arrays, getline) wait quietly until the rare day you need them.

If your text wrangling goes past columns, the same instinct carries over. Hunting for files by name, size, or age? See the find command cheatsheet, which pairs nicely with awk when you feed it a list of paths. Reaching for search and replace across a stream instead of by field? A good sed reference covers the substitutions awk is clumsy at. Different tools, same habit: stop memorizing, keep a solid cheatsheet within reach.

Frequently asked questions

How do I print a specific column with awk?

Use awk with the field number you want: awk '{print $1}' file prints the first column, awk '{print $3}' file prints the third, and awk '{print $NF}' file prints the last one, whatever its position. Fields are split on runs of whitespace by default and numbered from 1, while $0 is the entire line. To print two columns with a space between them, list them with a comma: awk '{print $1, $3}' file.

How do I change the field separator in awk?

Pass -F followed by the separator, before the program. For a CSV you would write awk -F, '{print $2}' file, and for the colon-separated /etc/passwd you would write awk -F: '{print $1}' /etc/passwd. The separator can be more than one character or a regex, so -F'\t' splits on tabs and -F'[,;]' splits on either a comma or a semicolon. To control how output fields are joined, set OFS in a BEGIN block, as in BEGIN {OFS="\t"}, or on the command line with -v OFS='\t'.

How do I sum a column of numbers with awk?

Add the field to a running total on every line, then print it at the end: awk '{sum += $1} END {print sum}' file. Variables in awk start at zero and persist across lines, so sum keeps growing as each row is read, and the END block runs once after the last line. For an average, divide the total by the number of rows: awk '{sum += $1} END {print sum / NR}' file, where NR is the number of records awk has read.

What are BEGIN and END blocks in awk?

BEGIN runs once before any line is read, and END runs once after the last line. You use BEGIN to print a header or set a variable like the separator, and END to print a total or summary you built up while reading. A program can have all three parts: a BEGIN block, one or more pattern and action rules for the lines in between, and an END block. Either special block is optional, and many one-liners use only END.

How do I filter rows by a pattern in awk?

Put the condition in front of the action, or use it alone. awk '/error/' file prints every line containing error, the same way grep would, while awk '$3 > 100' file prints lines whose third field is greater than 100. Combine a test and an action, as in awk '$1 == "GET" {print $7}' file, to print the seventh field only on lines where the first is GET. Without an action, a matching line is printed in full.