The short answer
The recipes you reach for, fast: awk '{print $1}' for the first column and $NF
for the last, awk -F, '{print $2}' to split a CSV, awk '/error/' file to filter
lines like grep, awk '$3 > 100' to keep rows by value, and awk '{sum += $1} END {print sum}' to total a column. BEGIN runs first, END runs last.
You came here to slice a file into columns and pull out the bits you need, probably with awk because someone said it was the tool for the job. They were right. The recipes you actually reach for are below, the most-wanted ones first, so you can copy, swap the field number, and get back to the file in front of you.
One thing to know before you scroll. awk reads a file line by line, splits each line into fields on whitespace by default, and numbers them $1, $2, and so on. $0 is the whole line, and $NF is the last field no matter how many there are. Almost everything below builds on that single idea, so once it clicks the rest is just variations.
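To see that model on a concrete line, here is a minimal sketch. The sample line and the function names `sample_line` and `show_fields` are made up for illustration.

```shell
# Hypothetical sample input: one line with four whitespace-separated fields.
sample_line() { printf 'alice  42 admin   london\n'; }

# Print the field count, the first field, the last field, and the untouched line.
show_fields() {
  awk '{ print "NF=" NF; print "first=" $1; print "last=" $NF; print "whole=" $0 }'
}

sample_line | show_fields
```

The runs of spaces between fields count as one separator, so NF is 4, while $0 keeps the line exactly as it was read.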
Print a column: $1, $3, $NF
The one everybody wants first. Wrap the program in single quotes so the shell leaves the $ signs alone, and name the field you want. $1 is the first column, $NF is the last, and $0 is the entire line.
| Recipe | What it does |
|---|---|
| `awk '{print $1}' file` | The first column of every line |
| `awk '{print $3}' file` | The third column |
| `awk '{print $NF}' file` | The last column, whatever its position |
| `awk '{print $(NF-1)}' file` | The second to last column |
| `awk '{print $1, $3}' file` | Columns 1 and 3, joined by a space |
The comma in print $1, $3 matters. With a comma you get the two values separated by a space (the output field separator, OFS). Drop the comma and write print $1 $3 and awk glues them together with nothing in between, which is almost never what you meant.
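The difference is easiest to see side by side. This is a small sketch on a made-up two-field line; the `row` function is only there to supply input.

```shell
# Hypothetical two-field input line.
row() { printf 'alice 42\n'; }

row | awk '{print $1, $2}'                    # comma: joined by OFS, a space by default
row | awk '{print $1 $2}'                     # no comma: plain string concatenation
row | awk 'BEGIN {OFS="-"} {print $1, $2}'    # comma again, with OFS changed
```

The three commands print alice 42, alice42, and alice-42 in that order.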
Set the field separator: -F
By default awk splits on runs of whitespace, so a CSV or /etc/passwd confuses it. The fix is -F, placed before the program, telling awk what to split on.
| Recipe | What it does |
|---|---|
| `awk -F, '{print $2}' file.csv` | Split on commas, print the second field |
| `awk -F: '{print $1}' /etc/passwd` | Split on colons, print the usernames |
| `awk -F'\t' '{print $1}' file.tsv` | Split on tabs |
| `awk -F'[,;]' '{print $1}' file` | Split on a comma or a semicolon (regex) |
The separator can be a single character, several characters, or a regular expression. If you also want your output columns joined by something specific, set OFS in a BEGIN block, as in awk 'BEGIN {FS=","; OFS="\t"} {print $1, $2}', which reads a CSV and writes a TSV.
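Here is a sketch of that CSV-to-TSV conversion on made-up data. It assumes a simple CSV with no quoted fields that contain commas, since splitting on a bare comma knows nothing about CSV quoting. The `csv` function is hypothetical sample input.

```shell
# Hypothetical CSV: a header row plus two data rows.
csv() { printf 'name,dept,city\nalice,eng,london\nbob,ops,paris\n'; }

# FS controls how input is split, OFS how print joins its arguments.
csv | awk 'BEGIN {FS=","; OFS="\t"} {print $1, $3}'

# Assigning a field to itself makes awk rebuild $0 with OFS,
# which converts every column without listing them.
csv | awk 'BEGIN {FS=","; OFS="\t"} {$1 = $1; print}'
```

The first command keeps only the name and city columns; the second rewrites the whole line with tabs in place of commas.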
Filter rows by a pattern
This is where awk starts to feel like grep with a brain. A pattern in front of the action decides which lines the action runs on. Leave the action off and a matching line is printed whole.
| Recipe | What it does |
|---|---|
| `awk '/error/' file` | Print lines containing error, like grep |
| `awk '!/debug/' file` | Print lines not containing debug |
| `awk '$3 > 100' file` | Lines where the third field is over 100 |
| `awk '$1 == "GET"' access.log` | Lines where the first field equals GET |
| `awk 'NR > 1' file` | Skip the header, print from line 2 on |
| `awk 'NF == 0' file` | Find the blank lines (zero fields) |
The split between pattern and action is the whole model: pattern { action }. Either part is optional. A bare pattern means "print the line", and a bare action with no pattern means "run on every line". So awk '$3 > 100 {print $1}' reads as "on lines where field three beats 100, print field one".
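The three shapes of a rule can be tried on a few made-up rows. The `scores` function below is hypothetical sample input with a name, a department, and a score.

```shell
# Hypothetical input: name, department, score.
scores() { printf 'alice eng 120\nbob ops 80\ncarol eng 150\n'; }

scores | awk '$3 > 100'                             # pattern only: matching lines printed whole
scores | awk '{print $1}'                           # action only: runs on every line
scores | awk '$3 > 100 {print $1}'                  # both: field one where field three beats 100
scores | awk '$2 == "eng" && $3 > 130 {print $1}'   # conditions combine with &&
```

The last command prints only carol, the one row that passes both tests.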
Compute sums and averages
Numbers are where awk earns its keep over a pipe of cut and paste. Variables start at zero, persist across every line, and you print them at the end. NR is the running count of records (lines) awk has read.
| Recipe | What it does |
|---|---|
| `awk '{sum += $1} END {print sum}' file` | Total the first column |
| `awk '{sum += $1} END {print sum / NR}' file` | The average of the first column |
| `awk 'END {print NR}' file` | Count the lines (like wc -l) |
| `awk 'NR == 1 {max = $1} $1 > max {max = $1} END {print max}' file` | The largest value in column 1, even when every value is negative |
| `awk '$3 != 0 {sum += $3; n++} END {if (n) print sum / n}' file` | Average ignoring zeros |
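Two edge cases are worth a sketch: an empty input, where dividing by NR would be a division by zero, and all-negative data, where a max that starts at zero never gets replaced. The guard and the seeding from the first line are additions for illustration, and `numbers` and `negatives` are made-up inputs.

```shell
# Hypothetical inputs: one number per line.
numbers()   { printf '10\n20\n0\n30\n'; }
negatives() { printf '%s\n' -5 -2 -9; }

# Total and average in one pass. The NR > 0 guard skips the division on empty input.
numbers | awk '{sum += $1} END {if (NR > 0) print sum, sum / NR}'

# Largest value, seeded from the first line so negative-only data still works.
negatives | awk 'NR == 1 {max = $1} $1 > max {max = $1} END {print max}'
```

The first command prints 60 15 and the second prints -2.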
If you want a per-group total instead of one grand total, use an array keyed by a column. This sums the bytes in field 10 of an access log, grouped by the status code in field 9, and prints each group once at the end:
awk '{bytes[$9] += $10} END {for (code in bytes) print code, bytes[code]}' access.log
Arrays in awk are associative, so the key can be any string. That one pattern, accumulate into array[key] then loop over it in END, covers a surprising amount of real reporting.
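A smaller version of the same accumulate-then-loop pattern, as a sketch on made-up data. The `sales` function is hypothetical input with a key in field 1 and an amount in field 2. The order in which for (k in array) visits keys is unspecified, so the output is piped through sort.

```shell
# Hypothetical input: a key in field 1 and an amount in field 2.
sales() { printf 'eng 10\nops 5\neng 7\nops 1\nqa 3\n'; }

# Keep a running total and a row count per key, then print each key once in END.
sales | awk '{total[$1] += $2; n[$1]++} END {for (k in total) print k, total[k], n[k]}' | sort
```

Each output line carries the key, its total, and how many rows fed into it, for example eng 17 2.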
BEGIN and END blocks
A full awk program has three parts, and you can use any subset of them. BEGIN runs once before the first line, the middle rules run on each line, and END runs once after the last line. Use BEGIN for headers and setup, END for totals and summaries.
| Recipe | What it does |
|---|---|
| `awk 'BEGIN {print "start"} {print} END {print "done"}'` | Header, body, footer |
| `awk 'BEGIN {FS=":"} {print $1}' /etc/passwd` | Set the separator in BEGIN |
| `awk 'END {print NR " lines"}' file` | A one-line summary at the end |
| `awk 'BEGIN {OFS="\t"} {print $1, $2}' file` | Set the output separator first |
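The order in which the three parts fire can be checked with a tiny sketch. The `lines` function is made-up two-line input.

```shell
# Hypothetical two-line input.
lines() { printf 'one\ntwo\n'; }

# BEGIN fires before any input, the middle rule once per line, END after the last line.
lines | awk 'BEGIN {print "start"} {print NR ": " $0} END {print "done, " NR " lines"}'

# A program made only of BEGIN never reads input, so it needs no file at all.
awk 'BEGIN {print 6 * 7}'
```

The second command makes awk usable as a quick calculator and prints 42.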
Here is the shape of a small report that uses all three, printing a header, the rows it cares about, and a total. Save it or paste it inline:
awk 'BEGIN { print "user bytes" }
$9 == 200 { print $1, $10; total += $10 }
END { print "total", total }' access.log
The rules can sit in tidy columns because awk ignores extra spaces and tabs inside the program. You can spread a program across several lines for readability, as long as each opening brace stays on the same line as its pattern, which beats cramming a real report onto one line and squinting at it later.
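For the "save it" route, a sketch of the same report shape kept in a file and run with -f. The temp file, the `requests` function, and the simplified three-field input (client, status, bytes) are all assumptions for illustration, not the real access log layout.

```shell
# Write the program to a file; the path comes from mktemp and is only for this demo.
report_prog=$(mktemp)
cat > "$report_prog" <<'EOF'
# Header, matching rows, and a total. Each pattern shares a line with its opening brace.
BEGIN     { print "client bytes" }
$2 == 200 { print $1, $3; total += $3 }
END       { print "total", total }
EOF

# Hypothetical three-field input: client, status, bytes.
requests() { printf 'a 200 10\nb 404 5\nc 200 7\n'; }

requests | awk -f "$report_prog"
```

Keeping the program in a file also sidesteps shell quoting entirely, which helps once a report grows past a few rules.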
awk -F: '{print $1}' /etc/passwd | sort
That one is the kind of thing you type without thinking once the model sticks: split /etc/passwd on colons, print the usernames in field one, and pipe them to sort. No temporary file, no editor, just the column you wanted.
One-liners for logs and CSV
The working set, the lines that actually pay rent. These assume an Apache or Nginx access log in the common format (IP first, status in field 9, bytes in field 10) and a comma-separated CSV. Swap the field numbers to match your own layout.
| Recipe | What it does |
|---|---|
| `awk '{print $1}' access.log \| sort \| uniq -c \| sort -rn` | Top client IPs by hit count |
| `awk '$9 == 404 {print $7}' access.log` | Every URL that 404'd |
| `awk '$9 >= 500' access.log` | All the server errors (5xx) |
| `awk -F, 'NR > 1 {sum += $3} END {print sum}' data.csv` | Sum a CSV column, skipping the header |
| `awk -F, 'NR == 1 | |
| `awk -F, '{print NF; exit}' data.csv` | How many columns does this CSV have |
| `awk -F, '!seen[$1]++' data.csv` | Drop duplicate rows by the first field |
That !seen[$1]++ trick is worth a second look, because it shows up everywhere. It uses field one as a key into an array. The first time a value appears, seen[$1] is still zero. The post-increment hands that zero to ! and bumps the stored count to one, !0 is true, and with no action given the line prints. Every later copy finds a non-zero count, the negation is false, and the line is skipped. The result is a stable dedupe that keeps the first occurrence and the original order, which sort -u cannot give you.
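A sketch of the dedupe on made-up rows, with a counting variant that reuses the same array idea. The `rows` function is hypothetical sample input.

```shell
# Hypothetical CSV where the id in field 1 repeats.
rows() { printf '3,carol\n1,alice\n3,carol-again\n2,bob\n1,alice-again\n'; }

# Keeps the first row for each id, in input order.
rows | awk -F, '!seen[$1]++'

# Same array, used for counting: how many times each id appeared.
rows | awk -F, '{seen[$1]++} END {for (id in seen) print id, seen[id]}' | sort
```

The first command prints the rows for ids 3, 1, and 2 in that order, the order in which each id was first seen.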
Where to go from here
That's the working set. Print a column, set the separator, filter rows, total a column, and the BEGIN/END scaffolding that turns those into a real report. Most of everyday awk is some mix of those, and the deep language features (functions, multidimensional arrays, getline) wait quietly until the rare day you need them.
If your text wrangling goes past columns, the same instinct carries over. Hunting for files by name, size, or age? See the find command cheatsheet, which pairs nicely with awk when you feed it a list of paths. Reaching for search and replace across a stream instead of by field? A good sed reference covers the substitutions awk is clumsy at. Different tools, same habit: stop memorizing, keep a solid cheatsheet within reach.
Frequently asked questions
How do I print a specific column with awk?
Use awk with the field number you want: awk '{print $1}' file prints the first column, awk '{print $3}' file prints the third, and awk '{print $NF}' file prints the last one whatever its position. Fields are split on runs of whitespace by default and numbered from 1, while $0 is the entire line. To print two columns with a space between them, list them with a comma: awk '{print $1, $3}' file.
How do I change the field separator in awk?
Pass -F followed by the separator, before the program. For a CSV you would write awk -F, '{print $2}' file, and for the colon-separated /etc/passwd you would write awk -F: '{print $1}' /etc/passwd. The separator can be more than one character or a regex, so -F'\t' splits on tabs and -F'[,;]' splits on either a comma or a semicolon. To control how output fields are joined, set OFS as well, either in a BEGIN block or with -v OFS='\t' before the program.
How do I sum a column of numbers with awk?
Add the field to a running total on every line, then print it at the end: awk '{sum += $1} END {print sum}' file. Variables in awk start at zero and persist across lines, so sum keeps growing as each row is read, and the END block runs once after the last line. For an average, divide by the row count: awk '{sum += $1} END {print sum / NR}' file, where NR is the number of records awk has seen.
What are BEGIN and END blocks in awk?
BEGIN runs once before any line is read, and END runs once after the last line. You use BEGIN to print a header or set a variable like the separator, and END to print a total or summary you built up while reading. A program can have all three parts: a BEGIN block, one or more pattern and action rules for the lines in between, and an END block. Either special block is optional, and many one-liners use only END.
How do I filter rows by a pattern in awk?
Put the condition in front of the action, or use it alone. awk '/error/' file prints every line containing error, the same way grep would, while awk '$3 > 100' file prints lines whose third field is greater than 100. Combine a test and an action like awk '$1 == "GET" {print $7}' file to print the seventh field only on lines where the first is GET. Without an action, a matching line is printed in full.