Log analysis on the command line using standard Linux tools

Part 1: Getting familiar with apache logs and how to do a simple analysis.

Photo by Derek Oyen on Unsplash

If you are a system administrator, a software developer, or even a power user, sooner or later you will face a problem that needs troubleshooting. If the cause is not obvious, you will need to dig deeper and read log files in order to identify it.

The most difficult part of the analysis is knowing what to search for, especially when you don't have any prior experience with the problem you are trying to debug.

Log patterns: know your enemy

When software runs smoothly:

When software runs smoothly, the log output is also smooth: the log lines follow a repeating pattern. Let's look at an example log line from the Apache web server.

192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395

Column explanation (splitting the line on spaces):

  • 1st: The IP address of the client.
  • 4th–5th: The timestamp of the request, including the timezone offset.
  • 6th: The HTTP method of the request.
  • 7th: The requested path.
  • 8th: The HTTP version used.
  • 9th: The HTTP response code from the Apache web server.
  • 10th: The response length in bytes.
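To verify the field numbering for yourself, here is a minimal sketch that splits the sample log line on spaces and numbers each field. The inline `line` variable is just the example line from above, not the downloaded log file:

```shell
# Split the sample log line on spaces (squeezing repeats) and number the fields.
# The quoted request line ("GET /cgi-bin/try/ HTTP/1.0") itself contains spaces,
# so it spans fields 6-8, and the status code lands in field 9.
line='192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395'
echo "$line" | tr -s " " "\n" | nl
```

Running this shows `200` next to index 9, which is why the pipelines below extract the status code with `cut -d " " -f9`.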

An HTTP response code in the 2XX range means that the web server processed the request successfully and responded with success.

When software does not run smoothly:

A response that indicates a problem falls in the 4XX or 5XX range: 4XX errors mean there was a problem with the request itself (for example, the requested content could not be found), while 5XX errors indicate a server-side error.

Note that HTTP 4XX responses are not software errors; they are most likely user or configuration errors, but they are still a good indicator that something is going wrong!
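Since the status code sits in a fixed field, pulling out just the error lines is a one-liner. A minimal sketch using awk, which splits on whitespace by default; the file name `sample.log` and the inline sample lines are made up for illustration (with the real file you would read `apache_logs.txt`):

```shell
# Create a tiny sample log for illustration.
printf '%s\n' \
  '10.0.0.1 - - [17/May/2015:10:05:03 +0000] "GET /a HTTP/1.1" 200 100' \
  '10.0.0.2 - - [17/May/2015:10:05:04 +0000] "GET /b HTTP/1.1" 404 50' \
  '10.0.0.3 - - [18/May/2015:11:00:00 +0000] "GET /c HTTP/1.1" 500 30' \
  > sample.log

# Print only the lines whose status code (field 9) starts with 4 or 5.
awk '$9 ~ /^[45]/' sample.log
```

This prints the 404 and 500 lines and drops the 200 line.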

A simple count of log lines grouped by HTTP responses:

To follow the examples you will need to download some sample Apache logs:

https://github.com/elastic/examples/blob/master/Common%20Data%20Formats/apache_logs/apache_logs

Now that we know what can indicate a possible error, let's see how to extract some essential information, such as the number of log lines grouped by HTTP response code.

I highly recommend running the commands step by step, adding one pipeline stage at a time, so you can see how the data is filtered after each command until we reach the result we want.

$ cat ./apache_logs.txt | tr -s " " | cut -d " " -f9 | sort -n | uniq -c | column -t
9126  200
45    206
164   301
445   304
2     403
213   404
2     416
3     500

Let's give a short description of the commands we used:

cat: prints the contents of a file; it accepts wildcards and multiple files as input.

|: a pipe, used to redirect the output of one command to the input of another.

tr -s " ": with the -s " " argument, tr squeezes each run of repeated spaces down to a single space.

cut -d " " -f9: with these arguments, cut prints only the 9th field of each line, using a single space (-d " ") as the delimiter between columns.

sort -n: with the -n argument, sort sorts numerically.

uniq -c: with the -c argument, uniq prints how many times each incoming line is repeated; note that if the incoming data is not sorted, the count will be wrong, because uniq only collapses adjacent duplicate lines.

column -t: with the -t argument, column prints the output nicely formatted in aligned columns.
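For comparison, the same group-by-status count can also be done in a single pass with awk and an associative array, instead of chaining tr, cut, sort, and uniq. This is only a sketch, again on made-up inline data in a hypothetical `sample.log`; it assumes the same space-separated log layout with the status code in field 9:

```shell
# Tiny sample log for illustration (not the real apache_logs.txt).
printf '%s\n' \
  '10.0.0.1 - - [17/May/2015:10:05:03 +0000] "GET /a HTTP/1.1" 200 100' \
  '10.0.0.2 - - [17/May/2015:10:05:04 +0000] "GET /b HTTP/1.1" 200 100' \
  '10.0.0.3 - - [18/May/2015:11:00:00 +0000] "GET /c HTTP/1.1" 404 50' \
  > sample.log

# Count lines per status code (field 9) in one pass, then sort by status.
awk '{count[$9]++} END {for (s in count) print count[s], s}' sample.log | sort -k2 -n
```

Both approaches produce the same counts; the pipeline version is easier to build up interactively, while the awk version avoids sorting the whole file just to count.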

A more advanced example: group by date and HTTP response code

Based on the previous output we can see that we have some HTTP 4XX and 5XX responses, but the log file spans many dates. How can we find the dates of those specific responses?

As in the previous example, I highly recommend running the commands step by step so you can see how the data is filtered after each stage until we reach the result we want.

$ cat ./apache_logs.txt | tr -s " " | cut -d " " -f4,9 | tr -s ":" " " | cut -d " " -f1,5 | sort -n | uniq -c | sort -n | column -t | grep -i "  4\|  5"
1   [18/May/2015  403
1   [20/May/2015  403
1   [20/May/2015  500
2   [18/May/2015  500
2   [19/May/2015  416
30  [17/May/2015  404
56  [20/May/2015  404
63  [18/May/2015  404
64  [19/May/2015  404

Let's give a short description of the commands we used:

I will skip the commands already explained in the previous example, with the exception of cut and tr, which take different arguments here.

cut -d " " -f4,9: with the -f4,9 argument, cut selects fields 4 and 9 from each line.

tr -s ":" " ": with these arguments, tr replaces each ":" character with a single space.

grep -i "  4\|  5": grep prints only the lines that match the pattern; here, lines containing spaces followed by a 4 or a 5, which in our case are the lines with HTTP 4XX and 5XX response codes.
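The date-plus-status grouping can likewise be sketched as a single awk pass. Field 4 looks like "[18/May/2015:10:05:03", so splitting it on ":" and keeping the first part gives the date. As before, `sample.log` and its lines are invented for illustration:

```shell
# Tiny sample log for illustration (not the real apache_logs.txt).
printf '%s\n' \
  '10.0.0.1 - - [17/May/2015:10:05:03 +0000] "GET /a HTTP/1.1" 404 50' \
  '10.0.0.2 - - [17/May/2015:10:06:04 +0000] "GET /b HTTP/1.1" 404 50' \
  '10.0.0.3 - - [18/May/2015:11:00:00 +0000] "GET /c HTTP/1.1" 500 30' \
  > sample.log

# Split field 4 on ":" to isolate the date, keep only 4XX/5XX lines,
# and count occurrences per (date, status) pair.
awk '{split($4, t, ":"); key = t[1] " " $9}
     $9 ~ /^[45]/ {count[key]++}
     END {for (k in count) print count[k], k}' sample.log | sort -n
```

This mirrors what the tr/cut pipeline above does, but keeps the date extraction explicit in one place.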

I hope you found this a soft introduction to the logic behind log analysis, and that you now know which tools to use and how to combine them to filter out what you need! In the next part we will go deeper: not just simple filtering, but also calculations based on the input that produce more helpful details, such as error percentages.

Written by

DevOps engineer, loves Linux, Python, cats and Amiga computers
