Using Piwik for web log file analysis

In the articles I’ve written so far, we’ve always been surrounded by the love of Google Analytics (GA). I like GA because it’s simple (in both usage and implementation), powerful and, most importantly, FREE. But what if I tell you that you cannot install any tracking code on my site, or that I forgot to install the GA code and all I have for you is web server log files? What would you do? That is where the tool I am going to introduce in this post comes in: Piwik.

What is Piwik?

Piwik is an open source web analytics platform. I won’t try to explain every feature it supports; think of it as something very similar to Google Analytics, with the following differences:

  • With Google Analytics, your data is kept on Google’s servers. With Piwik, you can set up your own instance and keep the data in your own database, or you can use their hosted version. Keeping the data yourself means you can query it directly, either from programs or with plain database statements.
  • Google Analytics and Piwik use very similar tracking code; however, Piwik also lets you import web server log files into its database and generates traffic reports from them.

You can also see the full list of features here.

Installing Piwik

As mentioned, you can set up your own Piwik instance and keep the data in your own database. The basic installation steps follow; if you find them very technical, don’t worry – make friends with a programmer and I’m sure he / she can help!

  1. Get yourself a web server running PHP 5 and MySQL. If you run it locally like me, you can use WAMP on Windows or MAMP on Mac – handy and quick. Haven’t heard of these? Did I mention you should make friends with programmers first?
  2. Set up a new database (you can name it Piwik) using the phpMyAdmin bundled with WAMP / MAMP. Add a user with appropriate rights to the database.
  3. Download the latest version of Piwik here and extract it to your web server.
  4. Configure a virtual host for the Piwik application in Apache. I never managed to find one simple document that explains how to do this, so I’ll share what I did to save you time:
    • Open httpd.conf; it should be located under \{WAMP installation folder}\bin\apache\Apache x.x.x\conf
    • If you need to run Piwik on a port other than 80, add Listen <port number> under the line Listen 80. For example, I am running Piwik at http://localhost:7788, so I add:
      Listen 7788
      
    • Search for the word <VirtualHost> and add the following section (replace the parameters to suit your case):
      <VirtualHost *:7788>
      DocumentRoot "D:\Source Code\piwik"
      </VirtualHost>
      
    • Then search for the word </Directory>, add the following section after it (replace the parameters to suit your case):
      <Directory "D:\Source Code\piwik">
      AllowOverride None
      Options None
      Require all granted
      </Directory>
      
  5. Now access the site! In my case, visit http://localhost:7788. You will now see the Piwik installation screen.
  6. Follow the installation guide here: http://piwik.org/docs/installation/ and the rest should be quite straightforward.
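
If the installation screen in step 5 does not show up, it helps to first check whether Apache is listening on the port you chose at all. The following is a minimal sketch of such a check, not part of Piwik; the function name is_port_open is made up for illustration, and 7788 is simply the example port used above.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # Raises an error if nothing is listening or the attempt times out.
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # 7788 is the example port from this post; change it to match your Listen line.
    status = "listening" if is_port_open("localhost", 7788) else "not reachable"
    print("localhost:7788 is " + status)
```

If the port is not reachable, the usual suspects are a missing Listen line or an Apache that was not restarted after httpd.conf was edited.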

Right before you finish the installation wizard, you will see instructions for adding the JavaScript tracking tags. If you’re familiar with GA’s JavaScript API, you will notice that the tracking APIs of the two tools look almost the same. You can add the tracking code to your site by following those instructions; we will skip the details because our goal here is log file analysis.

Importing Log Files

Now that you have Piwik running, the next step is to import the log files into the Piwik database. Piwik ships with a log import tool written in Python for this. Go to http://www.python.org/ and download Python 2.7 (2.7.6 was the latest at the time of writing; I tried version 3.x and the import tool did not work with it), then install it on your computer.

After installing Python, open a command prompt or terminal and run the following command:

python /path/to/piwik/misc/log-analytics/import_logs.py --url=http://analytics.example.com access.log

Replace the parameters to suit your setup. In my case:

  • import_logs.py is located under the web root folder of Piwik: ‘D:\Source Code\piwik\misc\log-analytics’
  • The url parameter refers to your Piwik URL, NOT the actual site that you’re tracking. In my case it is: http://localhost:7788
  • The access.log parameter is the path to your log file
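
Since a path like ‘D:\Source Code\piwik’ contains a space, typing the command by hand means getting the quoting right. As a rough sketch, the three parameters above can be assembled in Python instead. The helper build_import_command and the log path C:\logs\access.log are hypothetical, made up for this example; only the script location and the --url option come from the command shown above.

```python
import os


def build_import_command(piwik_root, piwik_url, log_path, python_exe="python"):
    """Build the argument list for Piwik's import_logs.py (illustrative helper)."""
    script = os.path.join(piwik_root, "misc", "log-analytics", "import_logs.py")
    return [
        python_exe,           # the interpreter that should run the import tool
        script,               # full path to import_logs.py
        "--url=" + piwik_url, # the Piwik URL, not the tracked site
        log_path,             # the web server log file to import
    ]


if __name__ == "__main__":
    cmd = build_import_command(
        piwik_root=r"D:\Source Code\piwik",
        piwik_url="http://localhost:7788",
        log_path=r"C:\logs\access.log",  # hypothetical log location
    )
    print(cmd)
```

Passing the resulting list to subprocess.call(cmd) runs the import without any manual quoting, because each list item is handed over as one argument.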

Still having problems? Read the online Piwik documentation.

Viewing Reports

The rest is straightforward. After you’ve successfully imported the log files with the Python tool above, visit the Piwik site again, and hopefully it will look like this:

[Screenshot: Piwik interface]

The level of detail in the reports depends heavily on the data recorded in your log files. For example, if you’re not storing referral domains in the log files, you will not see any referral domain data in Piwik. This is a report that I generated from Apache log files that only contain the visitor’s IP, the requested path and the time:

[Screenshot: Piwik report showing no data]

I am sure there are Apache or IIS experts who can tell you which parameters to enable on the web server in order to capture more information about incoming traffic. Not sure what I am talking about again? Maybe it’s time to make friends with system administrators as well. Or you can spend some time studying the Apache log format here: http://httpd.apache.org/docs/current/logs.html.
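
To see which of your log lines carry referrer data, here is a small sketch that parses Apache’s two standard formats: the Common Log Format, and the Combined Log Format, which appends the referrer and user agent. The regular expression and the parse_line helper are my own illustration, not something Piwik provides, and they do not handle every edge case (escaped quotes inside a request, for instance). The sample lines follow the examples in the Apache documentation linked above.

```python
import re

# Common Log Format fields, optionally followed by the two extra
# quoted fields (referrer and user agent) of the Combined Log Format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?$'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    common = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326')
    combined = common + (' "http://www.example.com/start.html" '
                         '"Mozilla/4.08 [en] (Win98; I ;Nav)"')
    for line in (common, combined):
        fields = parse_line(line)
        print(fields["request"] + " -> referrer: " + str(fields["referrer"]))
```

Lines in the common format come back with referrer set to None, which is exactly the situation where no referral data can show up in the reports.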

Bonus *

For an analytics guru like you, there is more: you can query the Piwik database directly using SQL statements! You can find the database schema here: http://developer.piwik.org/guides/persistence-and-the-mysql-backend.
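
As a rough illustration of what such a query could look like, the sketch below counts visits per day. To stay self-contained it runs against an in-memory SQLite table that only mimics a few columns; a real Piwik installation stores its data in MySQL. The table name piwik_log_visit and the columns idsite, visit_first_action_time and visit_total_actions are what I understand the schema to contain, so check them against the schema guide above. The sample rows are made up.

```python
import sqlite3


def make_sample_db():
    """Create an in-memory stand-in for a tiny part of Piwik's visit table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE piwik_log_visit ("
        " idvisit INTEGER PRIMARY KEY,"
        " idsite INTEGER,"
        " visit_first_action_time TEXT,"
        " visit_total_actions INTEGER)"
    )
    # Made-up sample rows: (site id, first action time, number of actions)
    conn.executemany(
        "INSERT INTO piwik_log_visit"
        " (idsite, visit_first_action_time, visit_total_actions)"
        " VALUES (?, ?, ?)",
        [
            (1, "2014-01-05 09:12:00", 3),
            (1, "2014-01-05 17:40:00", 1),
            (1, "2014-01-06 08:03:00", 5),
        ],
    )
    return conn


def visits_per_day(conn, idsite):
    """Return (day, visits, actions) rows for one site, ordered by day."""
    cursor = conn.execute(
        "SELECT DATE(visit_first_action_time) AS day,"
        " COUNT(*) AS visits,"
        " SUM(visit_total_actions) AS actions"
        " FROM piwik_log_visit"
        " WHERE idsite = ?"
        " GROUP BY day ORDER BY day",
        (idsite,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for day, visits, actions in visits_per_day(make_sample_db(), 1):
        print(day + ": " + str(visits) + " visits, " + str(actions) + " actions")
```

Against a real installation you would connect with a MySQL client library instead of sqlite3; the SELECT statement itself should carry over with little change, apart from the parameter placeholder style that your library expects.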

Summary

So this is the quick start guide to using Piwik for log analysis. It’s difficult to say whether Google Analytics or Piwik is better, but I want to raise one more case where I think Piwik is particularly useful: tracking intranet activity. In some companies staff have no access to the Internet, so Google Analytics can’t track their activity, and Piwik, running inside the network, is the ideal solution. And if your company has extremely strict data policies, or simply doesn’t trust Google, you can consider Piwik as well, since all data can be stored in your own data warehouse.

I hope you find this article useful. Please share your Piwik experiences with me, or any use cases that are unique to Piwik and not available in other analytics software on the market!