I am running two small websites, this one and then the other one on digital design. For both of them I want to see basic access statistics (page hit rate and user access count). I was using for a long time "navrcholu.cz" for this, but this service is not available anymore. I was also experimenting with Google Analytics, but, frankly, I am not a fan of cookies.
So I was pondering what solution to choose... I wanted something, which is
Finally, I really like to learn new things and this adventure would bring me into the server-side scripting world. I had the chance to use all what I know about SQL and learn PHP ... - sharpen the tools I do not use in my daily life of a guy dealing with microelectronics.
And in the end I decided to write this brief description of the solution since someone can find it useful or inspiring. Full source codes are available in the end of the document.
Note, that I am not an expert on GDPR. All below is provided "as it is", I am not responsible for any problems which can arise from usage of the whole solution. See also Terms of use below.
User identification is the biggest problem we are facing, here. Counting the users will need to execute the following steps when a web page is being accessed by a browser:
The problem lies in the step IDENT. A simple way how to identify a user is using her or his IP address. However, IP address is a kind of personal information; if we store it into the database, we are dealing with personal data. This I want to definitely avoid.
The solution to the problem is simple: we transform the IP address into another user identifier in an irreversible way. The new identifier shall be unique enough (requirement of uniqueness) to allow counting of the unique users (at least approximatively, I will explain later - "unique") and still fuzzy enough (requirement of fuzziness) not to allow anyone to reconstruct the original IP addresses at all. Let us have a look at these two requirements, now.
Fuzziness: How to calculate such an identifier? Hashing of the IP address can help us. To get the new identifier "fuzzy enough", we have to respect a simple rule: IP address has at least 32 bits (4 billions of addresses); the final hash shall not allow anyone to reconstruct the IP, thus it shall be "narrow enough" in terms of bits. If we choose 8 bit hash, then we reduce 32 bits to 8 bits, reduction by 24 bits. This means that each eight bit hash value will represent about 16 millions of IPs (2^32/2^8 = 2^24). This reduction (aliasing) is very important and actually disallows to reconstruct the original IP from the hash. So the algorithm will be as follows:
Pearson hashing algorithm was chosen for the hashing itself in my implementation.
Uniqueness: We need to monitor the number of unique users, but with 8 bit hash we know, that about 16 millions of IPs are mapped to one value. If we proceed according to the algorithm above, we will be able to identify only at most 256 users and then the system will stop counting. Database will contain all the 256 possible values and no more accessess will be logged. So we will utilize also current time, modified approach follows:
This solution has two limitations.
To conclude, the presented solution is not able to count unique users in the exact sense as e.g., defined here. However, for my website this is a sufficient solution.
The access counter solution is composed of the following components:
For the part counting the page accessess, see the previous post. Only the part related to the user counting is described here.
There is a database behind the counter, created in MySQL, stored on the server side. You need to set it up and choose some login name and password for it. The following SQL code was used to create the database, database schema is apparent from it (and very simple, here):
CREATE TABLE user_counting ( hash smallint NOT NULL PRIMARY KEY, request_time int, visitor_no int );
The table web_counter_table contains the following columns:
The content of the table can look like the following (after a while on my web :-) ):
hash (smallint) | request_time (int) | visitor_no (int) |
---|---|---|
119 | 1639789010 | 00003 |
4 | 1639422684 | 00001 |
97 | 1639432670 | 00002 |
120 | 1639435190 | 00001 |
The following code is handling the user counter, the important part of it is excerpted below. Place the code into a dedicated file like counter.php. The code was simplified before I published it, here, but all necessary is there.
Note:
<?php $C_VISITOR_TIMEOUT = 300; function pearsonHash($input_string) { $f_transposition = array( 1, 87, 49, 12, 176, 178, 102, 166, 121, 193, 6, 84, 249, 230, 44, 163, 14, 197, 213, 181, 161, 85, 218, 80, 64, 239, 24, 226, 236, 142, 38, 200, 110, 177, 104, 103, 141, 253, 255, 50, 77, 101, 81, 18, 45, 96, 31, 222, 25, 107, 190, 70, 86, 237, 240, 34, 72, 242, 20, 214, 244, 227, 149, 235, 97, 234, 57, 22, 60, 250, 82, 175, 208, 5, 127, 199, 111, 62, 135, 248, 174, 169, 211, 58, 66, 154, 106, 195, 245, 171, 17, 187, 182, 179, 0, 243, 132, 56, 148, 75, 128, 133, 158, 100, 130, 126, 91, 13, 153, 246, 216, 219, 119, 68, 223, 78, 83, 88, 201, 99, 122, 11, 92, 32, 136, 114, 52, 10, 138, 30, 48, 183, 156, 35, 61, 26, 143, 74, 251, 94, 129, 162, 63, 152, 170, 7, 115, 167, 241, 206, 3, 150, 55, 59, 151, 220, 90, 53, 23, 131, 125, 173, 15, 238, 79, 95, 89, 16, 105, 137, 225, 224, 217, 160, 37, 123, 118, 73, 2, 157, 46, 116, 9, 145, 134, 228, 207, 212, 202, 215, 69, 229, 27, 188, 67, 124, 168, 252, 42, 4, 29, 108, 21, 247, 19, 205, 39, 203, 233, 40, 186, 147, 198, 192, 155, 33, 164, 191, 98, 204, 165, 180, 117, 76, 140, 36, 210, 172, 41, 54, 159, 8, 185, 232, 113, 196, 231, 47, 146, 120, 51, 65, 28, 144, 254, 221, 93, 189, 194, 139, 112, 43, 71, 109, 184, 209); $curr_hash = 0; for ($index = 0; $index < strlen($input_string); $index++) { $curr_char = $input_string[$index]; $curr_hash = $f_transposition[$curr_hash ^ ord($curr_char)]; } return $curr_hash; } $dbc = mysqli_connect('localhost', 'username_for_the_db', 'password_for_the_db', 'web_counter_database'); #====================================================================== # user-based access counting #====================================================================== # Compute hash of the IP address $user_id = pearsonHash($_SERVER["REMOTE_ADDR"]); $result = mysqli_query($dbc, "START TRANSACTION"); # Is it in the table from the last five minutes? #if it is in the hashed_accesses table in the last five minutes --> update the time in the table #if it is not in the hashed_accesses table in the last five minutes --> add it to the table or update the record which is there, increase unique visitors's count in the hash record $query = "SELECT hash,request_time,visitor_no FROM user_counting WHERE hash=\"".$user_id."\""; $result = mysqli_query($dbc, $query); $num_rows = mysqli_num_rows($result); if ((!$result)||($num_rows==0)) { $query = "INSERT INTO user_counting (hash, request_time, visitor_no) VALUES (".$user_id.",".$_SERVER["REQUEST_TIME"].",1)"; } else { $row = mysqli_fetch_row($result); if ( ($_SERVER["REQUEST_TIME"]-$row[1]) > $C_VISITOR_TIMEOUT ) { $query = "UPDATE user_counting SET visitor_no = visitor_no + 1, request_time = ".$_SERVER["REQUEST_TIME"]." WHERE hash=".$user_id; } else { $query = "UPDATE user_counting SET request_time = ".$_SERVER["REQUEST_TIME"]." WHERE hash=".$user_id; } } $result = mysqli_query($dbc, $query); $result = mysqli_query($dbc, "COMMIT"); mysqli_close($dbc); ?> <body> <html>
I wrote a simple PHP code reporting the counter statistics in a table. You can place it e.g., in counter_report.php, simplified code follows below. Note that anyone can open this page in the browser, there is no security. As it is, counter values can be publicly accessed.
<table style="border: 1px solid black"> <tr> <th>Hash</th> <th>Timestamp</th> <th>Request time</th> <th>Visitor no.</th> </tr> <?php $dbc = mysqli_connect('localhost', 'username_for_the_db', 'password_for_the_db', 'web_counter_database'); $query = "SELECT * FROM user_counting"; $result = mysqli_query($dbc, $query); if ($debug) { print_r($result); }; $total = 0; $num_rows = mysqli_num_rows($result); if ((!$result)||($num_rows==0)) { debug_echo("NOT FOUND<br>"); } else { while ($row = mysqli_fetch_row($result)) { printf("<tr><td>%3d</td><td>%d</td><td>%s</td><td>%05d</td></tr>\n", $row[0], $row[1], date("d.m.y H:i:s",$row[1]), $row[2]); $total = $total + $row[2]; } } printf("<tr><td><b>Total</b></td><td></td><td></td><td><b>%05d</b></td></tr>", $total); mysqli_close($dbc); ?>
The report can then look like the following
Hash | Timestamp | Request time | Visitor no. |
---|---|---|---|
119 | 1639789010 | 18.12.21 01:56:50 | 00003 |
4 | 1639422684 | 13.12.21 20:11:24 | 00001 |
97 | 1639432670 | 13.12.21 22:57:50 | 00002 |
Total | 00006 |
The solution has some limitations and some potential improvements are also obvious and possible. Selected items from my "todo list" for the future development follows:
To conclude, the presented solution is not able to count unique users in the exact sense as e.g., defined here. The first limitation is brought by the timeout, after which we count the visitor as a new one. The second limitation lies in the hashing which can sometimes result into counting two otherwise distinctive users as only one. However, for my website this is a sufficient solution.
The whole package with the PHP source codes and list of SQL command to create the database schema is available here. In addition to what is described above, the package contains also user trace monitoring: reporting of the sequences of web pages visited by the unqiue "hashes". Using it I can see how the user went through the web and optimize the pages accordingly. Again, the monitoring works only with hashes, no user data are stored, there.
And, by the way, you are visitor no.
991 since the last moment I reset the counter.I am neither an expert on GDPR, nor on development for web. So by using information at this site, you agree that the author is not liable to you or to any other party for any direct, indirect, incidental, special, consequential or exemplary damages, including but not limited to, damages for loss of profits, goodwill, use, data, or other intangible losses, regardless of whether the author was advised of the possibility of such damage. In no event shall author's liability exceed zero U.S. dollar (USD 0.00) in aggregate. In the case that applicable law may not allow the limitation or exclusion of liability or incidental or consequential damages, the above limitation or exclusion may not apply to you, although author's liability will be limited to the fullest extent permitted by applicable law.