Attacks on my website by country of origin
I just migrated my humble website from WordPress to a new platform (Next.js) and switched to hosting it on my Linux home server. I intended my first post after this change to be about my reasons and the technology used (and I will write about that). But I just gained some baffling insights about the traffic to my website, and I'd like to share my findings, since they might be of general interest. In particular, it would be interesting to see how my statistics compare to those of other people with a web server on a static IP address.
My website has Google Analytics, a standard way of monitoring how much interest it gains. But the surprises were elsewhere: in the access log of my web server. That log consists of many lines like the following, which happens to be caused by a bot accessing one of my posts:
22.214.171.124 - - [28/Mar/2021:21:03:32 +0200] "GET /foray-physics/ HTTP/1.1" 301 169 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
First in the line comes the IP address of the requester. Later in the line we see, among other things, the HTTP verb, which is GET here, followed by the relative URL of my post, /foray-physics/, and some more information.
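To make the structure of such a line concrete, here is a minimal Python sketch that splits it into named fields. It assumes the log follows the common "combined" access log layout, which the sample line appears to match; the pattern and the function name parse_log_line are my own illustration, not part of any server tooling.

```python
import re

# Assumed layout: the common "combined" access log format.
# Field names below are illustrative choices, not an official schema.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# The sample line quoted above, split over several string literals for readability.
sample = (
    '22.214.171.124 - - [28/Mar/2021:21:03:32 +0200] '
    '"GET /foray-physics/ HTTP/1.1" 301 169 "-" '
    '"Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"'
)
fields = parse_log_line(sample)
print(fields["ip"], fields["method"], fields["path"])
```

Lines with a malformed request part do not match the pattern and yield None, so they would have to be inspected separately.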
The first thing that struck me is that my HTTP server log (which covers the last 12 months) is gargantuan. It's much bigger than my Google Analytics suggests for the time span of one year. So what is all this traffic not picked up by Google Analytics?
The second thing that struck me is that the log contains a considerable share of POST requests, that is, requests like the one in the line below, where the HTTP verb is POST:
126.96.36.199 - - [28/Mar/2021:21:42:11 +0200] "POST /_ignition/execute-solution HTTP/1.1" 301 169 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:78.0) Gecko/20100101 Firefox/78.0"
This instantly raises red flags:
- My website is purely static and works without POST requests. (Its WordPress predecessor allowed POSTs for an administrator, but that was only me, and I filtered myself from the server log.)
- The URL in the above request (and many others) has nothing to do with my blog.
- POST requests are unlikely to happen by accident, since standard browser access of a URL creates a GET. Only willfully sending data creates a POST.
So such POST requests are at best accidental malpractice, at worst attempted attacks. Seeing so many POSTs, I got curious about the demographics of the malpractitioners.
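As a rough illustration of how the POSTs can be singled out, the following Python sketch counts POST requests per client IP address. It assumes the same log layout as the sample lines above; the pattern and the name count_posts_by_ip are hypothetical, and the two input lines are simply the examples quoted in this post.

```python
import re
from collections import Counter

# Captures the client IP of a line whose request part starts with the verb POST.
POST_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "POST ')

def count_posts_by_ip(lines):
    """Count POST requests per client IP over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = POST_PATTERN.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# The two sample lines from this post: one GET and one POST.
sample_lines = [
    '22.214.171.124 - - [28/Mar/2021:21:03:32 +0200] '
    '"GET /foray-physics/ HTTP/1.1" 301 169 "-" '
    '"Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"',
    '126.96.36.199 - - [28/Mar/2021:21:42:11 +0200] '
    '"POST /_ignition/execute-solution HTTP/1.1" 301 169 "-" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:78.0) '
    'Gecko/20100101 Firefox/78.0"',
]
post_counts = count_posts_by_ip(sample_lines)
print(post_counts.most_common())
```

For a real log, the list could be replaced by an open file object, since the function only needs an iterable of lines.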
Remark: POSTs are not the only candidate signal for malpractice, for example:
So I'll focus on POSTs.
My server log contains around 14554 POSTs from Chinese IP addresses, 2117 POSTs from Russian IP addresses, and so on. The relative shares are in the figure below.
To determine the country of an IP address, I used the command
whois <IP address>
on Linux. Its output contains a country code (CN for China, RU for Russia, and so on). For completeness, I must say that in 1447 cases I got the country code "EU # Country is really world wide", which can mean a number of countries including Russia, as I found out with the command
curl ipinfo.io/<IP address>/country
I left that EU country code out of the statistics to avoid muddying the waters. (The reason why I did not always use "ipinfo.io" instead of whois is that the "ipinfo.io" server has a daily request limit.)
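The lookup described above could be automated along the following lines. This is only a sketch under assumptions: it presumes the whois output contains a line beginning with "country:" followed by a two-letter code, which is common but not guaranteed for every registry, and the function names are hypothetical. The example text is a hand-written stand-in, not captured whois output.

```python
import re
import subprocess
from collections import Counter

# Assumption: the country appears on a line like "country:   CN".
COUNTRY_RE = re.compile(r'^country:[ \t]*([A-Za-z]{2})\b',
                        re.IGNORECASE | re.MULTILINE)

def country_from_whois_text(text):
    """Return the first two-letter country code found in whois output, or None."""
    match = COUNTRY_RE.search(text)
    return match.group(1).upper() if match else None

def lookup_country(ip):
    """Run the system whois command for one IP (needs network access)."""
    result = subprocess.run(["whois", ip], capture_output=True,
                            text=True, timeout=30)
    return country_from_whois_text(result.stdout)

def tally_countries(codes):
    """Count country codes, leaving out unknown results and the ambiguous EU."""
    return Counter(code for code in codes if code and code != "EU")

# Hand-written stand-in lines for illustration only.
codes = [
    country_from_whois_text("country:        CN\n"),
    country_from_whois_text("country:        EU # Country is really world wide\n"),
    country_from_whois_text("no country field here\n"),
]
tally = tally_countries(codes)
print(tally)
```

Keeping the text parsing separate from the whois call makes it possible to check the parsing without network access.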
I am surprised by the appearance of Romania and South Korea. And by the absence of the population giant India, in particular since quite a few people from there download my computer science lectures.
The above figures might be unfair to countries with large populations, since more people are likely to commit more mischief. To address this issue, I made the following chart, which ranks the above countries by POSTs in my log per 100000 people. (Yes, one can do such statistics for other things than Corona.)
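The per-capita normalization is simple arithmetic, sketched below in Python. The POST counts are the two figures quoted earlier in this post; the population values are rough round numbers that I am assuming purely for illustration, so the resulting rates are not the ones in my chart.

```python
def posts_per_100k(post_count, population):
    """Scale a raw POST count to POSTs per 100000 inhabitants."""
    return post_count / population * 100_000

# POST counts quoted earlier in this post.
post_counts = {"CN": 14554, "RU": 2117}

# ASSUMPTION: rough, rounded population figures for illustration only.
populations = {"CN": 1_400_000_000, "RU": 144_000_000}

rates = {
    country: posts_per_100k(post_counts[country], populations[country])
    for country in post_counts
}

# Rank countries from highest to lowest per-capita rate.
ranking = sorted(rates, key=rates.get, reverse=True)
for country in ranking:
    print(f"{country}: {rates[country]:.2f} POSTs per 100000 people")
```

Even with these rough inputs, the ordering of the two countries flips compared to the raw counts, which is the kind of effect the per-capita chart is meant to show.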
So, as far as my website is concerned, there is a remarkable amount of malpractice per capita from Hong Kong and Romania. I don't know why. Maybe each country just has some particularly active script kiddies who ran into my IP address. Maybe there are systemic reasons.
So, dear reader, what do you think of this? Do you also have some figures? For any feedback, please reply to my corresponding post on Twitter:
Carsten Führmann on Twitter
Or email me at
cfuhrmann at gmail dot com
Remark: I'm currently handcrafting a comment system for my website, but I'm not done yet. Therefore, discussion is only via Twitter and email for now.