
AI hammering GFF

When the site became slow, I asked ChatGPT to find the reason. Its reply? ChatGPT was the cause!

Nerd alert
This is NOT about fly fishing or fly tying, but about site development and nerdy stuff.

In late November 2025 this site was severely slowed down by a sudden increase in traffic. Incoming requests rose by what was probably a factor of 10 overnight, and for a couple of weeks the traffic stayed at that level, overloading the server and causing all kinds of trouble.

I had constantly high CPU load, a server filling up with cache files, really slow response times, a database that kept clogging up and hanging, and other annoying effects.

Figure: Traffic (Martin Joergensen)

I asked Linode, the company that provides the server, if they could see anything. They saw a rise in traffic, but no attack as such, like a DDoS attack. Apart from that they couldn’t help. Since the traffic was within what can normally be expected, the problem was mine to solve.

So I turned to ChatGPT and asked how I would diagnose the problem.

The AI led me through a bunch of fairly complex Linux commands, analyzed the output of each, led me on to new ones, and could finally draw a conclusion.

And the conclusion?
Bot traffic, mainly from ChatGPT itself!


Figure: Server metrics (Martin Joergensen)
Figure: Pagespeed (Martin Joergensen)

The site was being visited by the bots that gather information for the LLMs (Large Language Models) behind various AIs, including ChatGPT’s own bots, Anthropic's (Claude), Meta's (Facebook) and more.
These bots were fetching pages on GFF at a rate of dozens per second, which is far too often. Good bot behavior is of course to go slowly and not weigh sites down!

My way out of this was to block them before they could get to any content, and that was quite effective. I already had them blocked in the robots.txt file on the site, but that is only a request to stay out, not a technical block.
Now they are blocked from entering, and the major AIs are no longer training on GFF content.
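
To illustrate the difference between asking and blocking, here is a minimal Python sketch. It is not the setup running on GFF, where the blocking is done with Apache rules. The robots.txt content, the pattern and both function names are assumptions made up for the example; only the bot names are taken from the summary further down.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real GFF file is not shown in this post.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""

# Bot names from the summary in this post; the pattern itself is an assumption.
BLOCKED_AGENTS = re.compile(
    r"GPTBot|Claude-SearchBot|meta-externalagent|Bytespider",
    re.IGNORECASE,
)


def polite_crawler_may_fetch(user_agent: str, url: str) -> bool:
    """The check a well-behaved crawler runs on itself before fetching.

    Nothing forces a crawler to run this check or to respect the answer.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def server_status_for(user_agent_header: str) -> int:
    """The decision the server makes on its own: 403 for listed bots, else 200."""
    return 403 if BLOCKED_AGENTS.search(user_agent_header) else 200
```

The first function only describes what a bot should do. The second is the kind of decision the server enforces, no matter what the bot intends, as long as the bot identifies itself honestly in its User-Agent header.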

AI is indeed a double-edged sword, and this incident clearly shows the problem. On one hand, AIs can solve problems that some of us humans might have trouble handling. I could not have done this without a Linux expert by my side. On the other hand, they do this by utilizing all the knowledge already out there, and that obviously sometimes means over-utilizing.

But ChatGPT solved my problems by shutting itself out, and the site is now as fast as ever, probably even faster, and I’m happy.

The way there

And for those of you (few, I hope) facing and having to handle similar problems, I asked the AI to make me a summary of what it had me do, and this was it:

  1. Identified abnormal connection behavior
    • Used ss -tnp to inspect current TCP connections.
    • Observed many CLOSE-WAIT sockets with large queues, indicating connections not being closed properly and workers being held too long.
  2. Analyzed traffic volume and pattern
    • Parsed Apache access logs to compute requests per hour.
    • Found spikes of roughly 100,000 requests per hour, confirming a significant traffic surge.
  3. Discovered bot-dominated traffic as the main driver
    • Extracted top User-Agent strings from Apache logs.
    • Found very high counts from crawlers such as:
      • GPTBot (OpenAI) ~866k requests
      • meta-externalagent (Meta crawler) ~103k requests
      • Baiduspider family ~77k requests
      • Claude-SearchBot (Anthropic), Applebot, DataForSeo, BLEXBot, etc.
    • Concluded that the spike was primarily caused by AI/search/SEO bots, not human visitors.
  4. Implemented bot blocking
    • Updated robots.txt to explicitly disallow known AI, SEO, and aggressive international crawlers.
    • Added Apache .htaccess rules using mod_rewrite to return 403 Forbidden to:
      • GPTBot, Claude-SearchBot, meta-externalagent
      • Baiduspider, Bytespider, SemrushBot, DataForSeoBot, BLEXBot, and others.
    • Observed an immediate reduction in abusive bot traffic.
  5. Diagnosed Apache worker saturation
    • Checked /var/log/apache2/error.log for MPM errors.
    • Found repeated messages:
      • server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
    • Confirmed that Apache’s mpm_prefork worker pool was hitting its concurrency limit and causing slow responses.
  6. Measured Apache process memory usage and considered available RAM
    • Used ps -ylC apache2 --sort:rss to inspect Apache child memory usage.
    • Found each worker using roughly 16–18 MB RSS.
    • Verified with free -m that the server has 16 GB RAM with plenty of free and cached memory.
  7. Tuned Apache prefork MPM settings
    • Edited mpm_prefork.conf to better match available resources.
    • Adjusted key values to:
      • MaxRequestWorkers 250 (up from 150, leveraging 16 GB RAM safely)
      • MaxConnectionsPerChild 2000 to periodically recycle workers and avoid memory bloat
    • Aim: allow more concurrent requests without hitting the worker ceiling or causing swap thrashing.
  8. Optimized connection and timeout behavior
    • Added/tuned global connection settings in apache2.conf:
      • KeepAlive On
      • MaxKeepAliveRequests 100
      • KeepAliveTimeout 2 (reduced from 5 to free workers faster)
      • Timeout 60 (reduced from the much higher default)
    • Enabled and configured mod_reqtimeout to protect against slowloris-style attacks and very slow clients:
      • RequestReadTimeout header=10-20,MinRate=500 body=10,MinRate=500
    • Goal: ensure workers are not held for excessive time by idle or misbehaving clients.
  9. Verified overall resource health and stability
    • Confirmed that memory usage remained healthy and stable with:
      • High available memory (~13+ GB)
      • Minimal and stable swap usage (~253 MB)
    • Concluded that, with increased worker limits and stricter timeouts, Apache can now handle legitimate traffic surges more gracefully.
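
Steps 2 and 3 above boil down to counting lines in the Apache access log. Here is a rough Python sketch of that idea, assuming the log uses Apache's standard combined format. The sample lines, the addresses and the function name are invented for the example, and these are not the commands used in the session described above.

```python
import re
from collections import Counter

# Matches Apache's "combined" log format. Lines that do not fit
# (for example requests containing escaped quotes) are simply skipped.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:]+):(?P<hour>\d{2}):[^\]]+\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Invented sample lines using documentation-only IP addresses.
SAMPLE = [
    '192.0.2.1 - - [25/Nov/2025:13:05:01 +0100] "GET /a HTTP/1.1" 200 512 "-" "GPTBot/1.0"',
    '192.0.2.1 - - [25/Nov/2025:13:05:02 +0100] "GET /b HTTP/1.1" 200 512 "-" "GPTBot/1.0"',
    '198.51.100.7 - - [25/Nov/2025:14:10:00 +0100] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]


def summarize(lines):
    """Return (requests per hour, requests per User-Agent) as two Counters."""
    per_hour = Counter()
    per_agent = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        per_hour[f"{match['day']} {match['hour']}:00"] += 1
        per_agent[match["agent"]] += 1
    return per_hour, per_agent


per_hour, per_agent = summarize(SAMPLE)
for hour, count in sorted(per_hour.items()):
    print(hour, count)
for agent, count in per_agent.most_common(10):
    print(count, agent)
```

On a real server the sample list would be replaced by an open log file, since `summarize` accepts any iterable of lines. The path of that file depends on the installation.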
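
The reasoning in steps 6 and 7 is simple arithmetic: the number of workers times the memory per worker has to fit in RAM, with room left for everything else. Below is a small sketch of that calculation, using the upper figure of 18 MB per worker from the summary. The 4 GB reserved for the database and the rest of the system is an assumption picked for illustration, not a measured value, and the function names are made up.

```python
def worker_pool_memory_mb(max_workers: int, rss_per_worker_mb: float) -> float:
    """Worst-case memory used when every worker is busy."""
    return max_workers * rss_per_worker_mb


def max_workers_for(ram_mb: float, reserved_mb: float, rss_per_worker_mb: float) -> int:
    """Largest worker count that fits after setting aside reserved_mb."""
    return int((ram_mb - reserved_mb) // rss_per_worker_mb)


RAM_MB = 16 * 1024       # 16 GB, as reported by free -m in the summary
RESERVED_MB = 4 * 1024   # assumption: headroom for database, OS and caches
RSS_MB = 18              # upper end of the 16-18 MB measured per worker

print(worker_pool_memory_mb(250, RSS_MB))            # memory needed for 250 workers
print(max_workers_for(RAM_MB, RESERVED_MB, RSS_MB))  # ceiling under these assumptions
```

With these numbers 250 workers need about 4.5 GB, well below the ceiling. Since RSS also counts memory shared between the Apache processes, the real footprint is likely lower than this estimate.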
