Wandering Adventure Party


I have a problem, which is: My websites (a #Wordpress site and a #MediaWiki installation) are slow as hell. So I need to identify the cause.

Tags: wordpress, mediawiki

Jürgen Hubert
#2

2/ Okay, I think I might already have some ideas.

My latest #Apache log has 26,694 lines.

In these 26,694 lines, I have:

- 10,724 access requests from "https://developers.facebook.com/docs/sharing/webmasters/crawler"
- 4,562 access requests from "https://developer.amazon.com/support/amazonbot"
- 3,316 access requests from "https://openai.com/gptbot"

So yeah, I suspect these are the #LLM crawling bots from #Facebook, #Amazon, and #OpenAI, which jointly make up more than half the traffic - and they are hogging the more resource-intensive functions, like "Recent Changes" on my wiki.

Fuck those fuckers for causing outages on my websites.

Any suggestions on how to block them? (No snark, please - I _am_ new at this.)
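
(For anyone wanting to reproduce counts like these: assuming Apache's default "combined" log format, where the user agent is the sixth quote-delimited field, and a log file named access.log - both assumptions, adjust to your host - a quick tally looks like this:)

# Tally requests per user agent in a "combined"-format Apache log
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -rn | head -20

# Count requests from a single crawler, e.g. OpenAI's GPTBot
grep -c "GPTBot" access.log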

samir, the brown sheep
#3

@juergen_hubert This might be a good place to start:

GitHub - ai-robots-txt/ai.robots.txt: A list of AI agents and robots to block (github.com)

I am not an expert, but I am happy to try and answer any questions you might have.
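
(The linked repository maintains, among other things, a robots.txt blocklist. A minimal sketch of the idea, using the user-agent tokens the three vendors publish - GPTBot for OpenAI, Amazonbot for Amazon, FacebookBot for Meta; treat these names as assumptions and take the authoritative list from the repo - and bearing in mind that robots.txt only stops crawlers that choose to honor it:)

User-agent: GPTBot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: FacebookBot
Disallow: /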


Jürgen Hubert
#4

@samir

Thanks! I will fiddle around with those and see if anything works.

Jürgen Hubert
#1

1/ I have a problem, which is: My websites (a #Wordpress site and a #MediaWiki installation) are slow as hell.

So I need to identify the cause. The problem is that I don't know nearly as much about website administration as I ought to.

I contacted the support people at my website provider, who looked at my (Apache) logs and suggested that my Wordpress site might suffer from a "pingback xmlrpc attack". I applied the proposed remedy, which made things a little better. But I don't know enough about reading website logs to identify such problems myself, which I ought to be able to do.

So what I am trying to say is: Is there some kind of beginner's guide to reading website logs, identifying malicious traffic, and what to do about it?
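
(The thread doesn't record what remedy the host proposed. One common mitigation for xmlrpc pingback abuse - offered as a sketch, not as what the host actually suggested - is denying all access to WordPress's xmlrpc.php from the site's .htaccess, in Apache 2.4 syntax; note this also breaks legitimate XML-RPC clients such as some mobile apps:)

# Refuse every request to xmlrpc.php (also disables legitimate XML-RPC use)
<Files xmlrpc.php>
    Require all denied
</Files>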

Femme Malheureuse
#5

@juergen_hubert Wonder if you're getting scraped by AI harvesting bots. Can your site host tell you if you are/are not? And if it's AI bots scraping for LLMs, is the host doing anything to block them?


Jürgen Hubert
#6

@femme_mal I took a closer look, and I am _definitely_ being scraped by AI harvesting bots.

Jürgen Hubert (@juergen_hubert@mementomori.social) - see post #2 above (Memento mori, mementomori.social)


Jason Lefkowitz
#7

@juergen_hubert It's hard to say if that's the culprit without knowing more. 17,000 requests from a bot sounds bad, but if they're spread out over a week or a month or whatever, they may not be enough to be causing performance problems. (You'll usually see performance problems from volumes of requests at consistently high levels over a sustained period.)

There are tips I could give you for hardening WordPress against these types of requests that wouldn't require any sysadmin work. But if you want to harden multiple separate applications, like WP and MediaWiki, that gets more complicated.

(1/?)
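
(One way to check whether the bot traffic is spread out or bursty is to bucket requests by hour. A sketch that assumes Apache's default timestamp layout, [day/month/year:hour:minute:second zone], and reuses GPTBot as the example pattern:)

# Requests per date:hour for one crawler; the timestamp layout is an assumption
grep "GPTBot" access.log | cut -d'[' -f2 | cut -d: -f1,2 | uniq -c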

Jason Lefkowitz
#8

@juergen_hubert If you can download your access logs from your hosting provider, GoAccess (https://goaccess.io/) is a handy free tool for analyzing them quickly. It can put together simple charts that show you who's hitting your site, when, and from where. These can be useful for identifying spikes in traffic from different sources, which you can then block.

(2/?)
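
(A typical invocation, assuming the log was saved as access.log and uses Apache's default combined format:)

# Build a standalone HTML report from the log
goaccess access.log --log-format=COMBINED -o report.html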

Jason Lefkowitz
#9

@juergen_hubert If you want to block traffic to multiple applications on a single machine, you're going to need either the ability to modify your web server software's configuration (which many hosts don't allow) or software called a "web application firewall" (WAF).

A WAF sits between the public web and your applications, filtering and throttling traffic before it reaches them. It gives you one central way to block or rate-limit entire domains.

Many hosting companies integrate with Cloudflare, which offers a basic, free WAF as a service. So that might be something to talk to your host about.

Web application firewall - Wikipedia (en.wikipedia.org)

(3/?)
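
(For a sense of what a WAF rule looks like: in Cloudflare's custom-rule expression language, a block rule matching the crawlers named earlier might use an expression like the one below, with the action set to Block in the dashboard. The user-agent tokens are assumptions; check each vendor's documentation and Cloudflare's current docs:)

(http.user_agent contains "GPTBot") or (http.user_agent contains "Amazonbot") or (http.user_agent contains "FacebookBot")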

Jason Lefkowitz
#10

@juergen_hubert On Apache, for a quick fix, if your host gives you the ability to use .htaccess files (which modify Apache's configuration), you could put lines like these in each site's .htaccess:

<RequireAll>
    Require all granted
    Require not host host.example.com
</RequireAll>

(In Apache 2.4, "Require not" is only valid inside a <RequireAll> block; on its own it's a configuration error.)

The downside is that it's on you to keep up with the domains the crawlers are coming from, and they change. A WAF lets you just say "throttle anyone who shows up too much."

You'd also have to keep your list up to date in two places: the .htaccess for WP, and the one for MediaWiki.

.htaccess syntax is also finicky. If you don't know what you're doing, I wouldn't mess with it.

Access Control - Apache HTTP Server Version 2.4 (httpd.apache.org)

(4/?)
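
(Because "Require host" depends on reverse-DNS lookups, these crawlers are often easier to match on the User-Agent header instead. A hedged .htaccess alternative using mod_rewrite, assuming the module is enabled and that the user-agent tokens below are still current:)

# Return 403 Forbidden when the User-Agent matches a known AI crawler
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|Amazonbot|FacebookBot) [NC]
RewriteRule ^ - [F,L]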

Jason Lefkowitz
#11

@juergen_hubert There's probably more to be said, but I've gone on for far too long already 😆

Hope this was at least helpful. If you want to talk further, feel free to @ me either here or in DMs. Can't promise I can solve your problem, but I'm happy to help however I can.

~ fin ~

(5/5)

Jürgen Hubert
#12

@jalefkowit

Thanks - you have given me a lot to think about!
