Update on my website maintenance woes: My websites had been down for the last 12 hours or so, so I decided to look for some way to restart them. Then I found a button in my provider's administration interface that purged all objects in my "Varnish Cache". I remembered that the "Varnish Cache" came up from time to time in the error messages, so I clicked it.
And my websites were suddenly online again.
This is an obvious improvement on my prior situation, and gives me hope that this trick will work in the future as well. But I would like to understand what the actual problem was (beyond it having to do with the Varnish Cache), so that maybe I can prevent it from recurring in the first place.
I don't want this to become another "sacrifice a black goat at midnight" thing, like with certain aspects of my #TeXLaTeX installation - where I know that some things will work, but not _why_ they work.

-
@juergen_hubert How often does it happen? Can you switch off the Varnish cache at Gandi? Tip: Monitoring free of charge with e.g. pingtide.com (Hosted in Germany).
-
@crash_bandicoot The last time it happened was about two weeks ago or so.
-
@juergen_hubert
You need to know why clearing the cache helped. Often it's a bot scraping everything on your site and leaving the cache full of less useful pages.
Can you see stats like hit vs miss and backend timeouts / errors vs responses?
There's usually a ramp-up of something (likely misses), and you can use that ramp-up to find the time window to check in the logs for sus behaviour.
-
Hmmm... comparing my error log with my access log, it seems that the most likely culprit is the "blex-crawler" - at least, it made a _lot_ of requests right around the time the latest error wave started.
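For anyone who wants to repeat that comparison, here is a rough Python sketch of the idea: count requests per user agent inside the time window where the errors started. It assumes the access log is in the common "combined" log format, which may not match what the provider actually writes. The sample lines, the agent name "example-crawler/1.0" and the time window are all made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed "combined" log format:
# ip ident user [time] "request" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def count_agents(lines, start, end):
    """Count requests per user agent with a timestamp in [start, end]."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not match the assumed format
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        if start <= ts <= end:
            counts[m.group("agent")] += 1
    return counts

# Invented sample data; a real run would read the lines from the log file.
SAMPLE = [
    '203.0.113.5 - - [10/Mar/2024:03:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "example-crawler/1.0"',
    '203.0.113.5 - - [10/Mar/2024:03:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "example-crawler/1.0"',
    '198.51.100.7 - - [10/Mar/2024:03:05:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Mar/2024:04:30:00 +0000] "GET /c HTTP/1.1" 200 512 "-" "example-crawler/1.0"',
    'not a log line',
]

# Hypothetical window around the start of the error wave (timezone-aware).
START = datetime(2024, 3, 10, 3, 0, tzinfo=timezone.utc)
END = datetime(2024, 3, 10, 3, 10, tzinfo=timezone.utc)

if __name__ == "__main__":
    for agent, n in count_agents(SAMPLE, START, END).most_common():
        print(n, agent)
```

If one agent dominates the window just before the errors begin, that supports the crawler theory, though it does not prove it; the cache hit/miss stats mentioned above would still be worth checking.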