[om-infra] Monitor is UP: OMV_gallery
Wayne Sallee
Wayne at WayneSallee.com
Mon Dec 12 12:04:03 EST 2016
It looks like a quick fix would be to set the VM to power off for the backup and
power back on afterwards, if that change is easy to implement.
Then, when the suspend problem is fixed, change it back.
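A minimal sketch of what that change might look like, assuming the backups are made with Proxmox's vzdump tool and that switching its `--mode` option from `suspend` to `stop` is all that is needed. The VM ID 101 and the helper name `vzdump_command` are made up for illustration. The script only builds and prints the command; it does not run anything.

```python
# Backup modes accepted by vzdump's --mode option.
VZDUMP_MODES = ("snapshot", "suspend", "stop")


def vzdump_command(vmid, mode="stop"):
    """Build a vzdump command line (as an argument list) for one guest."""
    if mode not in VZDUMP_MODES:
        raise ValueError("unknown vzdump mode: %s" % mode)
    return ["vzdump", str(vmid), "--mode", mode]


if __name__ == "__main__":
    # 101 is a placeholder VM ID, not the real ID of the affected guest.
    print(" ".join(vzdump_command(101)))
```

As far as I know, in stop mode vzdump shuts the guest down itself and brings it back up if it was running, so no separate power-on step should be needed. Switching back later would only mean changing the mode to `suspend` again.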
Wayne Sallee
Wayne at WayneSallee.com
http://www.WayneSallee.com
On 12/12/2016 07:26 AM, Jean-Claude Vanier via OM-Infra wrote:
> Reading the error/warn logs of nginx, besides the regular attacks (mainly against WP),
> I found something interesting around the time when the CMSes were down.
> See the attachment:
> -- text coloured in orange: errors related to pagespeed
> -- text coloured in violet: first error after the CMSes were up again
> -- after that, we have errors about pagespeed again
> -- not shown: the subsequent entries no longer contain pagespeed errors,
> but the logs show that the "Slow ReadFile operation" error occurs
> rather often. I will try to extract a readable statistic.
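One possible way to get such a statistic, as a rough sketch: count the "Slow ReadFile operation" lines per hour. It assumes the usual nginx error-log timestamp at the start of each line (`YYYY/MM/DD HH:MM:SS`). The sample lines are invented for illustration and are not taken from the real logs, and the function name `count_per_hour` is hypothetical.

```python
import re
from collections import Counter

# Timestamp at the start of an nginx error-log line,
# e.g. "2016/12/12 00:15:03 [warn] ...", capturing the date and the hour.
STAMP = re.compile(r"^(\d{4}/\d{2}/\d{2}) (\d{2}):\d{2}:\d{2} ")


def count_per_hour(lines, needle="Slow ReadFile operation"):
    """Count log lines containing `needle`, grouped by 'YYYY/MM/DD HH'."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = STAMP.match(line)
        if match:
            counts[match.group(1) + " " + match.group(2)] += 1
    return counts


if __name__ == "__main__":
    # Invented sample lines; in practice, iterate over the real error log.
    sample = [
        "2016/12/12 00:15:03 [warn] 1#0: Slow ReadFile operation on file a",
        "2016/12/12 00:47:10 [warn] 1#0: Slow ReadFile operation on file b",
        "2016/12/12 01:02:55 [error] 1#0: some other error",
        "2016/12/12 03:30:00 [warn] 1#0: Slow ReadFile operation on file c",
    ]
    for hour, n in sorted(count_per_hour(sample).items()):
        print(hour, n)
```

Comparing the hourly counts with the backup window might show whether the slow reads cluster around the compression phase.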
>
> @Raphaël: wdyt, should we disable pagespeed or modify its config?
>
>
> 2016-12-12 10:16 GMT+01:00 Jean-Claude Vanier <jclvanier at gmail.com>:
>> Hi,
>> once again, some of our web services were down for about 4 hours
>> between midnight and 04:00 AM (UTC).
>>
>> Last time, Raphaël suggested that the issue could have been due to the
>> backup in proxmox.
>> I started an investigation this morning and here are my first findings
>> and questions:
>> -- the robot monitors 5 web services, all hosted on jasper
>> -- only 3 of them were down last night
>> -- the backup was made on garnet
>> -- garnet was reported as suspended for about 30 minutes
>> -- the compression lasted 4 hours for 23 GiB
>> I have summarized the time chart in the attachment:
>> -- the times given have been adjusted to UTC + 1 (according to
>> turquoise's time)
>> -- be aware that the time scale of the doc is not linear
>> -- for one CMS (gallery), the down time started before the compression
>> of the archive
>> -- I highlighted some warnings in the excerpt of the backup log
>>
>> Finally, I'm not really sure that the backup and the downtime of the
>> CMSes are directly connected. However, if they are, I don't see how
>> exactly.
>> The common point between the downed CMSes seems to be nginx.
>> More investigation is needed (analysis of the logs).
>>
>> If you have an idea, please, enlighten me :)
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Uptime Robot <alert at uptimerobot.com>
>> Date: 2016-12-12 5:00 GMT+01:00
>> Subject: Monitor is UP: OMV_gallery
>> To: jclvanier at gmail.com
>>
>>
>> Hi,
>>
>> The monitor OMV_gallery (https://gallery.openmandriva.org/) is back UP
>> (HTTP 200 - OK) (It was down for 4 hours, 20 minutes and 11 seconds).
>>
>>
>> Have a great day,
>>
>> Uptime Robot
>> http://uptimerobot.com
>> http://twitter.com/uptimerobot
>> http://facebook.com/uptimerobot
>>
>>
>>
>> _______________________________________________
>> OM-Infra mailing list
>> OM-Infra at ml.openmandriva.org
>> http://ml.openmandriva.org/mailman/listinfo/om-infra_ml.openmandriva.org