[om-infra] Monitor is UP: OMV_gallery

Jean-Claude Vanier jclvanier at gmail.com
Mon Dec 12 17:09:44 EST 2016


In fact, the suspend isn't really a problem. At least, this is what I
think for now but I'm not sure.
The problem is the "looong" time taken by the compression.
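
For scale, the figures reported earlier in the thread (23 GiB compressed in
about 4 hours) work out to roughly 1.6 MiB/s on average. A quick sketch of
that arithmetic, assuming both figures are approximate (the function name is
only illustrative):

```python
# Figures quoted earlier in this thread: a 23 GiB archive, about 4 hours.
ARCHIVE_GIB = 23
DURATION_HOURS = 4


def throughput_mib_per_s(size_gib: float, hours: float) -> float:
    """Average throughput in MiB/s for size_gib processed in the given hours."""
    return size_gib * 1024 / (hours * 3600)


rate = throughput_mib_per_s(ARCHIVE_GIB, DURATION_HOURS)
print(f"{rate:.2f} MiB/s")  # prints "1.64 MiB/s"
```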


I have made some diagrams showing the number of pagespeed errors
over the last 12 months (from the nginx error log):
1- by month
2- by day
3- the average for each hour of the day since 2016-09-01 (about 100 days)

Diagrams 1 and 2 have a very similar shape. The number of errors is
significantly higher during the first 6 months (from 2015-12 to
2016-05). There are very few errors during the summer.
My guess is that the number of errors is linked to website traffic (see
the summaries given by piwik).

Diagram 3 shows a strange shape: there are 4 peaks per day, occurring
every 6 hours starting at midnight.
The tallest peak (midnight) seems to be linked to the daily backup of a
VM but I cannot explain the 3 others.
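
The three counts described above (per month, per day, and the hourly average
since 2016-09-01) could be reproduced with something like the sketch below.
It is not the script that produced the attached diagrams. It assumes the
usual nginx error log prefix "YYYY/MM/DD HH:MM:SS [level]" and, as a further
assumption, that pagespeed messages can be recognised by the word
"pagespeed" in the line. The function names and the command-line handling
are hypothetical.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# nginx error log lines start with "YYYY/MM/DD HH:MM:SS [level] ...".
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[")


def pagespeed_timestamps(lines):
    """Yield the timestamp of every log line that mentions pagespeed."""
    for line in lines:
        m = LINE_RE.match(line)
        # Assumption: a pagespeed message contains the word "pagespeed".
        if m and "pagespeed" in line.lower():
            yield datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")


def summarize(lines, hourly_since=datetime(2016, 9, 1)):
    """Return (errors per month, errors per day, average errors per hour of day)."""
    by_month, by_day, by_hour = Counter(), Counter(), Counter()
    last = None
    for ts in pagespeed_timestamps(lines):
        by_month[ts.strftime("%Y-%m")] += 1
        by_day[ts.strftime("%Y-%m-%d")] += 1
        if ts >= hourly_since:
            by_hour[ts.hour] += 1
            if last is None or ts > last:
                last = ts
    # Average over the calendar days from hourly_since to the last error seen.
    n_days = (last.date() - hourly_since.date()).days + 1 if last else 1
    hourly_avg = {hour: by_hour[hour] / n_days for hour in range(24)}
    return by_month, by_day, hourly_avg


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py /path/to/nginx/error.log
    with open(sys.argv[1], errors="replace") as log:
        months, days, hourly = summarize(log)
    for hour in range(24):
        print(f"{hour:02d}:00  {hourly[hour]:.2f}")
```

Peaks every 6 hours would show up as four hours of the day with clearly
higher averages than their neighbours.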




2016-12-12 18:04 GMT+01:00 Wayne Sallee via OM-Infra
<om-infra at ml.openmandriva.org>:
> Looks like a quick fix would be to set the vm to power off for the backup,
> and then power back on after the backup. If that change would be easy to
> implement.
> Then when the suspend problem is fixed, change it back.
>
> Wayne Sallee
> Wayne at WayneSallee.com
> http://www.WayneSallee.com
>
> On 12/12/2016 07:26 AM, Jean-Claude Vanier via OM-Infra wrote:
>
> Reading the nginx error/warn logs, besides the regular attacks (mainly
> against WP),
> I have found something interesting around the time when the CMSes were
> down.
> See the attachment:
> -- text coloured in orange: errors related to pagespeed
> -- text coloured in violet: first error after the CMSes were up again
> -- after that, we have pagespeed errors again
> -- not shown: the errors that follow are no longer pagespeed errors,
> but the logs show that the "Slow ReadFile operation" error occurs
> rather often. I will try to extract readable statistics.
>
> @Raphaël: wdyt, should we disable pagespeed or modify its config?
>
>
> 2016-12-12 10:16 GMT+01:00 Jean-Claude Vanier <jclvanier at gmail.com>:
>
> Hi,
> once again, some of our web services were down for about 4 hours
> between midnight and 04:00 AM (UTC).
>
> Last time, Raphaël suggested that the issue could have been due to the
> backup in proxmox.
> I started an investigation this morning and here are my first findings
> and questions:
> -- the robot monitors 5 web services, all hosted on jasper
> -- only 3 of them were down last night
> -- the backup was made on garnet
> -- garnet was reported as suspended for about 30 minutes
> -- the compression lasted 4 hours for 23 GiB
> I have summarized the timeline in the attachment:
> -- the times given have been adjusted to UTC+1 (according to
> turquoise's clock)
> -- be aware that the time scale of the doc is not linear
> -- for one CMS (gallery), the down time started before the compression
> of the archive
> -- I highlighted some warnings in the excerpt of the backup log
>
> Finally, I'm not really sure that the backup and the downtime of the
> CMSes are directly connected. However, if they are, I don't see
> exactly how.
> The common point between the downed CMSes seems to be nginx.
> More investigation is needed (analysis of the logs).
>
> If you have an idea, please, enlighten me :)
>
>
>
> ---------- Forwarded message ----------
> From: Uptime Robot <alert at uptimerobot.com>
> Date: 2016-12-12 5:00 GMT+01:00
> Subject: Monitor is UP: OMV_gallery
> To: jclvanier at gmail.com
>
>
> Hi,
>
> The monitor OMV_gallery (https://gallery.openmandriva.org/) is back UP
> (HTTP 200 - OK) (It was down for 4 hours, 20 minutes and 11 seconds).
>
>
> Have a great day,
>
> Uptime Robot
> http://uptimerobot.com
>
>
>
> _______________________________________________
> OM-Infra mailing list
> OM-Infra at ml.openmandriva.org
> http://ml.openmandriva.org/mailman/listinfo/om-infra_ml.openmandriva.org
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 01_err_by_months.png
Type: image/png
Size: 19749 bytes
Desc: not available
URL: <http://ml.openmandriva.org/mailman/private/om-infra_ml.openmandriva.org/attachments/20161212/58754e60/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 02_err_by_days.png
Type: image/png
Size: 119751 bytes
Desc: not available
URL: <http://ml.openmandriva.org/mailman/private/om-infra_ml.openmandriva.org/attachments/20161212/58754e60/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 03_err_hourly_average.jpg
Type: image/jpeg
Size: 23405 bytes
Desc: not available
URL: <http://ml.openmandriva.org/mailman/private/om-infra_ml.openmandriva.org/attachments/20161212/58754e60/attachment-0001.jpg>

