[App_rpt-users] hub crash help

Wed Nov 15 23:54:22 UTC 2017

I have long complained about the uptime of App_Rpt. I've identified two
repeatable problems.

The first is that if a node stays connected for more than about 650 hours,
there is a buffer overflow. I notice this most prominently with my RTCMs,
which are all on the same server and connected to a hub on that same
server. Allmon shows a negative number of connected hours, followed shortly
by a crash. Apparently this is not just an RTCM problem; it just shows up
there more readily because those nodes are on the same server and aren't
disconnecting due to network issues.
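
A negative connected-time display is what you would expect if the elapsed time were kept in a signed 32-bit counter that wrapped around. As a hypothetical illustration only (how app_rpt actually stores this value has not been checked), the sketch below assumes elapsed time is held as a signed 32-bit count of milliseconds. Under that assumption the counter wraps after about 596.5 hours (roughly 24.9 days), in the same ballpark as the ~650 hours observed, and the displayed value then goes negative.

```python
# Hypothetical illustration: assumes connected time is stored as a signed
# 32-bit count of milliseconds. Not confirmed against the app_rpt source.

INT32_MAX = 2**31 - 1
MS_PER_HOUR = 3600 * 1000


def as_int32(value: int) -> int:
    """Wrap an integer to a signed 32-bit value, as a two's-complement platform would."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def displayed_hours(real_hours: float) -> float:
    """Hours a monitor would show if elapsed ms had been squeezed into an int32."""
    elapsed_ms = int(real_hours * MS_PER_HOUR)
    return as_int32(elapsed_ms) / MS_PER_HOUR


if __name__ == "__main__":
    print(f"int32 ms counter wraps after {INT32_MAX / MS_PER_HOUR:.1f} hours")
    for hours in (500, 596, 597, 650):
        print(f"{hours} real hours -> {displayed_hours(hours):.1f} shown")
```

Under this assumption, 650 real hours would be displayed as roughly -543 hours.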

The second problem can be recreated in two ways. The first is to set up a
bunch of nodes to repeatedly connect to and disconnect from a hub with some
simple bash scripting; after a random amount of time there will be a crash.
The second is to crank in simulated packet loss with the IAX2 test command.
It takes more time and patience for it to crash under this test. I believe
both tests demonstrate the same problem: there is something flaky in the
area of IAX links within App_Rpt or Asterisk.
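
The connect/disconnect test can be scripted in a few lines. Below is a minimal sketch in Python rather than bash, assuming the leaf nodes and the hub are on the same server, that each leaf's rpt.conf keeps the stock ASL function codes (*3 = connect, *1 = disconnect), and that the script runs as a user allowed to use `asterisk -rx`. The node numbers are hypothetical placeholders. For the packet-loss variant, the Asterisk 1.4 chan_iax2 code that app_rpt builds on is believed to provide an `iax2 test losspct <percent>` CLI command that throws away that percentage of incoming packets; check your build's CLI help before relying on it.

```python
import random
import subprocess
import time
from typing import Callable, List

# Hypothetical node numbers; substitute your own hub and leaf nodes.
HUB = "2000"
LEAVES = ["2001", "2002", "2003"]

Command = List[str]


def asterisk_cli(command: str) -> Command:
    """Build an `asterisk -rx "<command>"` invocation for the local console."""
    return ["asterisk", "-rx", command]


def run(cmd: Command) -> None:
    """Execute a command, ignoring its exit status (the hub may already be down)."""
    subprocess.run(cmd, check=False)


def set_iax2_loss(percent: int, runner: Callable[[Command], None] = run) -> Command:
    """Simulate packet loss (assumes `iax2 test losspct` exists on this build)."""
    cmd = asterisk_cli(f"iax2 test losspct {percent}")
    runner(cmd)
    return cmd


def churn(cycles: int,
          runner: Callable[[Command], None] = run,
          sleep: Callable[[float], None] = time.sleep,
          max_delay: float = 5.0) -> List[Command]:
    """Connect a random leaf to the hub, wait, disconnect it, wait; repeat.

    Assumes *3<node> = connect and *1<node> = disconnect in each leaf's rpt.conf.
    Returns every command issued so a dry run can be inspected.
    """
    issued: List[Command] = []
    for _ in range(cycles):
        leaf = random.choice(LEAVES)
        for digits in (f"*3{HUB}", f"*1{HUB}"):
            cmd = asterisk_cli(f"rpt fun {leaf} {digits}")
            runner(cmd)
            issued.append(cmd)
            sleep(random.uniform(0.0, max_delay))
    return issued
```

Passing `runner=print` gives a dry run that shows the commands without touching Asterisk; on the hub server, `churn(1000)` runs the real test. The delays are randomized because the crash appears after a random interval rather than after a fixed number of cycles.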

The Ham VoIP guys say they have really improved the uptime of their distro.
I think that's great, but I find it terribly upsetting that the DIAL folks
don't get these fixes.

On Wed, Nov 15, 2017 at 8:36 AM, Peter <g7rpg at hotmail.com> wrote:

> Hi...
>
> I'm one of the minions that help run M0HOY's Allstar hubs in the UK, nodes
> 41522 & 41223. There is an ongoing problem where the hub will crash
> randomly when someone comes on with a flaky internet connection. The
> crash is associated with a segfault in app_rpt.so and usually one of
> the following in the logs:
>
>
> Undecodable frame received from 'xxx.xxx.xxx.xxx'
> PBX may not have been terminated properly on
> 'IAX2/xxx.xxx.xxx.xxx:4572-5435'
> lots of.... Max retries exceeded to host xxx.xxx.xxx.xxx
>
>
> Sometimes the crash is a 'general protection ip' in app_rpt.so, but there
> is usually nothing in the logs just before it to give any clues.
>
> Average uptime is 7-14 days; a flaky incoming connection will kill it
> within minutes, dropping all the connections.
>
> Both hubs are hosted in a DC on VPSes running Debian 7.4 (kernel 3.2.81,
> x64), built from source from the GitHub repo.
>
> There tend to be 60-80 ASL connections on average, spread across the
> two hubs to share the load.
>
> Recently we've had a few users on the hubs from the US and Australia
> repeatedly crashing them, sometimes more than once a day.
>
>
> Some of the things we've done or are looking at:
>
> More hubs to spread the load and minimise the disruption
> A script to keep a check on the running process and restart asterisk
> QUICKLY
> A script to alert us of potential problems
> On some occasions we've had to use iptables to block the host causing
> the problem where we've not been able to make contact.
>
>
> I wonder if anyone has any suggestions?
>
> TIA
>
> Peter
> G7RPG
>
>
>
> _______________________________________________
> App_rpt-users mailing list
> App_rpt-users at lists.allstarlink.org
> http://lists.allstarlink.org/cgi-bin/mailman/listinfo/app_rpt-users
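
The "restart asterisk QUICKLY" idea could look like the sketch below: poll for a process named `asterisk` every couple of seconds and restart it if it is gone. It assumes procps `pgrep` is installed, and `RESTART_CMD` is a hypothetical placeholder to replace with however Asterisk is started on your system. Asterisk also ships a `safe_asterisk` wrapper script that restarts it when it exits, which may already cover this.

```python
import subprocess
import time
from typing import Callable, List

# Hypothetical: replace with your init script, service name, or binary path.
RESTART_CMD: List[str] = ["asterisk"]


def asterisk_running() -> bool:
    """True if a process named exactly 'asterisk' exists (pgrep exits 0 on a match)."""
    result = subprocess.run(["pgrep", "-x", "asterisk"], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def restart_asterisk() -> None:
    """Start Asterisk again using the placeholder command above."""
    subprocess.run(RESTART_CMD, check=False)


def check_once(is_running: Callable[[], bool] = asterisk_running,
               restart: Callable[[], None] = restart_asterisk,
               log: Callable[[str], None] = print) -> bool:
    """Restart Asterisk if it is not running; return True if a restart was issued."""
    if is_running():
        return False
    log(time.strftime("%Y-%m-%d %H:%M:%S") + " asterisk not running, restarting")
    restart()
    return True


def watch(interval: float = 2.0) -> None:
    """Poll forever; a short interval keeps downtime after a crash small."""
    while True:
        check_once()
        time.sleep(interval)
```

This only detects a dead process, not a hung one; a console query with a timeout would be needed to catch the latter.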

-- 
Tim
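
For the alerting and iptables ideas, one possible approach is to count the warnings Peter quoted per remote address and flag the worst offenders. This is a sketch: the regexes assume the log wording matches his excerpts (with real IPs in place of the xxx placeholders), the threshold of 20 is arbitrary, and `drop_rule` only builds an iptables command for a human to review; it does not run it.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

IP = r"(\d{1,3}(?:\.\d{1,3}){3})"

# Assumed log wording, based on the excerpts quoted in this thread.
PATTERNS = [
    re.compile(r"Max retries exceeded to host " + IP),
    re.compile(r"Undecodable frame received from '" + IP + "'"),
]


def noisy_hosts(lines: Iterable[str], threshold: int = 20) -> Dict[str, int]:
    """Count matching warnings per source IP; return hosts at or above threshold."""
    counts: Counter = Counter()
    for line in lines:
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}


def drop_rule(ip: str) -> List[str]:
    """iptables command that would drop all inbound traffic from `ip`."""
    return ["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"]
```

For example, feeding it the lines of `/var/log/asterisk/messages` (the usual default log path) returns a dict of IP to warning count that an alert script could mail out.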