<div dir="ltr">I have long complained about the uptime of App_Rpt. I've identified two repeatable problems. <div><br></div><div>The first is that if a node stays connected for more than about 650 hours, there is a buffer overflow. I notice this most prominently with my RTCMs, which are all on the same server and connected to a hub on that same server. Allmon will show a negative number of connected hours, followed shortly by a crash. Apparently this is not just an RTCM problem; it just shows up there more readily because the nodes are on the same server and aren't disconnecting due to network issues.</div><div><br></div><div>The second problem can be recreated in two ways. The first way is to set up a bunch of nodes to repeatedly connect to and disconnect from a hub with some simple bash scripting. After a random time there will be a crash. The second way is to crank in simulated packet loss with the IAX2 test command. It takes more time and patience for it to crash under this test. I believe both these tests demonstrate the same problem: there is something flaky in the area of IAX links within App_Rpt or Asterisk.</div><div><br></div><div>The Ham VoIP guys say they have really improved the uptime of their distro. I think that's great, but I find it terribly upsetting that the DIAL folks don't get these fixes. </div><div><br></div><div><br></div><div><br></div><div> </div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Nov 15, 2017 at 8:36 AM, Peter <span dir="ltr"><<a href="mailto:g7rpg@hotmail.com" target="_blank">g7rpg@hotmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi...<br>
<br>
I'm one of the minions who help run M0HOY's Allstar hubs in the UK, nodes<br>
41522 & 41223. There is an ongoing problem where the hub will crash<br>
randomly when someone comes on with a flaky internet connection. The<br>
crash is associated with a segfault in app_rpt.so and usually one of<br>
the following in the logs:<br>
<br>
<br>
Undecodable frame received from 'xxx.xxx.xxx.xxx'<br>
PBX may not have been terminated properly on<br>
'IAX2/xxx.xxx.xxx.xxx:4572-<wbr>5435'<br>
lots of.... Max retries exceeded to host xxx.xxx.xxx.xxx<br>
<br>
<br>
Sometimes the crash is a 'general protection ip' in app_rpt.so, but<br>
usually there is nothing in the logs just before it to give any clues.<br>
<br>
Average uptime is 7 - 14 days, but a flaky incoming connection will kill it<br>
within minutes, dropping all the connections.<br>
<br>
Both of the hubs are hosted in a DC on VPS running on Debian 7.4, 3.2.81<br>
X64, built with source from the github repo.<br>
<br>
There tends to be on average 60 - 80 ASL connections spread across the<br>
two hubs to share the load.<br>
<br>
Recently we've had a few users on the hub from the US and Australia<br>
repeatedly crashing the hubs, sometimes more than once a day.<br>
<br>
<br>
Some of the ideas we've done/are looking at are:<br>
<br>
More hubs to spread the load and minimise the disruption<br>
A script to keep a check on the running process and restart asterisk QUICKLY<br>
Scripts to alert us to potential problems<br>
On some occasions we have had to use iptables to block the host causing the<br>
problem where we've not been able to make contact.<br>
<br>
<br>
I wonder if anyone has any suggestions?<br>
<br>
TIA<br>
<br>
Peter<br>
G7RPG<br>
<br>
<br>
<br>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>--<br></div><div>Tim</div></div></div></div></div>
</div>