
Server Help [Linux] Kernel Panic caused by starbound_server?

Discussion in 'Multiplayer' started by Bacon, Feb 11, 2015.

  1. Hello there.

    Our Linux machine has been rebooting suddenly at times (this started after the Giraffe update). Today I was checking the logs and found this:
    Code:
    ERROR: apport (pid 9043) Tue Feb 10 13:32:58 2015: called for pid 17158, signal 11, core limit 0
    ERROR: apport (pid 9043) Tue Feb 10 13:32:58 2015: executable: /home/starbound/starbound/sb/linux64/starbound_server (command line "./starbound_server")
    ERROR: apport (pid 9043) Tue Feb 10 13:32:58 2015: executable does not belong to a package, ignoring
    Signal 11 is SIGSEGV (an invalid memory reference). I suspect it is tied to a kernel panic, which can make Linux reboot to avoid hardware damage.
    Before the update our machine never rebooted due to a kernel panic. This is from the apport.log file.

    Does anybody know what may be causing this? I don't understand the "executable does not belong to a package" part, so maybe it's something I can fix.
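
For anyone reading the same apport lines: here is a minimal Python sketch that pulls the pid, signal and core limit out of the "called for pid" line and maps the signal number to its name. The function name `describe_apport_line` and the regular expression are made up for illustration and are not part of apport. Signal 11 is SIGSEGV, a signal delivered to the crashing process itself, and a core limit of 0 means no core file gets written.

```python
import re
import signal

# Matches the "called for pid ..." line from apport.log quoted above.
APPORT_RE = re.compile(r"called for pid (\d+), signal (\d+), core limit (\d+)")


def describe_apport_line(line):
    """Return (pid, signal_name, core_limit), or None if the line does not match."""
    match = APPORT_RE.search(line)
    if match is None:
        return None
    pid, signum, core_limit = (int(group) for group in match.groups())
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Unknown signal number on this platform: keep the raw number.
        name = str(signum)
    return pid, name, core_limit


line = ("ERROR: apport (pid 9043) Tue Feb 10 13:32:58 2015: "
        "called for pid 17158, signal 11, core limit 0")
print(describe_apport_line(line))  # (17158, 'SIGSEGV', 0)
```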
     
  2. Nordan

    Nordan Big Damn Hero

    Seems it's logging the signal 11, but you're not getting an actual crash report since Starbound isn't part of a package, in the context of a package manager (yum, apt, etc). This is most likely a hardware or driver issue. As to why it's only happening now, you might want to check your hardware logs to see if there's overheating or faulty memory. It happening since the update could be a coincidence.
     
  3. Yup @Nordan, I'm also wondering. One thing about Linux servers is that they log everything. But in this case the syslog and the kernel logs don't record anything; they only show the boot info.
    These are the reboots we're suffering: http://pastebin.com/CCuPAkKG

    I'm reaching the limit of my knowledge for troubleshooting something like this. Ubuntu Server does have a repair option if anything is wrong with the drivers or the OS. But if it comes down to hardware, eh!
     
  4. Nordan

    Nordan Big Damn Hero

    I'm not sure if you're running Ubuntu with a GUI or only a command shell. There are packages you can find for both GUI and command line to troubleshoot hardware issues or test/benchmark your hardware. Most server-grade equipment also has these built into the motherboard, in the BIOS or on a baseboard controller. Even most consumer-level BIOS images have a memory tester included. Windows has a memory tester you can launch from the boot loader. I'm by no means a Linux expert, so I don't know off the top of my head which packages you can get to find your issue, but the answer likely won't be too hard to find on your favorite search engine.
     
  5. I have found one starbound_server message in the kernel log:
    Code:
    Jan 30 18:11:33 ubuntu kernel: [2365957.622441] traps: NetSocket::read[16930] general protection ip:ddac24 sp:7f83d87f7d88 error:0 in starbound_server[400000+e0f000]
    
    @Nordan yeah, doing what I can. It's incredibly complex and tied to so many minor hardware details that it's overwhelming to understand.
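
A rough way to read that kernel line, as a Python sketch: it splits the `traps:` message into its fields and checks whether the faulting instruction pointer falls inside the `starbound_server` mapping given in brackets (base address plus size). The names `TRAPS_RE` and `parse_traps_line` are hypothetical, and the regular expression is only written against the single line quoted above, so other kernels may format it differently.

```python
import re

# Field layout taken from the single "traps:" line quoted above.
TRAPS_RE = re.compile(
    r"traps: (?P<task>\S+)\[(?P<pid>\d+)\] (?P<trap>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+) "
    r"in (?P<image>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_traps_line(line):
    """Return a dict describing the fault, or None if the line does not match."""
    m = TRAPS_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    return {
        "task": m["task"],
        "pid": int(m["pid"]),
        "trap": m["trap"],
        "image": m["image"],
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - base,
        # True when the instruction pointer lies inside the mapped image.
        "inside_image": base <= ip < base + size,
    }


log_line = ("Jan 30 18:11:33 ubuntu kernel: [2365957.622441] traps: "
            "NetSocket::read[16930] general protection ip:ddac24 "
            "sp:7f83d87f7d88 error:0 in starbound_server[400000+e0f000]")
info = parse_traps_line(log_line)
print(info["task"], hex(info["offset"]), info["inside_image"])
# NetSocket::read 0x9dac24 True
```

For this line the instruction pointer sits inside the server binary itself, so the message records a fault in the starbound_server process rather than one in kernel code.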
     
  6. Well bad news.
    After one OS reinstall and 8 hours of the host testing the machine's hardware, the reboots still happen and nothing is wrong with the OS or the machine.

    This leaves me with the option I didn't want: Starbound causing the kernel panic :facepalm:
    I will take down one of the servers as a test and watch how the machine behaves.
     
  7. renojonathanr

    renojonathanr Scruffy Nerf-Herder

    My advice -- for now, avoid running LAN servers if you use Linux on the desktop.
     
  8. It's the actual Linux server software, with everything done via terminal. But I have made changes to the machine to see whether running two servers at the same time can cause this to happen.
     
  9. Dunto

    Dunto Guest

    If you have bad RAM then you'll get all sorts of weirdness, including crashes. Run a memtest86+ (it shows up as an option in GRUB after you install it so you'll either need IPMI or KVMoIP access if it's a rented box so you can see the bootup process). If it passes, then your RAM is good and we can start looking at other things. (Good news is, if you have a bad RAM segment you can set your boot options to patch over it and the Linux kernel will simply skip over it during allocations and such, so you can still use the box until they replace the faulty hardware. Be sure to note down any failed sections if they pop up during the memtest.)

    Edit: Derp, didn't see that you ran tests for 8 hours. What tests did you run exactly? Also, do you still get reboots when running everything BUT Starbound?
    (I'm running a Linux server with no issues other than the occasional Starbound server process crashing, but that doesn't bring down the whole system of course.)
     
  10. @Dunto memtests have been done, the hard drive was tested, and the power supply itself was tested since it could cause reboots; all of those tests passed. The host has disabled CPU C-States, which could cause unexpected behavior with an SSD, and so far there has been no reboot, though it's still early to tell. But I have this idea that 2 Starbound servers on the machine (even with it being powerful enough) could cause problems, so I switched the smaller one to linux32.
     
  11. Dunto

    Dunto Guest

    I assume you're using 2 different sbboot.config files pointing to different directories to prevent potential file locking problems and other issues?
     
  12. Yes, 2 different directories, with starbound_server renamed to sb_server. It has always worked.
     
  13. Dunto

    Dunto Guest

    That's not quite what I meant. Are you using two separate complete sets of mods, universe, etc folders? Can you provide more details of your current setup (directory layout, config settings - both sbboot.config and starbound.config, etc)? What you said sounds like you're having both binaries access the same stuff, which may be an issue.
     
  14. Nordan

    Nordan Big Damn Hero

    If there were two instances trying to access the same files, one would error out when it couldn't get a lock on the universe directory. It's not likely it would go as far as causing a kernel-level error and making the system reboot. I'm fairly certain this is a hardware issue based on the symptoms. Memory tests aren't 100% conclusive, as testing scenarios don't simulate memory utilization in the same manner as real-world applications. Have you checked your hardware temperatures? It's not uncommon for servers to perform a thermal shutdown when temperatures get out of control.
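
One possible way to check temperatures from the shell, sketched in Python: it reads the kernel's thermal zones under `/sys/class/thermal`, where each `temp` file holds millidegrees Celsius. This assumes the machine exposes thermal zones at all; many servers report temperatures only through other sensors or the baseboard controller, in which case the result is simply empty. The function name `read_thermal_zones` is made up for this example.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {"zone (type)": degrees_celsius} for every readable thermal zone."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            label = (zone / "type").read_text().strip()
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Zone missing, unreadable, or not reporting a number: skip it.
            continue
        readings[f"{zone.name} ({label})"] = millidegrees / 1000.0
    return readings


if __name__ == "__main__":
    for name, celsius in read_thermal_zones().items():
        print(f"{name}: {celsius:.1f} C")
```

Running it from cron and appending the output to a file would leave a temperature history to compare against the reboot times.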
     
  15. Heya, after all the tests the host disabled CPU C-States, which they said could conflict with SSDs. Since then, the machine hasn't rebooted in the past 2 days. Both servers are separate installations, and I run one from linux64 and the other from linux32.

    Now I'm focusing on Starbound's own stability. The timeout issues, where the process keeps running but is non-responsive, can cause long downtimes; I wish someone had built a script around it.
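
A minimal sketch of such a watchdog in Python, under several assumptions: the server listens on TCP port 21025 (believed to be Starbound's default; adjust it to match your config), a failed TCP connect is treated as "non-responsive", and the names `is_responsive` and `watchdog` are invented here. A plain TCP connect is only a rough liveness signal, since the kernel can complete the handshake even when the process is stuck, so treat this as a starting point rather than a finished tool.

```python
import socket
import subprocess
import time


def is_responsive(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watchdog(start_cmd, cwd, host="127.0.0.1", port=21025,
             interval=60, max_failures=3):
    """Start the server and restart it when it exits or stops answering."""
    proc = subprocess.Popen(start_cmd, cwd=cwd)
    failures = 0
    while True:
        time.sleep(interval)
        exited = proc.poll() is not None
        if exited or not is_responsive(host, port):
            failures += 1
        else:
            failures = 0
        if exited or failures >= max_failures:
            if not exited:
                # Ask politely first, then force the process down.
                proc.terminate()
                try:
                    proc.wait(timeout=30)
                except subprocess.TimeoutExpired:
                    proc.kill()
                    proc.wait()
            proc = subprocess.Popen(start_cmd, cwd=cwd)
            failures = 0
```

It could be started with something like `watchdog(["./starbound_server"], cwd="/home/starbound/starbound/sb/linux64")`, using the path from the apport log earlier in the thread. With two servers on one machine, each would need its own watchdog with its own directory and port.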
     
