
Server Help [CRASH BUG] Unreleased Socket Files / Too Many Open Files

Discussion in 'Multiplayer' started by Seriallos, Dec 17, 2013.

  1. Thanks for the insight, man. Appreciated.
     
  2. Edit: Since I changed my local scripts to connect over UDP instead of TCP, I no longer get any logging, so let's see if it's more stable that way. I suspect it crashed on a 127.0.0.1 TCP connection.


    What I don't get is that with the newly patched lib it is a lot more stable. On a test server I threw 10,000 rapid new connections at it and it did not crash.

    And today my public server randomly crashed with the log below.

    Do you get these crashes too?

    Code:
    Info: accept from 37.187.79.210:37224 (18)
    Info: Connection received from: 37.187.79.210:37224
    Info: UniverseServer: client connection made from 37.187.79.210:37224
    Info: closing 37.187.79.210:37224 (18)
    Warn: UniverseServer: client connection aborted
    Info: Reaping client <673> (37.187.79.210:37224) connection
    Info: accept from 127.0.0.1:48871 (7)
    Info: Connection received from: 127.0.0.1:48871
    Info: accept from 127.0.0.1:48872 (15)
    Info: closing 127.0.0.1:48871 (7)
    Info: Connection received from: 127.0.0.1:48872
    Info: UniverseServer: client connection made from 127.0.0.1:48871
    Warn: UniverseServer: client connection aborted
    Info: Reaping client <674> (127.0.0.1:48871) connection
    Info: closing 127.0.0.1:48872 (15)
    Info: UniverseServer: client connection made from 127.0.0.1:48872
    Warn: UniverseServer: client connection aborted
    Info: Reaping client <675> (127.0.0.1:48872) connection
    Info: closing 127.0.0.1:48872 (15)
    Error: WorldServerThread exception caught: IOException: Seek error: Bad file descriptor
    ./starbound_server(_ZN4Star13StarExceptionC2ERKNS_6StringE+0x105) [0xaad725]
    ./starbound_server() [0xa1d311]
    ./starbound_server(_ZN4Star4File5fseekEP8_IO_FILElNS_8IODevice8SeekModeE+0x87) [0xab0227]
    ./starbound_server(_ZN4Star9BlockFile9readBlockEmmPcm+0x5b) [0xa0fe2b]
    ./starbound_server(_ZN4Star13BTreeDatabaseINS_9ByteArrayES1_E16startReadingLeafEm+0x48) [0x6364f8]
    ./starbound_server(_ZN4Star13BTreeDatabaseINS_9ByteArrayES1_E8loadLeafERKm+0x95) [0x63a245]
    ./starbound_server() [0xa7b8e9]
    ./starbound_server() [0xa7bafb]
    ./starbound_server() [0xa7ca15]
    ./starbound_server(_ZN4Star14SimpleDatabase6insertERKNS_9ByteArrayES3_+0x5d) [0xa7526d]
    ./starbound_server(_ZN4Star12WorldStorage12unloadSectorERKNS_6VectorImLm2EEEb+0x3d2) [0x5c7ba2]
    ./starbound_server(_ZN4Star12WorldStorage6updateEv+0x20e) [0x5c93be]
    ./starbound_server(_ZN4Star11WorldServer6updateEv+0x1452) [0x59a822]
    ./starbound_server() [0x5c4e15]
    ./starbound_server() [0x5c5620]
    ./starbound_server() [0xab3b61]
    /lib64/libpthread.so.0(+0x79d1) [0x7f86d59549d1]
    /lib64/libc.so.6(clone+0x6d) [0x7f86d4cfab6d]
    
    Error: World thread has died, removing world alpha:-8373592:-14046803:-12839514:11:9
     
    Last edited: Dec 25, 2013
  3. furrycat

    furrycat Aquatic Astronaut

    If you can reproduce the problem, hacking seek or running strace may provide a clue as to what the server is trying to do.
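
    A minimal sketch of the strace approach, assuming a Linux host with strace installed. The file name trace.txt and the helper find_ebadf are illustrative only; EBADF is the errno that strace prints next to "Bad file descriptor".

```shell
# Print every traced call that failed with EBADF ("Bad file descriptor")
# from one or more saved strace output files, prefixed with the file name.
find_ebadf() {
    grep -H 'EBADF' "$@"
}

# Capturing a trace needs a running server; <pid> is a placeholder:
#   strace -f -o trace.txt -e trace=lseek,read,write,close -p <pid>
# Afterwards:
#   find_ebadf trace.txt
```

    Seeing which file descriptor the failing lseek used, and where that descriptor was last closed, may narrow down which part of the server dropped it.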
     
  4. Good idea. So far I'm running strace -f on an empty server, with a script set to connect in an endless loop, and I haven't caught a single seek call. I only see the common socket functions below.

    It's only after actually connecting to the server that I see lseek operations, so I will probably have to play on it to catch the issue.

    Code:
    mmap(NULL, 10489856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7febbb078000
    mprotect(0x7febbb078000, 4096, PROT_NONE) = 0
    clone(child_stack=0x7febbba77fd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7febbba789d0, tls=0x7febbba78700, child_tidptr=0x7febbba789d0) = 9047
    open("/proc/self/task/0/comm", O_RDWR)  = -1 ENOENT (No such file or directory)
    futex(0x7acb514, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7acb510, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
    futex(0x77cee60, FUTEX_WAKE_PRIVATE, 1) = 0
    accept(8, {sa_family=AF_INET, sin_port=htons(19752), sin_addr=inet_addr("192.168.1.101")}, [16]) = 7
    futex(0x2ca0d20, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
    write(1, "Info: accept from 192.168.1.101:"..., 42) = 42
    write(3, "Info: accept from 192.168.1.101:"..., 42) = 42
    futex(0x2ca0d20, FUTEX_WAKE_PRIVATE, 1) = 1
    setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
    setsockopt(7, SOL_SOCKET, SO_RCVTIMEO, "<\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
    setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, "<\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
    futex(0x2ca0d20, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
    write(1, "Info: Connection received from: "..., 52) = 52
    write(3, "Info: Connection received from: "..., 52) = 52
    futex(0x2ca0d20, FUTEX_WAKE_PRIVATE, 1) = 1
    futex(0x77cee60, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
    clone(child_stack=0x7febb981ffd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7febb98209d0, tls=0x7febb9820700, child_tidptr=0x7febb98209d0) = 9048
    open("/proc/self/task/9048/comm", O_RDWR) = 10
    write(10, "NetSocket::read", 15)        = 15
    close(10)                               = 0
    clone(child_stack=0x7febbba77fd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7febbba789d0, tls=0x7febbba78700, child_tidptr=0x7febbba789d0) = 9049
    open("/proc/self/task/0/comm", O_RDWR)  = -1 ENOENT (No such file or directory)
    futex(0x7acb514, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7acb510, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
    futex(0x77cee60, FUTEX_WAKE_PRIVATE, 1) = 0
     
  5. furrycat

    furrycat Aquatic Astronaut

    You're right of course that the library function is lseek not seek. My mistake for posting in a hurry. I had some time to look again at the issue today.

    I ran your crash dump through c++filt to demangle the class names. It looks like this:
    Code:
    ./starbound_server(Star::StarException::StarException(Star::String const&)+0x105) [0xaad725]
    ./starbound_server() [0xa1d311]
    ./starbound_server(Star::File::fseek(_IO_FILE*, long, Star::IODevice::SeekMode)+0x87) [0xab0227]
    ./starbound_server(Star::BlockFile::readBlock(unsigned long, unsigned long, char*, unsigned long)+0x5b) [0xa0fe2b]
    ./starbound_server(Star::BTreeDatabase<Star::ByteArray, Star::ByteArray>::startReadingLeaf(unsigned long)+0x48) [0x6364f8]
    ./starbound_server(Star::BTreeDatabase<Star::ByteArray, Star::ByteArray>::loadLeaf(unsigned long const&)+0x95) [0x63a245]
    ./starbound_server() [0xa7b8e9]
    ./starbound_server() [0xa7bafb]
    ./starbound_server() [0xa7ca15]
    ./starbound_server(Star::SimpleDatabase::insert(Star::ByteArray const&, Star::ByteArray const&)+0x5d) [0xa7526d]
    ./starbound_server(Star::WorldStorage::unloadSector(Star::Vector<unsigned long, 2ul> const&, bool)+0x3d2) [0x5c7ba2]
    ./starbound_server(Star::WorldStorage::update()+0x20e) [0x5c93be]
    ./starbound_server(Star::WorldServer::update()+0x1452) [0x59a822]
    That would suggest that it was trying to read a world file and isn't related to the network or this hack at all.
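
    For anyone who wants to repeat the demangling step, here is a small sketch. It assumes c++filt from GNU binutils is installed; the helper name demangle and the file name crash.log are illustrative.

```shell
# Demangle a single C++ symbol by piping it through c++filt.
demangle() {
    printf '%s\n' "$1" | c++filt
}

# One of the symbols from the crash dump in this thread:
demangle '_ZN4Star12WorldStorage6updateEv'
# prints: Star::WorldStorage::update()

# A whole saved log can be filtered the same way:
#   c++filt < crash.log
```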
     
  6. wolvern

    wolvern Orbital Explorer

    Code:
    admin@starbound1:/usr/local/starbound/linux64$ sudo gcc -fPIC -shared -ldl -o starbound.so starbound.c
    starbound.c: In function ‘setsockopt’:
    starbound.c:34:5: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 5 has type ‘const void *’ [-Wformat=]
         printf("Warn: Ignoring error: setsockopt(%d, %d, %d, %x, %d): %s\n", sockfd, level, optname, optval, optlen, strerror(errno));
         ^
    
    I get this when compiling the fix furrycat posted. I'm currently using the first one, which didn't produce any errors.
     
  7. That's just a compiler warning; you can ignore it.
     
  8. geokhentix

    geokhentix Scruffy Nerf-Herder

    This is an excellent thread. I've applied the fix and just have one question. Instead of creating an alternate script, could one just modify the original launch_starbound_server.sh and add LD_PRELOAD=$PWD/starbound.so? I did this, and the server starts fine, but echo $LD_PRELOAD doesn't return anything, so I'm curious if it's even working.
     
  9. Sure you can, but it's always cleaner to use your own script: you can still start the server the normal way to compare it against the patched one, and Steam won't attempt to revert a change made in a custom file.
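
    As for echo $LD_PRELOAD printing nothing: a variable set inside a script, or as a prefix to a command, is only visible to that process and its children, not to the shell you type into afterwards. A small sketch of that behaviour, using a stand-in variable DEMO_VAR and a made-up path so that nothing is actually preloaded:

```shell
# A VAR=value prefix puts the variable in the environment of that one
# command only; the invoking shell's own environment is unchanged.
unset DEMO_VAR
child_view=$(DEMO_VAR=/tmp/example.so sh -c 'printf "%s" "$DEMO_VAR"')
parent_view=${DEMO_VAR:-}

echo "child sees:  '$child_view'"
echo "parent sees: '$parent_view'"

# On Linux, a running server can be inspected directly instead
# (<pid> is a placeholder for the server's process ID):
#   tr '\0' '\n' < /proc/<pid>/environ | grep LD_PRELOAD
#   grep starbound.so /proc/<pid>/maps
```

    If starbound.so shows up in the process's maps, the library was loaded, regardless of what the interactive shell reports.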

    If you still want to check that it works, it's quite simple: start the server with strace -ff -o strace.txt at the beginning of your command line. With the vanilla server a call to socket() is made after each connection; with the patched version this call is made only once for all connections.
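
    A rough way to do that comparison, assuming the trace was written with strace -ff -o strace.txt (which produces one strace.txt.<pid> file per thread). The helper name count_socket_calls is illustrative.

```shell
# Count socket() calls across the per-thread files that
# `strace -ff -o strace.txt ...` writes (strace.txt.<pid>).
count_socket_calls() {
    # With no files given, report zero rather than waiting on stdin.
    [ "$#" -gt 0 ] || { echo 0; return 0; }
    cat -- "$@" | grep -c '^socket('
}

# Usage, after a few test connections:
#   count_socket_calls strace.txt.*
```

    Run it once against a vanilla trace and once against a patched trace with the same number of test connections, and compare the two counts.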
     
  10. bartwe

    bartwe Code Cowboy

    Hi, thanks for explaining this bug :)

    I fixed it in starbound, should be live with the next patch.

    Thanks to Crashdoom for pointing me to this thread.
     
  11. Seriallos

    Seriallos Space Penguin Leader

    :up: This will be great for server monitoring without needing to be a proxy!
     
  12. @bartwe

    By the way, if you could fix this too, thank you. It doesn't cause a crash, but it produces a huge amount of log entries.

    Code:
    Error: Exception while invoking lua method 'main'. LuaException: [string "/scripts/vec2.lua"]:32: attempt to index local 'vector' (a nil value)
    ./starbound_server(_ZN4Star13StarExceptionC2ERKNS_6StringE+0x105) [0xaad725]
    ./starbound_server() [0xa40e6c]
    ./starbound_server() [0xa43318]
    ./starbound_server() [0xa434ce]
    ./starbound_server() [0xa43550]
    ./starbound_server(_ZN4Star3Npc10tickMasterEv+0xa61) [0x7ad481]
    ./starbound_server(_ZN4Star11WorldServer6updateEv+0x395) [0x599765]
    ./starbound_server() [0x5c4e15]
    ./starbound_server() [0x5c5620]
    ./starbound_server() [0xab3b61]
    /lib64/libpthread.so.0(+0x79d1) [0x7fd789f579d1]
    /lib64/libc.so.6(clone+0x6d) [0x7fd7892fdb6d]
    
     
