Author Topic: A method for tile based path caching in Screeps  (Read 21639 times)

Floyd-ATC

A method for tile based path caching in Screeps
« on: 12 September 2016, 19:51 »

  • Because path finding is one of the most CPU intensive ("expensive") operations in Screeps, the need to re-use paths is a well known topic. Generally speaking, the following factors must be considered when choosing a cache strategy:

    - CPU cost of learning new paths (insert)
    - CPU cost of changing existing paths (update)
    - CPU cost of looking up paths (select)
    - CPU cost of pruning outdated entries (delete)
    - Storage space

    For any caching scheme to be worthwhile, we must assume that each path in the cache should ideally be inserted once and selected as many times as possible. Because changes are inevitable and there is no absolute way to determine how often an individual path will be selected, the scheme must allow update and delete operations to be carried out with reasonable performance.

    Most scripting languages offer built-in data types that fulfill all of these requirements but in Screeps there is another factor which severely limits the options: Storage performance.

    Because all information in Screeps Memory must be serialized before storage and deserialized before use, the total number of objects in Memory soon becomes an important factor in addition to the already tight restrictions on storage space. Add to this the fact that data structures like arrays and hash tables introduce storage overhead when serialized, and it soon becomes obvious that storing fewer and bigger strings is favorable over smaller and more numerous variables.


    What's in a path

    An important realization is that an individual path from Pos1 to Pos2 contains much more information than just how to get from Pos1 to Pos2. Given a cluster of 10 positions around Pos1 in one room and a second cluster of 10 positions around Pos2 in another room, it is fair to assume that the majority of the steps in each of the 10*10=100 possible paths between the two clusters will actually be the same. This is especially true if there is a road between the two clusters. Storing multiple instances of the same information is not only a waste of storage space (and therefore a waste of CPU resources to serialize/deserialize) but also means more information has to be processed when performing insert/update/delete and select operations.


    Possible solutions

    This leads to the obvious solution that information should be stored per tile so that each tile contains information about how to reach other tiles from there. At this point one might think in terms of a simple nested hash:

       cache[from_tile][to_tile] = direction

    There are a number of problems with this approach. First, let's say we have 10 rooms, and each room may have up to 2500 tiles (of which at least 1% are in use). A very conservative estimate of 250 first-level entries with 250 second-level entries each means we have 62500 objects in memory that need to be serialized and deserialized each tick. It soon becomes obvious that with all the information needed to uniquely identify each position, this puts us well above the 2048K limit even before we put in any directions. One might therefore consider normalizing the information by separating it per room:
     
       cache[from_room][from_tile][to_room][to_tile] = direction

    This eliminates the need to spell out each room name 250+250 times but we still have over 62500 objects in memory and use a non-trivial amount of storage space just to store structural characters such as "{},:".
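
    To get a feel for the overhead, here is a minimal sketch that builds a dummy nested cache of 250 x 250 entries and measures its JSON size. The short numeric keys and the dummy directions are assumptions made purely for illustration; real keys that identify rooms and positions would be longer.

```javascript
// Hypothetical illustration: a nested cache with 250 "from" tiles and 250 "to" tiles.
// Keys are plain indices 0..249, not real tile identifiers.
function buildNestedCache(tileCount) {
  const cache = {};
  for (let from = 0; from < tileCount; from++) {
    const row = {};
    for (let to = 0; to < tileCount; to++) {
      row[to] = 1 + ((from + to) % 8); // dummy direction in the range 1..8
    }
    cache[from] = row;
  }
  return cache;
}

const nested = buildNestedCache(250);
const entries = 250 * 250;                 // number of leaf values to (de)serialize
const json = JSON.stringify(nested);       // JSON text of the whole structure
const charsPerEntry = json.length / entries;

console.log(entries, json.length, charsPerEntry.toFixed(1));
```

    Even with keys this short, every entry pays for quotes, a colon and a comma in addition to the single digit of actual information.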


    Custom serialization

    In response to this, a common idea is to consider other ways to serialize and deserialize Memory. I do not suggest doing this, for several reasons:

    - Performance is unlikely to be that much better unless you place severe limitations on what can be stored in memory.
    - The built-in Memory access scheme relies on the possibility to access Memory in a certain way. We must either accommodate this or reimplement everything, hopefully without hurting performance.
    - It is much more beneficial to address the fundamental problem and reduce the number of tiny objects we need to store. In a way, this means applying our own idea of serialization but targeting only a specific part of the data structure.


    Packing information as densely as possible

    One critical bit of information about the Memory object in Screeps is that it can handle UTF-16 characters. At first glance, this may seem irrelevant because room names and tile coordinates do not contain special characters, but coupled with Javascript's built-in methods for storing and retrieving UTF-16 character codes this gives us a rather convenient way to store and retrieve 16-bit values (albeit with a few important restrictions, google UTF BOM for details.)
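
    As a minimal sketch of the idea, the following stores a few 16-bit values as one string and sends it through JSON.stringify/JSON.parse, which is assumed here to stand in for what happens to Memory between ticks. The helper names packValues and unpackValues are made up for this example.

```javascript
// Store each 16-bit value as one UTF-16 code unit of a string.
function packValues(values) {
  let s = '';
  for (const v of values) {
    if (v < 0 || v > 0xFFFF) throw new RangeError('value does not fit in 16 bits: ' + v);
    s += String.fromCharCode(v);
  }
  return s;
}

// Read the values back, one code unit at a time.
function unpackValues(s) {
  const values = [];
  for (let i = 0; i < s.length; i++) values.push(s.charCodeAt(i));
  return values;
}

const values = [0x0005, 0x0041, 0x1234, 0x8FFF];
const stored = JSON.stringify({ r: packValues(values) }); // serialized form
const restored = unpackValues(JSON.parse(stored).r);      // back to numbers

console.log(restored.join(',') === values.join(','));
```

    Note that JSON escapes control characters below 0x20 as \uXXXX sequences, so very low values still round-trip but take more room in the serialized text.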

    Why is this useful? Consider the fact that a single tile coordinate can be expressed using 6+6 bits (remember, 6 bits can store a single number from 0 to 63) and a direction can be expressed using 3 bits. We have to keep in mind that certain special character codes in UTF-16 are invalid, but it turns out that we can actually use the 16th bit as well, as long as we keep the 4 highest bits in the range between 0 and 8 inclusive. That leaves room for the eight directions 1 to 8 plus a spare value of 0, and the largest possible code, 0x8FFF, stays well below the reserved surrogate range 0xD800-0xDFFF.

    This puts us at 16 bits (2 bytes) per tile even with the direction, but we can do even better than that. If a tile happens to contain directions to multiple consecutive destinations we can choose to store them as a "span", similar to how Run Length Encoding works. Distinguishing between a span and an individual destination can be done using a "magic" direction of 0:

       code1 = x1 + (y1*50) | (1<<12) // 0x1000   Single destination with direction=1

       code1 = x1 + (y1*50) | 0       // 0x0000   Span begins at x1,y1
       code2 = x2 + (y2*50) | (3<<12) // 0x3000   Span ends at x2,y2 with direction=3

    Strings of these codes can be stored for each tile, effectively reducing the number of objects to serialize/deserialize by at least a factor of 250. For rooms with a large number of walkable tiles, this factor can be even higher.
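
    The codes above can be turned into characters and back with a pair of small helpers. This is only a sketch; the names encode and decode are assumptions, and the span is simply written as two consecutive characters as described above.

```javascript
const ROOM_SIZE = 50;

// Build one character: 12 bits of tile address (x + 50y) plus the direction in the top 4 bits.
function encode(x, y, direction) {
  return String.fromCharCode((x + y * ROOM_SIZE) | (direction << 12));
}

// Split a character back into coordinates and direction.
function decode(ch) {
  const code = ch.charCodeAt(0);
  const address = code & 0x0FFF;
  return {
    x: address % ROOM_SIZE,
    y: Math.floor(address / ROOM_SIZE),
    direction: code >> 12, // 0 means "a span begins here"
  };
}

// Single destination: tile 10,20 is reached by moving in direction 1.
const single = encode(10, 20, 1);

// Span: every tile from 5,7 to 9,7 is reached by moving in direction 3.
const span = encode(5, 7, 0) + encode(9, 7, 3);

console.log(decode(single), decode(span[0]), decode(span[1]));
```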

    An alternative to the x+50y approach is to use bit shifting exclusively:

       code1 = x1 | (y1<<6) | (1<<12)

    I have not bothered with benchmarking these to see which one is faster. My guess is that the difference in terms of CPU time is negligible.
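
    For comparison, here is a sketch of the bit-shifting variant with a matching decoder, checked against every tile and every direction value. The function names are made up for this example, and no timing is attempted.

```javascript
const ROOM_SIZE = 50;

// x + 50y variant, as used earlier.
const encodeMul = (x, y, d) => (x + y * ROOM_SIZE) | (d << 12);

// Pure bit-shifting variant: 6 bits for x, 6 bits for y, 4 bits for the direction.
const encodeShift = (x, y, d) => x | (y << 6) | (d << 12);
const decodeShift = (code) => ({
  x: code & 0x3F,
  y: (code >> 6) & 0x3F,
  direction: code >> 12,
});

// Round-trip every tile with every direction value 0..8 and track the largest code.
let mismatches = 0;
let maxCode = 0;
for (let y = 0; y < ROOM_SIZE; y++) {
  for (let x = 0; x < ROOM_SIZE; x++) {
    for (let d = 0; d <= 8; d++) {
      const code = encodeShift(x, y, d);
      const back = decodeShift(code);
      if (back.x !== x || back.y !== y || back.direction !== d) mismatches++;
      if (code > maxCode) maxCode = code;
    }
  }
}

// Address arithmetic differs at the end of a row.
const mulWraps = encodeMul(49, 7, 0) + 1 === encodeMul(0, 8, 0);
const shiftWraps = encodeShift(49, 7, 0) + 1 === encodeShift(0, 8, 0);

console.log(mismatches, maxCode.toString(16), mulWraps, shiftWraps);
```

    One difference worth keeping in mind: with x+50y the last tile of one row and the first tile of the next have consecutive addresses, while the shifted form leaves a gap between rows. Any span logic that relies on "address + 1" therefore behaves slightly differently between the two.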


    Implementing access

    Even if the problem of storage space and serialization/deserialization performance has been solved, there is still the non-trivial matter of implementing efficient methods to insert/update/select and delete information. These are our findings:

    - The codes should always be sorted by tile address (x+50y). This eliminates the need to search the entire string to the end in both the case of a cache miss and a new insert, because the search can be aborted as soon as a higher address is encountered. The search is always carried out on the packed data structure to search the shortest string possible.

    - The packed format (with spans) should be unpacked (without spans) only when an insert/update is needed, so that individual addresses can be inserted, updated or appended as needed. The unpacked data structure must be sparse, otherwise the compression operation will require a constant loop of 2500 iterations even for a string with no actual information in it.

    - The unpacked structure should be packed again (with spans) only when all modifications have been made. This is the most expensive algorithm of the three in terms of CPU time because the entire unpacked structure must be processed. It is therefore beneficial to put as much information into the string at once rather than making small, incremental changes.
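
    The three operations above could be sketched roughly as follows. This is not my actual implementation; the function names are made up, a Map is assumed as the sparse unpacked structure, and addresses use the x+50y form.

```javascript
// Select: search the packed string directly; returns the direction or 0 on a cache miss.
function lookup(packed, address) {
  for (let i = 0; i < packed.length; i++) {
    const code = packed.charCodeAt(i);
    const addr = code & 0x0FFF;
    const dir = code >> 12;
    if (addr > address) return 0;                   // sorted, so we can abort early
    if (dir === 0) {                                // span begin; next code is the span end
      const end = packed.charCodeAt(++i);
      if (address <= (end & 0x0FFF)) return end >> 12;
    } else if (addr === address) {
      return dir;
    }
  }
  return 0;
}

// Unpack: expand spans into a sparse Map of address -> direction.
function unpack(packed) {
  const table = new Map();
  for (let i = 0; i < packed.length; i++) {
    const code = packed.charCodeAt(i);
    const first = code & 0x0FFF;
    let last = first;
    let dir = code >> 12;
    if (dir === 0) {
      const end = packed.charCodeAt(++i);
      last = end & 0x0FFF;
      dir = end >> 12;
    }
    for (let a = first; a <= last; a++) table.set(a, dir);
  }
  return table;
}

// Pack: sort the addresses and merge consecutive ones with equal direction into spans.
function pack(table) {
  const addresses = [...table.keys()].sort((a, b) => a - b);
  let packed = '';
  let i = 0;
  while (i < addresses.length) {
    const dir = table.get(addresses[i]);
    let j = i;
    while (j + 1 < addresses.length &&
           addresses[j + 1] === addresses[j] + 1 &&
           table.get(addresses[j + 1]) === dir) j++;
    if (j > i) {
      packed += String.fromCharCode(addresses[i]) +          // span begin, direction 0
                String.fromCharCode(addresses[j] | (dir << 12));
    } else {
      packed += String.fromCharCode(addresses[i] | (dir << 12));
    }
    i = j + 1;
  }
  return packed;
}

// Usage: four adjacent destinations share direction 3, one separate tile uses 7.
const table = unpack('');
for (const a of [100, 101, 102, 103]) table.set(a, 3);
table.set(250, 7);
const packed = pack(table);

console.log(packed.length, lookup(packed, 102), lookup(packed, 250), lookup(packed, 104));
```

    Only the addresses that are actually present are visited when packing, which is the point of keeping the unpacked structure sparse.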


    Pseudocode for learning

    Code:
    for each tile T1 in path including start(1)
      stop if T1 is in a different room from start(2)
      get next direction as D
      fetch routing table for tile T1
      for each tile T2 in path including target(3)
        update table T1; move in direction D to reach T2
      store routing table for tile T1

    1) The path will typically start with the first move, not the first tile. We need to put information into the first tile because that's where creeps will (hopefully) start later too.
    2) The pathfinder may provide suboptimal results for rooms where we do not have vision. Instead of trying to deal with this in each case, simply learn the path leading to the next room and recalculate the path once we get there and see the roads etc. This also serves as a safeguard against the script spending ages on learning just one very long path.
    3) The creep will probably want a path leading to the target later on, not the tiles in between, so don't forget this crucial little detail. The information about the tiles in between is just a nice bonus. You may want to try reducing the cache size and learning time by only learning the ultimate destination but in my experience this reduces the overall efficiency by causing more path searches. (See "What's in a path".)


    Actual results

    My current implementation runs at GCL4 with path information for 11 rooms and 40+ creeps. My total Memory is around 200 Kbytes, but I can pretty much decide how much memory to use by adjusting the cache age. My current setting is 1800 ticks, which seems to work pretty well. According to screeps-profiler, the average time to move creeps is 0.370ms; this includes pathfinding, learning, routing and the occasional fallback to moveTo/ignoreCreeps:false for collision avoidance. I use the new Pathfinder and typically see between 0 and 3 path searches per tick. Take into account that I am relatively inexperienced with both Javascript and Screeps, so your results may differ. Anyway, this is what my current data structure looks like:

    Code:
    Memory
      rooms
        r                 Container object for routing information
          E##N##          One hash per room
            xxyy          Tile coordinates formatted as 4 digits, one hash per tile
              mru         Game.time of last access
              local       String of UTF-16 characters for this room
              E##N##      String of UTF-16 characters for destination room
              E##N##      String of UTF-16 characters for destination room
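
    To show how such a structure might be used, here is a sketch of an accessor that updates the mru timestamp and a pruning pass driven by the cache age. The function names are made up, the memory argument stands in for the global Memory object, now stands in for Game.time, and the strings in the example are placeholders rather than real routing data.

```javascript
const pad2 = (n) => (n < 10 ? '0' : '') + n;
const tileKey = (x, y) => pad2(x) + pad2(y); // "xxyy", e.g. 7,10 -> "0710"

// Fetch the packed string for one tile and destination room, marking the tile as used.
function routingString(memory, now, fromRoom, x, y, toRoom) {
  const rooms = memory.rooms && memory.rooms.r;
  const tile = rooms && rooms[fromRoom] && rooms[fromRoom][tileKey(x, y)];
  if (!tile) return '';
  tile.mru = now;
  return tile[toRoom === fromRoom ? 'local' : toRoom] || '';
}

// Delete every tile that has not been used for more than maxAge ticks.
function pruneRouting(memory, now, maxAge) {
  const rooms = (memory.rooms && memory.rooms.r) || {};
  let removed = 0;
  for (const roomName of Object.keys(rooms)) {
    const tiles = rooms[roomName];
    for (const key of Object.keys(tiles)) {
      if (now - tiles[key].mru > maxAge) {
        delete tiles[key];
        removed++;
      }
    }
    if (Object.keys(tiles).length === 0) delete rooms[roomName];
  }
  return removed;
}

// Usage with placeholder strings instead of real packed codes.
const memory = {
  rooms: { r: { E12N34: {
    '0710': { mru: 100, local: 'abc', E13N34: 'xyz' },
    '4925': { mru: 5000, local: 'q' },
  } } },
};
const hit = routingString(memory, 6000, 'E12N34', 7, 10, 'E13N34'); // touches tile 0710
const removed = pruneRouting(memory, 7000, 1800);                    // tile 4925 is too old
console.log(hit, removed, Object.keys(memory.rooms.r.E12N34));
```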


    Possible improvements

    - This is just one possible storage structure, but I strongly recommend storing one string per combination of rooms, together with cache control data such as an MRU timestamp. This makes it trivial to discard data for any single room or tile when needed.

    - Cache invalidation such as handling blocked paths resulting from new structures or using new roads is a huge topic and the full discussion is well beyond the scope of this document. One simple strategy is to introduce a small random chance to recalculate paths even if cached information exists. Another strategy is to drop random tiles from the cache to trigger recalculation.

    - The suggested concept of spans forms horizontal "stripes" of adjacent tiles. One possible improvement might be to form rectangular areas, either by using a more complicated algorithm to begin with or by splitting and combining adjacent "stripes" in a second pass. It is unclear if this added complexity would provide a net saving in terms of CPU performance and/or storage space.

    - It might be worthwhile to store other information (such as path cost) together with the routing information in the form of trailing characters. Although the non-trivial increase in complexity and memory use would require serious consideration, this could be helpful if one wanted to write a modified pathfinder capable of taking cached information into account. 
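
    The two simple invalidation strategies mentioned above could be sketched like this. The function names and the example probabilities are assumptions; the random source is passed in as a parameter only so that the behaviour can be checked deterministically.

```javascript
// Strategy 1: with a small probability, ignore the cache and search for a fresh path.
function shouldRecalculate(chance, random = Math.random) {
  return random() < chance;
}

// Strategy 2: drop a random fraction of the cached tiles of one room.
// "tiles" is the per-room hash of tile entries; returns the number of tiles dropped.
function dropRandomTiles(tiles, fraction, random = Math.random) {
  let dropped = 0;
  for (const key of Object.keys(tiles)) {
    if (random() < fraction) {
      delete tiles[key];
      dropped++;
    }
  }
  return dropped;
}

// Usage with a fixed sequence instead of Math.random, to make the outcome predictable.
const sequence = [0.1, 0.9, 0.2];
let next = 0;
const fakeRandom = () => sequence[next++ % sequence.length];

const tiles = { '0710': { mru: 1 }, '0811': { mru: 2 }, '0912': { mru: 3 } };
const dropped = dropRandomTiles(tiles, 0.5, fakeRandom);
console.log(dropped, Object.keys(tiles));
```

    In normal use both functions would be called without the third argument, so they fall back to Math.random.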


    I'm sharing this in the hope that you will find it useful. If you discover a cool way to improve the efficiency or find a stupid fault in my logic, your in-game feedback would be greatly appreciated. Preferably in the form of a message, not an invasion.


    Cloulez aka. FloydATC


    -Floyd.

    --
    There are 10 types of people;
    those who understand binary, those who don't, and those who understand Gray code.