Aug 14, 2020

DnD Map Editor - Compression

Part 3 of infinity.

Introduction

A while back I was making a map in my DnD Map Editor and I ran into an issue - at a certain point, no updates were passing between clients. As previously noted, I use SignalR to pass the map between clients, and I don’t do anything fancy - each update is literally the full map, serialized into a JSON format that I’ve been tweaking over time.

Now, apparently the default max packet size in SignalR is 32k. And I hit it. So the first thing I did was increase that a bunch - honestly I don’t really care if these messages get kinda big - but on the other hand, if you look at the past screenshots of the maps in the editor, you might well ask, “Why are these taking up so much space? They don’t have very much information in them.” And indeed, you raise a valid point, imaginary person.

Let’s take a look at a snippet of the map JSON to get a better idea:

{
   "width":20,
   "height":20,
   "verticalWallsMap":[
      {
         "x":5,
         "y":-1,
         "z":1,
         "wallNumber":1
      },
      {
         "x":5,
         "y":-2,
         "z":1,
         "wallNumber":1
      },
      {
         "x":3,
         "y":-1,
         "z":1,
         "wallNumber":1
      },
      // ...
   ]
}

So, yeah. Each single 1-space line of a wall is represented by a minimum of 34 characters. If I have, say, a 9x9 box, that’s 40 segments, so we’re already over 1kb. (I mean, that’s a pretty boring map. Probably there’s some other stuff inside the box.)

Much the same thing applies to the “painted areas” I use to indicate water, hazardous terrain, etc. - at the moment each single square is represented much like the above. (Although, with no “wallNumber”, unsurprisingly - instead they’re grouped under the colors. Probably I should do that to the walls so I can save that part as well.)

Anyway, this seems like an obvious case where I could alter how I store this data to make it take up less space, i.e., a compression algorithm.

One quick note - I wrote most of this post in the middle of working on this, so if I pause and reverse course on some stuff, that’s just how it went in practice.

Designing a (Very Simple) Compression Algorithm

My first thought is that pretty much everything in these maps is already super-conveniently arranged in lines and rectangles, so probably the easiest thing to do is to express locations for walls/painted areas/fog as ranges rather than individual numbers and thus collapse it.

Probably the most straightforward and trivial part of this is the walls, which really don’t need ranges in more than one dimension - they’re largely going to be either vertical walls with a y-index range or horizontal walls with an x-index range. Even if technically they might occasionally span other dimensions (maybe a bit commonly z for multi-story buildings), most of the advantage is turning long lines into ranges.

Now, on the flip side, this does already cause some challenges (especially if I don’t want to break any of my existing maps, which I don’t). The first of these is that I have an interface for each of my data types that gets saved, and then a class that implements that interface and can be populated from it. [Consider including code.] However, I don’t really have a burning desire to have the in-memory form of wall locations include ranges. I just want the saved version to. So this unfortunately breaks some convenient symmetry and forces me to have some additional conversion steps, but I don’t think that can be helped. Here’s some quick code for uncompressing this:

export function ToRange(text: string|number): ReadonlyArray<number> {
    let results: Array<number> = [];

    // I feel like this use of "toString" is silly, but I'm 
    // currently too lazy to determine the idiomatic solution.
    if (text.toString().indexOf(':') > 0) {
        let parts = text.toString().split(':');
        let start = parseInt(parts[0]);
        let end = parseInt(parts[1]);

        for (let i = start; i <= end; i++) {
            results.push(i);
        }
    } else {
        results = [parseInt(text.toString())];
    }

    return results;
}

export function ToLocations(locationData: LocationData): ReadonlyArray<Location> {
    let xRange = ToRange(locationData.x);
    let yRange = ToRange(locationData.y);
    let zRange = ToRange(locationData.z);

    let results: Array<Location> = [];

    for (let z of zRange) {
        for (let y of yRange) {
            for (let x of xRange) {
                results.push(new Location(x, y, z));
            }
        }
    }

    return results;
}

I feel like there’s probably some iterator mechanism that lets me produce a range of numbers and locations better than this. On the flip side this took me about 2 minutes to write.

…

Hm. I might have lied earlier. At this point I really just need to hook this together with my existing logic by making a method that converts a bunch of locations to a bunch of these LocationData items. It might not be worth trying to do this separately for walls and areas.

As a side note, I always find it really gratifying when I’m performing a refactoring and I get to the part where I make an incompatible change to some type definition, and I know as soon as I make all of the type checks pass, my refactoring is complete. I wonder if that’s a universal joy amongst programmers. (At least, the ones who aren’t mostly working in Python, Javascript, etc.)

…

Okay, I realized I was mistaken earlier and/or right the first time - the save format for walls includes what is effectively the Location type - (x, y, z) coordinates - and a fourth value, the “wallNumber” (which signifies e.g., “a wall” vs. “a door”), so I do in fact need to write slightly separate logic - maybe - for walls vs. other types of purely area things. This contrasts a bit with, say, painted areas, where I made the collection of painted areas each have one sort of ‘key’ in the form of the color and a ‘value’ in the form of a list of areas. As I mentioned above, in hindsight it would have been better if I’d done the same thing with walls, making the wall type number the ‘key’ and the list of wall locations the ‘value’. Probably at some point I should refactor this. For now I will just try to make the WallIndex and WallData types be based on the Location/LocationData types in case that helps.

First attempt to convert to WallData:

private ToWallData(wallMap: ReadonlyMap<WallIndex, number>, type: WallType): ReadonlyArray<WallData> {
        // Map of z index -> [primary index] -> [secondary index]
        // where the primary index is the one where things with matching values are in a line
        // (x for vertical, y for horizontal)
        // and the secondary index is the one used to determine if things in that line are adjacent
        // (y for vertical, x for horizontal)
        // The innermost value is the wall number (which indicates rendering, e.g. door vs. wall vs. dotted line).
        let wallsMap = new Map<number, Map<number, Map<number, number>>>();

        let results = new Array<WallData>();

        let getPrimaryIndex: (ind: WallIndex) => number;
        let getSecondaryIndex: (ind: WallIndex) => number;
        let newWallIndex: (primary: number, secondary: number, z: number) => WallIndex;
        let newWallData: (primary: number, min: number, max: number, z: number, wallNumber: number) => WallData;

        if (type == WallType.Horizontal) {
            getPrimaryIndex = ind => ind.y;
            getSecondaryIndex = ind => ind.x;
            newWallIndex = (primary, secondary, z) => new WallIndex(secondary, primary, z, type);
            newWallData = (primary, min, max, z, wallNumber) => {
                return {
                    x: (min == max) ? min.toString() : `${min}-${max}`,
                    y: primary.toString(),
                    z: z.toString(),
                    wallNumber: wallNumber
                };
            };
        } else {
            getPrimaryIndex = ind => ind.x;
            getSecondaryIndex = ind => ind.y;
            newWallIndex = (primary, secondary, z) => new WallIndex(primary, secondary, z, type);
            newWallData = (primary, min, max, z, wallNumber) => {
                return {
                    x: primary.toString(),
                    y: (min == max) ? min.toString() : `${min}-${max}`,
                    z: z.toString(),
                    wallNumber: wallNumber
                };
            };
        }

        // This will hold the walls we've added to the output so far
        let handledSet = new Set<string>();

        for (let wallIndex of wallMap.keys()) {
            let primary = getPrimaryIndex(wallIndex);
            let secondary = getSecondaryIndex(wallIndex);

            if (!wallsMap.has(wallIndex.z)) {
                wallsMap.set(wallIndex.z, new Map<number, Map<number, number>>());
            }

            if (!wallsMap.get(wallIndex.z)!.has(getPrimaryIndex(wallIndex))) {
                wallsMap.get(wallIndex.z)!.set(primary, new Map<number, number>());
            }

            wallsMap.get(wallIndex.z)!.get(primary)!.set(secondary, wallMap.get(wallIndex)!);
        }

        for (let wallIndex of wallMap.keys()) {
            if (handledSet.has(wallIndex.toString())) {
                continue;
            }

            handledSet.add(wallIndex.toString());

            let primary = getPrimaryIndex(wallIndex);
            let secondary = getSecondaryIndex(wallIndex);
            let wallNumber = wallsMap.get(wallIndex.z)!.get(primary)!.get(secondary)!;

            let min = secondary;
            let max = secondary;

            // I'm reasonably sure I don't need to check for duplicates here.
            while (wallsMap.get(wallIndex.z)!.get(primary)!.get(max + 1) == wallNumber) {
                max += 1;
                handledSet.add(newWallIndex(primary, max, wallIndex.z).toString());
            }

            while (wallsMap.get(wallIndex.z)!.get(primary)!.get(min - 1) == wallNumber) {
                min -= 1;
                handledSet.add(newWallIndex(primary, min, wallIndex.z).toString());
            }

            results.push(newWallData(primary, min, max, wallIndex.z, wallNumber));
        }

        return results;
    }

Hm. So, to my amusement, this ended up making things worse on the first map I tested - it went from 17.5k to 23.1k. On closer examination, though, that’s probably because all of the numbers are saved as strings - including quotes around them - whether or not they represent ranges.

(Also, I only wrote compression for the walls, but the other location-based stuff has also been changed to strings, so the completely-uncompressed painted areas are only worse now. It’s pretty trivial to verify that this is the problem - removing the “compression” part for walls brings it up to 28.3k, so I am saving quite a bit, I just have raised my baseline as well.)

I’m not immediately sure what will happen if I just try to change the saved data to be of type “number | string” (I really love Union types; this is one of the rare times where I’m working in a non-C# language and actually happier because of it). If I’m lucky, it will just work and I’ll get the benefits of compression with no drawback. I do need to modify the code slightly so it still stores numbers where appropriate, e.g.:

newWallData = (primary, min, max, z, wallNumber) => {
                return {
                    x: (min == max) ? min : `${min}-${max}`,
                    y: primary,
                    z: z,
                    wallNumber: wallNumber
                };
            };

First thing’s first - that did work. Now it saves single locations as numbers and in-a-row lines compressed into strings. Second, it’s back down to 18.5k. Third, I just realized I’m basing this off of an older save file that uses a completely different method for storing walls - it’s storing them as, basically, a block of text with one character per wall, which manages to be a little more efficient on space at the expense of 1. my sanity over time and 2. being able to store values at “negative” x/y coordinates (which I did so I can “scroll around” a large map without worrying about it having a hard edge somewhere).

Comparing to a more recent (and larger) map, it goes from 120k to 108k, which is not bad. (The fact that I have a 120k or even a 108k map is perhaps itself horrifying. Hopefully solving for the other location-based save data will drop that down a lot. This particular map actually doesn’t even really have that much in terms of walls.)

Come to think of it, I do have a wall-heavy map in progress, and this did make it go from 61.4k down to 28.7k, so that’s great! Less great was that when I reloaded it some of the walls were missing, so I will now pause for some debugging.

…

Okay, I made a very obvious mistake (in hindsight, anyway). I coded the “ranges” so that they would be, like:

"x": "1-9"

Which is great - you just split along the dash and take the two numbers. Except, as I mentioned above, I specifically want to allow negative numbers, too, so, uh:

"x":"-1-9"

Oops. Technically I could parse for this, but maybe I should just use colons for ranges, instead.

So, having done that, now I just need to see what I can do with the areas. This gets a little more interesting - walls, we just compress into lines, and I’m pretty sure my approach above is O(n). Areas are probably better compressed in two dimensions - greater savings, and a lot of the time they’re actually literally rectangles. Sometimes they aren’t, though, so I need to come up with an algorithm that breaks them down into rectangles (preferably, as few rectangles as possible). There’s probably some literature on this already, but since this is more of a learning project than a professional one, I’ll just see what I can come up with.

It seems like one really basic approach would be to just, when given an unaccounted-for location, see if it can be expanded in either the x- or y-dimension to cover more matching, unaccounted-for spaces. (I would argue this is perhaps the most basic way to solve this problem.) I can see this possibly giving some weird or sub-optimal results if you have oddly-shaped spaces, but, well, I mostly don’t.

One possible twist on this might also be whether we should stop if we hit an already-accounted-for space - you can easily imagine, for example, a couple of overlapping squares that would be best treated as “two squares” rather than “one square, a line, and a smaller square”. So perhaps I should make it expand the range in either direction if it is valid to do so, it doesn’t contradict anything, and it adds some number of squares to what we have now accounted for, even if there’s overlap.

I feel like I’m getting ahead of myself, though. Let’s just do the most basic possible version of this and see where it goes.

export function ToLocationData(locations: ReadonlyArray<Location>): ReadonlyArray<LocationData> {
    let unhandledSpaces = new Set<string>(locations.map(l => l.toString()));
    let results = new Array<LocationData>();

    for (let location of locations) {
        if (!unhandledSpaces.has(location.toString())) {
            continue;
        }

        let z = location.z;
        let minX = location.x;
        let maxX = location.x;
        let minY = location.y;
        let maxY = location.y;

        let checkRight = true;
        let checkLeft = true;
        let checkUp = true;
        let checkDown = true;

        while (checkRight || checkLeft || checkUp || checkDown) {
            if (checkRight) {
                for (let y = minY; y <= maxY; y++) {
                    let rightLocation = new Location(maxX + 1, y, z);

                    if (!unhandledSpaces.has(rightLocation.toString())) {
                        checkRight = false;
                        break;
                    }
                }

                if (checkRight) {
                    maxX += 1;
                }
            }

            if (checkLeft) {
                for (let y = minY; y <= maxY; y++) {
                    let leftLocation = new Location(minX - 1, y, z);

                    if (!unhandledSpaces.has(leftLocation.toString())) {
                        checkLeft = false;
                        break;
                    }
                }

                if (checkLeft) {
                    minX -= 1;
                }
            }

            if (checkUp) {
                for (let x = minX; x <= maxX; x++) {
                    let upLocation = new Location(x, minY - 1, z);

                    if (!unhandledSpaces.has(upLocation.toString())) {
                        checkUp = false;
                        break;
                    }
                }

                if (checkUp) {
                    minY -= 1;
                }
            }

            if (checkDown) {
                for (let x = minX; x <= maxX; x++) {
                    let downLocation = new Location(x, maxY + 1, z);

                    if (!unhandledSpaces.has(downLocation.toString())) {
                        checkDown = false;
                        break;
                    }
                }

                if (checkDown) {
                    maxY += 1;
                }
            }
        }

        for (let x = minX; x <= maxX; x++) {
            for (let y = minY; y <= maxY; y++) {
                unhandledSpaces.delete(new Location(x, y, z).toString());
            }
        }

        results.push(
            {
                x: minX == maxX ? minX : `${minX}:${maxX}`,
                y: minY == maxY ? minY : `${minY}:${maxY}`,
                z: z
            });
    }

    return results;
}

This is pretty clunky - mostly in terms of its repetition, but also that it basically just tries each direction in order with no regard for whether one might be better than another. It does seem to work, though, and it doesn’t seem like it has terrible performance characteristics.

(I think it might still be O(n) but I’m not sure. I’m using a hashset to track handled spaces, and I feel like the worst-case scenario is that we do a wasteful check around the perimeter of each of the final rectangles, which at least feels like it can’t be more than like 4 * sqrt(m) for a given rectangle of m squares, which is sub-linear. If anybody can think of a pathological case that would perform much worse, let me know.)

Also, it took the was-120k-then-108k map down to 14.7k! That’s much better. (Perhaps unsurprisingly, it did very little to my “mostly walls” map, but I guess now I can load that one up with some colored areas or whatever.)

Judging from the numbers on all of this, I don’t get to safely return to a 32k limit on my SignalR messages (not that I super cared), but at the very least it’s helping. Also given that I use a pretty aggressive backup system of “save the whole map to file every single time it changes”, at least I won’t be writing quite so much to my temporary folder now.

Remaining Issues

As noted above, I could probably avoid repeating the “wallNumber” entries if instead I grouped my walls in the save file by “wallNumber”, which is already how I handle the color-painted areas. Given that that’s a minimum of 14 characters in a ~34-character entry, that’s a pretty big savings (~41%) on wall entries.

I think there are quite a few other things I could do to save space as well. For example, a “map” also contains a set of “entities” in it, but that means I often save the list of players (along with their HP, any current initiative rolls, etc.), which is actually counterproductive if they go between two maps in a session - really I want to save everything other than the players to file. At least, sort of. If the editor crashes I want to recover the whole state including the player entities. It’s complicated. Maybe at some point I should be splitting the players into one save file and the maps into another?

As noted previously I could also easily list dozens of other features I want to add to this, time permitting.