Entity Manager is a highly generic tool that enables efficient put, get, and query operations across many entity relationships, indexes, and sharded partitions.

Entity Manager was designed to operate within the context of AWS DynamoDB, but should work equally well with any sufficiently similar NoSQL platform.

This page is under construction! The Typescript refactor is nearly complete, and I’m busy building the demo & syncing up this documentation. Please check back soon for updates and drop me a note with any questions or ideas!

To accomplish this, Entity Manager needs to know:

  • Which data types are indexable on your data platform, and how to represent those types within a compound index.

  • Your entities, their properties and related types, and which properties will be generated by Entity Manager.

  • The structures of your generated properties and indexes.

  • Your partition sharding strategy for each entity.

This documentation takes a Typescript-first approach! All discussions & code examples will assume you are using Typescript, and we will call out Javascript-specific considerations where appropriate.

Generated Properties & Transcodes

As discussed in detail in Evolving a NoSQL Database Schema, Entity Manager indexes are supported by special generated properties.

Generated properties always have a string type. Within the Entity Manager config object, an entity generated property is specified by a simple array of its component property names. These components can be any non-generated property of the same entity, so long as that property is supported by a transcode.

Transcodes

A transcode is a pair of functions that convert a property value to and from a string type, such that the resulting strings are guaranteed to sort in the same order as the original values.

Transcodes and related mechanisms are actually defined in the @karmaniverous/entity-tools package, which is a dependency of both Entity Manager and the @karmaniverous/mock-db package used to test Entity Manager. For developer convenience these are re-exported from the @karmaniverous/entity-manager package.

For example, here is the definition of the timestamp transcode, one of the default transcodes provided by entity-tools:

import { isInt, isString } from 'radash';

import { type DefaultTranscodeMap, type Transcodes } from '@karmaniverous/entity-manager';

const defaultTranscodes: Transcodes<DefaultTranscodeMap> = {
  // ... other transcodes

  timestamp: {
    encode: (value) => {
      if (!isInt(value) || value < 0 || value > 9999999999999)
        throw new Error('invalid timestamp');

      return value.toString().padStart(13, '0');
    },
    decode: (value) => {
      if (!isString(value) || !/^[0-9]{13}$/.test(value))
        throw new Error('invalid encoded timestamp');

      return Number(value);
    },
  },
};

radash is a key Entity Manager dependency, which provides a set of type-safe utility functions for working with data.

The purpose of this transcode is to convert a millisecond-resolution Unix timestamp (a non-negative integer of up to 13 digits) into a 13-character numerical string and back. The transcode’s encode and decode functions contain some type validation to catch invalid values in either direction.

timestamp is a simple case: because a Unix timestamp is an unsigned integer and its encoded string is zero-padded to a fixed length, its string representation will always sort properly.
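To illustrate the sort-order guarantee, here is a minimal standalone sketch that mirrors the timestamp logic above using only built-in functions. It does not import Entity Manager, and the names encodeTimestamp and decodeTimestamp are hypothetical, chosen for this illustration only.

```typescript
// Standalone re-implementation of the timestamp transcode logic shown above,
// for illustration only. Function names are hypothetical.
const encodeTimestamp = (value: number): string => {
  if (!Number.isInteger(value) || value < 0 || value > 9999999999999)
    throw new Error('invalid timestamp');

  return value.toString().padStart(13, '0');
};

const decodeTimestamp = (value: string): number => {
  if (!/^[0-9]{13}$/.test(value)) throw new Error('invalid encoded timestamp');

  return Number(value);
};

// Values of different digit lengths, deliberately out of order.
const timestamps = [1730617827000, 5, 999999999999, 42];

// Sort the numbers numerically and the encoded strings lexicographically.
const sortedNumbers = [...timestamps].sort((a, b) => a - b);
const sortedStrings = timestamps.map(encodeTimestamp).sort();

// Decoding the sorted strings yields the numerically sorted values.
const roundTripped = sortedStrings.map(decodeTimestamp);
```

Without the zero-padding, the string '5' would sort after '1730617827000', which is why the fixed width matters.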

fix6 is another default transcode that presents a more complex case: it handles a signed, fixed-point number with 6 decimal places.

import { isNumber, isString } from 'radash';

import { type DefaultTranscodeMap, type Transcodes } from '@karmaniverous/entity-manager';

const defaultTranscodes: Transcodes<DefaultTranscodeMap> = {
  // ... other transcodes

  fix6: {
    encode: (value) => {
      if (
        !isNumber(value) ||
        value > Number.MAX_SAFE_INTEGER / 1000000 ||
        value < Number.MIN_SAFE_INTEGER / 1000000
      )
        throw new Error('invalid fix6');

      const [prefix, abs] = value < 0 ? ['n', -value] : ['p', value];

      return `${prefix}${abs.toFixed(6).padStart(17, '0')}`;
    },
    decode: (value) => {
      if (!isString(value) || !/^[np][0-9]{10}\.[0-9]{6}$/.test(value))
        throw new Error('invalid encoded fix6');

      return (value.startsWith('n') ? -1 : 1) * Number(value.slice(1));
    },
  },
  // ... other transcodes
};

fix6 uses the following techniques to meet transcode requirements:

  • range checking (so that every value fits within the fixed-width encoded format),

  • sign handling (prefixes positive numbers with p and negative numbers with n, so that negative values sort before positive values), and

  • zero-padding of small values (again to ensure proper alpha sort).
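The techniques listed above can be seen in action with a minimal standalone sketch. It mirrors the fix6 logic using only built-in functions rather than importing Entity Manager, and the names encodeFix6 and decodeFix6 are hypothetical.

```typescript
// Standalone re-implementation of the fix6 logic shown above, for
// illustration only. Function names are hypothetical.
const encodeFix6 = (value: number): string => {
  // Range check: the value must fit the fixed-width format.
  if (
    !Number.isFinite(value) ||
    value > Number.MAX_SAFE_INTEGER / 1000000 ||
    value < Number.MIN_SAFE_INTEGER / 1000000
  )
    throw new Error('invalid fix6');

  // Sign handling: 'n' sorts before 'p'.
  const prefix = value < 0 ? 'n' : 'p';
  const abs = Math.abs(value);

  // Zero-padding: 10 integer digits + '.' + 6 decimal digits = 17 characters.
  return `${prefix}${abs.toFixed(6).padStart(17, '0')}`;
};

const decodeFix6 = (value: string): number => {
  if (!/^[np][0-9]{10}\.[0-9]{6}$/.test(value))
    throw new Error('invalid encoded fix6');

  return (value.startsWith('n') ? -1 : 1) * Number(value.slice(1));
};

const positive = encodeFix6(1.5); // 'p0000000001.500000'
const negative = encodeFix6(-1.5); // 'n0000000001.500000'
```

Here negative sorts before positive as a string, and decodeFix6 recovers the original values in both cases.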

Entity Manager offers the following set of default transcodes:

Transcode  Description
bigint20   BigInt value with a maximum of 20 digits
boolean    boolean value
fix6       signed, fixed-point number with 6 decimal places
int        signed integer
string     string value
timestamp  Unix timestamp

These default transcodes are defined in the @karmaniverous/entity-tools package.

Note that each example object above, like the complete defaultTranscodes object, has a type of Transcodes. This special type ensures that the defined object…

  • has the correct keys (the transcode names), and

  • defines a correctly-typed encode and decode function for each key.

This is guaranteed by the Transcodes type’s single type parameter, which is a TranscodeMap.

The TranscodeMap Type

A TranscodeMap is a simple Record type that defines:

  • the name of each transcode, and

  • the type each transcode encodes into or decodes from a string value.

For example, here is the definition of the DefaultTranscodeMap type, which drives the defaultTranscodes object:

import { type TranscodeMap } from '@karmaniverous/entity-manager';

interface DefaultTranscodeMap extends TranscodeMap {
  bigint20: bigint;
  boolean: boolean;
  fix6: number;
  int: number;
  string: string;
  timestamp: number;
}

An object of type Transcodes<DefaultTranscodeMap> must…

  • have the same keys as DefaultTranscodeMap (these are the transcode names), and

  • provide an encode function that converts a value of the corresponding type to a string, and

  • provide a decode function that converts a string to a value of the corresponding type.

If any of these conditions are not met, TypeScript will throw a type error.
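As a rough illustration of how such a type can enforce these conditions, here is a simplified sketch built on a mapped type. SketchTranscodes and MiniTranscodeMap are hypothetical names invented for this example; this is not the actual definition of Transcodes in entity-tools, which may differ.

```typescript
// Hypothetical, simplified stand-in for a TranscodeMap: transcode names
// mapped to the types they encode.
interface MiniTranscodeMap {
  boolean: boolean;
  int: number;
}

// Hypothetical, simplified stand-in for the Transcodes type: one
// encode/decode pair per key, typed by the map's value type.
type SketchTranscodes<T> = {
  [K in keyof T]: {
    encode: (value: T[K]) => string;
    decode: (value: string) => T[K];
  };
};

const miniTranscodes: SketchTranscodes<MiniTranscodeMap> = {
  boolean: {
    encode: (value) => (value ? 't' : 'f'),
    decode: (value) => value === 't',
  },
  // Deliberately naive: a real int transcode must also preserve sort order.
  int: {
    encode: (value) => value.toString(),
    decode: (value) => Number(value),
  },
  // Omitting a key, adding an unknown key, or returning the wrong type from
  // decode would each be reported as a compile-time error.
};
```

Because the parameter types of encode and decode are inferred from the map, the function bodies need no explicit annotations.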

If you are only using Entity Manager’s default transcodes, you can skip the next section. But if you have reason to define your own transcodes, read on!

Custom Transcodes

Let’s say you are building a high-precision navigation application.

Latitude & longitude values require three digits to the left of the decimal point, so a 64-bit signed number leaves room for 13 digits of decimal precision. You will want to define a custom transcode for this data type. Let’s extend the existing transcode naming convention and call this transcode fix13.

The first step in defining fix13 will be to extend DefaultTranscodeMap:

import { type DefaultTranscodeMap } from '@karmaniverous/entity-manager';

interface MyTranscodeMap extends DefaultTranscodeMap {
  fix13: number;
}

Next, you can define a new transcodes object that includes fix13, using the fix6 transcode definition as a template:

import { isNumber, isString } from 'radash';

import { defaultTranscodes, type Transcodes } from '@karmaniverous/entity-manager';

const myTranscodes: Transcodes<MyTranscodeMap> = {
  ...defaultTranscodes, // reuse default transcodes

  fix13: {
    encode: (value) => {
      if (
        !isNumber(value) ||
        value > Number.MAX_SAFE_INTEGER / 10000000000000 ||
        value < Number.MIN_SAFE_INTEGER / 10000000000000
      )
        throw new Error('invalid fix13');

      const [prefix, abs] = value < 0 ? ['n', -value] : ['p', value];

      return `${prefix}${abs.toFixed(13).padStart(17, '0')}`;
    },
    decode: (value) => {
      if (!isString(value) || !/^[np][0-9]{3}\.[0-9]{13}$/.test(value))
        throw new Error('invalid encoded fix13');

      return (value.startsWith('n') ? -1 : 1) * Number(value.slice(1));
    },
  },
};

The Entity Type

The Entity type is a simple Record type that defines the properties of an entity. A type extending the Entity type should follow these conventions:

  • Each key is a property name. All Entity properties should be represented, including generated properties and those with complex types.

  • All generated properties should have a type of never.

For example, here are the definitions of the Email and User types discussed in Evolving a NoSQL Database Schema:

import { type Entity } from '@karmaniverous/entity-manager';

interface Email extends Entity {
  created: number;
  email: string;
  userId: string;

  // generated properties
  userHashKey: never;
}

interface User extends Entity {
  beneficiaryId: string;
  created: number;
  firstName: string;
  firstNameCanonical: string;
  lastName: string;
  lastNameCanonical: string;
  phone?: string;
  updated: number;
  userId: string;

  // generated properties
  firstNameRangeKey: never;
  lastNameRangeKey: never;
  userBeneficiaryHashKey: never;
  userHashKey: never;
}

The EntityMap Type

The EntityMap type is a simple Record type that defines the entities in your data model and assigns their respective Entity types. An EntityMap type should follow these conventions:

  • Each key is the token by which an Entity will be referenced throughout your configuration. All entities should be represented.

  • Each property type is the corresponding Entity type.

For example, the following MyEntityMap type would support the User service table discussed in Evolving a NoSQL Database Schema, which includes the Email and User entities:

import { type EntityMap } from '@karmaniverous/entity-manager';

interface MyEntityMap extends EntityMap {
  email: Email;
  user: User;
}

The Config Type

The EntityManager class constructor takes a single argument of the Config type.

Config is a highly complex type, which encapsulates numerous rules whose net effect is to prevent the developer from creating an invalid Entity Manager configuration.

This section will cover each element of the Config type in depth. First, though, here is an example Config object that…

  • implements the example summarized at the end of Evolving a NoSQL Database Schema, and

  • for clarity, expresses and identifies all default values (normally default values can be omitted).

import {
  defaultTranscodes,
  type Config,
  type DefaultTranscodeMap
} from '@karmaniverous/entity-manager';

const config: Config<
  MyEntityMap,
  'hashKey',           // default value
  'rangeKey',          // default value
  DefaultTranscodeMap, // default value
> = {
  // Common hash & range key properties for all entities. Must
  // exactly match HashKey & RangeKey type params.
  hashKey: 'hashKey',            // default value
  rangeKey: 'rangeKey',          // default value

  // Delimiters for generated properties & shard keys. Generated
  // property elements should not contain these characters!
  generatedKeyDelimiter: '|',    // default value
  generatedValueDelimiter: '#',  // default value
  shardKeyDelimiter: '!',        // default value

  // Maximum number of shard queries executed in parallel.
  throttle: 10,                  // default value

  // Transcode functions for generated properties & page keys.
  transcodes: defaultTranscodes, // default value

  // Entity-specific configs. Keys must exactly match those of
  // MyEntityMap.
  entities: {
    // Email entity config.
    email: {
      // Source property for the Email entity's hash key.
      uniqueProperty: 'email',

      // Source property for timestamp used to calculate Email
      // shard key.
      timestampProperty: 'created',

      // Default shard bump schedule if not specified. All hash
      // keys will have a zero-length shard key and be effectively
      // unsharded.
      shardBumps: [
        { charBits: 1, chars: 0, timestamp: 0 },
      ],

      // Email entity generated properties. These keys must match
      // the ones with never types in the Email interface defined
      // above, and are marked with a ⚙️ in the table design.
      generated: {
        userHashKey: {
          // When true, if any element is undefined or null, the
          // generated property will be undefined. When false,
          // undefined or null elements will be rendered as an
          // empty string.
          atomic: true,

          // Elements of the generated property. These MUST be
          // ungenerated properties (i.e. not marked with never
          // in the Email interface) and MUST be included in the
          // elementTranscodes object below. Elements are applied
          // in order.
          elements: ['userId'],

          // When this value is true, the generated property will
          // be sharded.
          sharded: true,
        },
      },

      // Indexes for the Email entity as specified in the index
      // design.
      indexes: {
        // An index hashKey must be either the global hash key or a
        // sharded generated property. Its rangeKey must be either
        // the global range key, an ungenerated scalar property, or
        // an unsharded generated property. Any ungenerated
        // properties used MUST be included in the elementTranscodes
        // object below.
        created: { hashKey: 'hashKey', rangeKey: 'created' },
        userCreated: { hashKey: 'userHashKey', rangeKey: 'created' },
      },

      // Transcodes for ungenerated properties used as generated
      // property elements or index components. Transcode values
      // must be valid config transcodes object keys. This config
      // uses the defaultTranscodes exported by
      // @karmaniverous/entity-tools.
      elementTranscodes: {
        created: 'timestamp',
        userId: 'string',
      },
    },
    // User entity config.
    user: {
      uniqueProperty: 'userId',
      timestampProperty: 'created',

      // User entity's shard bump schedule. Hash keys created
      // before the timestamp are unsharded (1 possible shard key).
      // Hash keys created afterward have a 1-char, 2-bit shard key
      // (4 possible shard keys).
      shardBumps: [{ timestamp: 1730617827000, charBits: 2, chars: 1 }],

      generated: {
        firstNameRangeKey: {
          elements: ['firstNameCanonical', 'lastNameCanonical', 'created'],
        },
        lastNameRangeKey: {
          elements: ['lastNameCanonical', 'firstNameCanonical', 'created'],
        },
        userBeneficiaryHashKey: {
          atomic: true,
          elements: ['beneficiaryId'],
          sharded: true,
        },
        userHashKey: {
          atomic: true,
          elements: ['userId'],
          sharded: true,
        },
      },
      indexes: {
        created: { hashKey: 'hashKey', rangeKey: 'created' },
        firstName: { hashKey: 'hashKey', rangeKey: 'firstNameRangeKey' },
        lastName: { hashKey: 'hashKey', rangeKey: 'lastNameRangeKey' },
        phone: { hashKey: 'hashKey', rangeKey: 'phone' },
        updated: { hashKey: 'hashKey', rangeKey: 'updated' },
        userBeneficiaryCreated: {
          hashKey: 'userBeneficiaryHashKey',
          rangeKey: 'created',
        },
        userBeneficiaryFirstName: {
          hashKey: 'userBeneficiaryHashKey',
          rangeKey: 'firstNameRangeKey',
        },
        userBeneficiaryLastName: {
          hashKey: 'userBeneficiaryHashKey',
          rangeKey: 'lastNameRangeKey',
        },
        userBeneficiaryPhone: {
          hashKey: 'userBeneficiaryHashKey',
          rangeKey: 'phone',
        },
        userBeneficiaryUpdated: {
          hashKey: 'userBeneficiaryHashKey',
          rangeKey: 'updated',
        },
      },
      elementTranscodes: {
        beneficiaryId: 'string',
        created: 'timestamp',
        firstNameCanonical: 'string',
        lastNameCanonical: 'string',
        phone: 'string',
        updated: 'timestamp',
        userId: 'string',
      },
    },
  },
};

Type Parameters & Global Keys

Together with its intrinsic shape, the Config type’s four type parameters determine what constitutes a valid Config object.

We’ll go into some detail about each in the following sections, but here they are in brief:

Parameter  Type          Default              Description
M          EntityMap     -                    The map of entities in your data model
HashKey    string        'hashKey'            The hash key property name shared across all entities
RangeKey   string        'rangeKey'           The range key property name shared across all entities
T          TranscodeMap  DefaultTranscodeMap  The map of transcodes used in your configuration

In a simple Entity Manager configuration restricted to default transcodes, only the first type parameter is required!

The Config object also contains the hashKey and rangeKey properties, whose values must exactly match those of the HashKey and RangeKey type parameters, respectively. This is an unavoidable redundancy: while the type parameters help ensure a valid Entity Manager configuration, the corresponding config properties play an important role at runtime.

The property names specified in hashKey and rangeKey must not conflict with any entity property name in the M type parameter. If they do, TypeScript will throw a type error.

Entity Configurations

The entities property of the Config object is a Record-type object whose keys exactly match the keys of the M type parameter, i.e. the configuration’s EntityMap. Missing or extra keys will cause a type error.

The value associated with each key is the configuration object for that entity. The following sections describe the properties of that object.

Query Limits

As described here, a key Entity Manager feature is that it simplifies complex, cross-shard, multi-index search operations by automatically decomposing these into a batch of single-shard, single-index queries that are conducted in parallel.

Within this context, pageSize indicates the maximum number of items per data page returned by one of these simplified internal queries, and limit indicates the minimum paging threshold of the combined result.

Subject to implementation requirements, pageSize and limit can be set on an individual query. Failing that, their default values for a given entity are set here. If one of these values is not set in the config object, its value defaults to 10.

throttle is the maximum number of queries that can be conducted in parallel. If not set in the config object, its value defaults to 10.

Indexes

The indexes configuration defines the indexes associated with the entity. This configuration serves several purposes:

  • Permits dehydration & rehydration of page keys in support of cross-shard, multi-index query operations.

  • Permits automatic generation of platform-specific data definitions, for example DynamoDB table definitions via the generateTableDefinition in the entity-client-dynamodb package.

In the future, we will exploit this configuration at the command line to generate platform-specific data definitions, e.g. the CloudFormation specification of a DynamoDB table.

For now, these indexes are used to specify managed queries and to dehydrate & rehydrate the associated page key maps.

The indexes property is an object whose keys are the index names, and whose values are objects with the following properties:

Property       Type                        Description
hashKey        ConfigEntityIndexComponent  The hash key component of the index. Must be either the global hash key or a sharded generated property.
rangeKey       ConfigEntityIndexComponent  The range key component of the index. Must be either the global range key, a scalar ungenerated property, or an unsharded generated property.
[projections]  string[]                    Properties to project from the index, including those not otherwise part of the configuration. May not include the global or index hash or range key. If omitted, all properties will be projected.

See here for a discussion of special index structure considerations within the DynamoDB context.
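The hashKey and rangeKey rules above can be expressed as a small check. This is a minimal sketch under simplified, hypothetical shapes (SketchEntityConfig and checkIndexes are invented for this example); it is not how Entity Manager validates indexes, and it cannot verify that an ungenerated property is scalar.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface SketchGenerated {
  sharded?: boolean;
}

interface SketchEntityConfig {
  generated: Record<string, SketchGenerated>;
  indexes: Record<string, { hashKey: string; rangeKey: string }>;
}

// Returns a list of rule violations for an entity's indexes.
const checkIndexes = (
  entity: SketchEntityConfig,
  globalHashKey = 'hashKey',
  globalRangeKey = 'rangeKey',
): string[] => {
  const errors: string[] = [];

  for (const [name, { hashKey, rangeKey }] of Object.entries(entity.indexes)) {
    // hashKey: the global hash key or a sharded generated property.
    const hashOk =
      hashKey === globalHashKey || entity.generated[hashKey]?.sharded === true;
    if (!hashOk) errors.push(`${name}: invalid hashKey '${hashKey}'`);

    // rangeKey: the global range key, an ungenerated property, or an
    // unsharded generated property.
    const rangeOk =
      rangeKey === globalRangeKey || !entity.generated[rangeKey]?.sharded;
    if (!rangeOk) errors.push(`${name}: invalid rangeKey '${rangeKey}'`);
  }

  return errors;
};

// The two Email indexes from the example config, plus a deliberately bad one.
const indexErrors = checkIndexes({
  generated: { userHashKey: { sharded: true } },
  indexes: {
    created: { hashKey: 'hashKey', rangeKey: 'created' },
    userCreated: { hashKey: 'userHashKey', rangeKey: 'created' },
    bad: { hashKey: 'created', rangeKey: 'userHashKey' },
  },
});
```

Only the bad index is flagged: its hashKey is an ungenerated property and its rangeKey is a sharded generated property.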

Generated Properties

The generated configuration defines each of an entity’s generated properties, which are indicated in the entity’s entry in the config’s M (EntityMap) type parameter by a never type.

All such properties must be represented by a key in the generated object. If any are missing, or if any extra keys are present, TypeScript will throw a type error.

The value associated with each key is a configuration object that defines the generated property’s structure. This object has the following properties:

Property  Type      Default  Description
atomic    boolean   false    If true, any missing element results in an undefined generated value. If false, missing elements are rendered as an empty string.
elements  string[]  -        An array of ungenerated property names that will be used to generate the property. Order matters, and only properties defined in Element Transcodes may be used!
sharded   boolean   false    If true, the generated property will be sharded. If false, it will not.

Element Transcodes

The elementTranscodes configuration determines:

  • which of an entity’s properties can be used as an element of an index or a generated property, and

  • which transcode should be used to encode that property from its native type to a string, and decode it from a string back to its native type.

The elementTranscodes property is a Record-type object whose keys are each a property name of the entity, and whose values are the transcode to be associated with that property. It must follow these rules:

  • Each key must be a property name of the entity.

  • The entity property type of a given key (as expressed in the config’s M type parameter) must match the transcode type of the corresponding value (as expressed in the config’s T type parameter).

In other words, every elementTranscodes key must be an ungenerated property whose type matches its assigned transcode.

In practice, the best way to determine which properties should be included in elementTranscodes is to create your generated properties & indexes first. Your elementTranscodes will then be the ungenerated properties used as generated property elements or index components.

These rules will be validated at runtime when you parse your configuration object. See above for examples.
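As a rough aid for working out which properties belong in elementTranscodes, the sketch below collects every ungenerated property used as a generated property element or index component. The SketchEntityConfig shape and the requiredElementTranscodeKeys helper are hypothetical and simplified; they are not part of Entity Manager.

```typescript
// Hypothetical, simplified entity config shape for illustration only.
interface SketchEntityConfig {
  generated: Record<string, { elements: string[] }>;
  indexes: Record<string, { hashKey: string; rangeKey: string }>;
}

const requiredElementTranscodeKeys = (
  entity: SketchEntityConfig,
  globalHashKey = 'hashKey',
  globalRangeKey = 'rangeKey',
): string[] => {
  const generatedNames = new Set(Object.keys(entity.generated));
  const required = new Set<string>();

  // Every element of every generated property needs a transcode.
  for (const { elements } of Object.values(entity.generated))
    for (const element of elements) required.add(element);

  // So does every ungenerated property used as an index component.
  for (const { hashKey, rangeKey } of Object.values(entity.indexes))
    for (const component of [hashKey, rangeKey])
      if (
        component !== globalHashKey &&
        component !== globalRangeKey &&
        !generatedNames.has(component)
      )
        required.add(component);

  return Array.from(required).sort();
};

// The Email entity from the example config.
const emailKeys = requiredElementTranscodeKeys({
  generated: { userHashKey: { elements: ['userId'] } },
  indexes: {
    created: { hashKey: 'hashKey', rangeKey: 'created' },
    userCreated: { hashKey: 'userHashKey', rangeKey: 'created' },
  },
});
```

For the Email entity this yields created and userId, which matches the elementTranscodes keys in the example config.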

Sharding Strategy

This configuration defines the sharding strategy for the entity. See here for an introduction to the rationale behind sharding and structure of a shard key.

Every record created by Entity Manager will have a shard key embedded in its hash key. By default, this shard key is an empty string, resulting in a single data partition across the entity.

A record’s shard key is assigned at the time of record creation and does not change for the life of the record. Its value is determined by the following entity configurations:

  • shardBumps is an array of objects that defines a sharding schedule: how the number of partition shards scales over time.

  • timestampProperty is the name of the entity property that will be used to determine which specific shard bump applies to a given record. This in turn determines the number of available shards at that point in time and the length of the shard key. The record’s creation timestamp is the best choice for this property.

  • uniqueProperty is the name of the entity property that will be hashed in order to determine the shard key for the record. It should be the record’s unique identifier.

The entity properties named in timestampProperty and uniqueProperty should be populated on all records for all entities represented by the configuration.

A shardBump object has the following properties:

Property   Type    Description
charBits   number  The number of bits to use for the shard key.
chars      number  The number of characters to use for the shard key.
timestamp  number  The timestamp at which this shard bump takes effect.

If no shardBumps are defined for an entity, its shardBumps property will default to the following value:

[{ charBits: 1, chars: 0, timestamp: 0 }];

The effect of this is an empty shard key and a single data partition for all records.

If shardBumps is defined but contains no entry with a zero timestamp value, the above value will be prepended to the array. In any case, the array will be sorted by timestamp value on parsing.
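The defaulting and sorting behavior described above, together with the way a record's timestamp selects its shard bump, might be sketched as follows. The ShardBump interface and both helper functions are hypothetical names for this illustration; the library's internal implementation may differ.

```typescript
// Hypothetical shape matching the shardBump properties described above.
interface ShardBump {
  timestamp: number;
  charBits: number;
  chars: number;
}

const defaultBump: ShardBump = { charBits: 1, chars: 0, timestamp: 0 };

// Prepend the default bump when no zero-timestamp bump exists, then sort.
const normalizeShardBumps = (bumps: ShardBump[] = []): ShardBump[] => {
  const hasZero = bumps.some(({ timestamp }) => timestamp === 0);

  return [...(hasZero ? [] : [defaultBump]), ...bumps].sort(
    (a, b) => a.timestamp - b.timestamp,
  );
};

// Select the latest bump that has taken effect at the record's timestamp.
// Assumes a non-negative record timestamp.
const selectShardBump = (bumps: ShardBump[], timestamp: number): ShardBump => {
  let selected = defaultBump;

  for (const bump of normalizeShardBumps(bumps))
    if (bump.timestamp <= timestamp) selected = bump;

  return selected;
};

// The User entity's schedule from the example config.
const userBumps: ShardBump[] = [
  { timestamp: 1730617827000, charBits: 2, chars: 1 },
];

const before = selectShardBump(userBumps, 1730617826999); // default bump
const after = selectShardBump(userBumps, 1730617827000); // 1-char bump
```

A record created one millisecond before the bump gets an empty shard key, while a record created at or after it gets a 1-character shard key.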

shardBumps must obey the following rules:

  • chars must be a non-negative integer valued from 0 to 40 inclusive. Its value must increase monotonically with timestamp.

  • charBits must be an integer valued from 1 to 5 inclusive.

  • timestamp must be a non-negative integer, and duplicate timestamp values are not allowed.
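These rules are simple enough to check by hand. The following is a minimal hand-rolled sketch, not the library's own validation; checkShardBumps is a hypothetical helper, and it reads "increase monotonically" as strictly increasing, which is an assumption.

```typescript
// Hypothetical shape matching the shardBump properties described above.
interface ShardBump {
  timestamp: number;
  charBits: number;
  chars: number;
}

// Returns a list of rule violations; an empty list means the schedule is valid.
const checkShardBumps = (bumps: ShardBump[]): string[] => {
  const errors: string[] = [];
  const sorted = [...bumps].sort((a, b) => a.timestamp - b.timestamp);

  sorted.forEach(({ chars, charBits, timestamp }, i) => {
    if (!Number.isInteger(chars) || chars < 0 || chars > 40)
      errors.push(`bump ${i}: chars must be an integer from 0 to 40`);

    if (!Number.isInteger(charBits) || charBits < 1 || charBits > 5)
      errors.push(`bump ${i}: charBits must be an integer from 1 to 5`);

    if (!Number.isInteger(timestamp) || timestamp < 0)
      errors.push(`bump ${i}: timestamp must be a non-negative integer`);

    if (i > 0) {
      const previous = sorted[i - 1];

      if (timestamp === previous.timestamp)
        errors.push(`bump ${i}: duplicate timestamp`);

      // Assumption: "increase monotonically" means strictly increasing.
      if (chars <= previous.chars)
        errors.push(`bump ${i}: chars must increase with timestamp`);
    }
  });

  return errors;
};

const validErrors = checkShardBumps([
  { timestamp: 0, charBits: 1, chars: 0 },
  { timestamp: 1730617827000, charBits: 2, chars: 1 },
]);

const invalidErrors = checkShardBumps([
  { timestamp: 0, charBits: 6, chars: 0 },
  { timestamp: 0, charBits: 1, chars: 0 },
]);
```

The first schedule passes, while the second is flagged for an out-of-range charBits, a duplicate timestamp, and a chars value that does not increase.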

For any shardBump, the number of available shards is given by chars * (2 ** charBits). At the outer limit, this works out to 26,241 shards across all time, or a total of over 656 million records even at the maximum DynamoDB record size of 400 KB. Most implementations should not approach this limit, as querying this many shards in parallel is bound to impact performance, but it should be plain that the design is scalable enough for any application.

Queries that include a date range may cross shardBump boundaries, in which case all relevant shards will be searched. The default case is no timestamp constraint, meaning that all shards up to and including the current shardBump will be searched.
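To make the boundary-crossing case concrete, here is a small sketch that finds the shard bumps whose active periods overlap a query's date range. The bumpsInRange helper is hypothetical and assumes a schedule that is already sorted by timestamp and starts with a zero-timestamp bump; the library's actual query planning may work differently.

```typescript
// Hypothetical shape matching the shardBump properties described above.
interface ShardBump {
  timestamp: number;
  charBits: number;
  chars: number;
}

// Returns the bumps whose active period overlaps [from, to]. With no
// constraint, every bump up to and including the current one is returned.
const bumpsInRange = (
  bumps: ShardBump[],
  from = 0,
  to = Date.now(),
): ShardBump[] =>
  bumps.filter((bump, i) => {
    const next = bumps[i + 1];

    // A bump is active from its own timestamp until the next bump begins.
    const activeUntil = next ? next.timestamp : Number.POSITIVE_INFINITY;

    return bump.timestamp <= to && activeUntil > from;
  });

const schedule: ShardBump[] = [
  { timestamp: 0, charBits: 1, chars: 0 },
  { timestamp: 1730617827000, charBits: 2, chars: 1 },
];

const early = bumpsInRange(schedule, 0, 1000); // first bump only
const straddling = bumpsInRange(schedule, 1730617826000, 1730617828000); // both
const late = bumpsInRange(schedule, 1730617827000, 1730617828000); // second only
```

A range that straddles the bump timestamp touches both bumps, so the shards defined by each would need to be searched.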

This design permits a staged scaling strategy. When the number of records for a given entity is low, it makes sense to have only one shard. As the number of records increases, the number of shards can be increased in a controlled manner.

So long as a new shardBump is only added with a future timestamp, your sharding strategy can be scaled gracefully, with no interruptions and without any need to migrate data following shard bumps.

Delimiters

The Entity Manager config defines the following special characters as delimiters, to be used when composing generated property values, including the global hashKey and rangeKey:

Property                 Type    Default  Description
generatedKeyDelimiter    string  '|'      Separates key-value pairs in a generated property.
generatedValueDelimiter  string  '#'      Separates key from value in a generated property key-value pair.
shardKeyDelimiter        string  '!'      Separates entity token from shard key in a sharded generated property.

An entity property used as a generated property element or an index component should never contain any of these delimiter characters! If this is unavoidable, use these configurations to define an alternate delimiter that will not collide with your entity data.
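The sketch below shows one way the key and value delimiters could combine elements into an unsharded generated property value, including the atomic behavior described under Generated Properties. composeGenerated is a hypothetical helper that assumes its inputs are already transcoded to strings; the exact output produced by Entity Manager may differ, and sharding is not covered.

```typescript
// Hypothetical delimiter settings, using the documented default values.
interface Delimiters {
  generatedKeyDelimiter: string;
  generatedValueDelimiter: string;
}

const defaultDelimiters: Delimiters = {
  generatedKeyDelimiter: '|',
  generatedValueDelimiter: '#',
};

// Compose an unsharded generated property value from already-encoded
// string elements.
const composeGenerated = (
  item: Record<string, string | null | undefined>,
  elements: string[],
  atomic = false,
  { generatedKeyDelimiter, generatedValueDelimiter }: Delimiters = defaultDelimiters,
): string | undefined => {
  // atomic: any missing element makes the whole property undefined.
  if (atomic && elements.some((e) => item[e] === undefined || item[e] === null))
    return undefined;

  // Otherwise, missing elements are rendered as empty strings.
  return elements
    .map((e) => `${e}${generatedValueDelimiter}${item[e] ?? ''}`)
    .join(generatedKeyDelimiter);
};

const user = { firstNameCanonical: 'john', lastNameCanonical: 'doe' };

const full = composeGenerated(user, ['lastNameCanonical', 'firstNameCanonical']);
const partial = composeGenerated(user, ['lastNameCanonical', 'phone']);
const atomicMissing = composeGenerated(user, ['lastNameCanonical', 'phone'], true);
```

If an element value itself contained '|' or '#', the composed string could no longer be split back into its parts unambiguously, which is why colliding data calls for alternate delimiters.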

Config Transcodes

The transcodes configuration property has a type of Transcodes<T>, where T is the TranscodeMap type parameter of the Config type.

By default, T is DefaultTranscodeMap and transcodes is defaultTranscodes. See Custom Transcodes to learn how to define custom transcodes for your configuration.

T and the transcodes property must be compatible or Typescript will throw a type error! In general this means that if you customize one, you must customize the other. The exception is that you are free to override existing default transcodes so long as you maintain the existing type signatures.

Runtime Validation

Typescript will enforce the structure of your Config object at compile time. This will largely keep you out of trouble, but there are some validity checks that can only be performed at runtime (e.g. validating that an entity’s shardBumps.chars value increases monotonically with timestamp).

Also, some developers will choose to write their code in Javascript and will not be able to leverage compile-time validation at all.

To satisfy both of these cases, the EntityManager constructor leverages zod to validate the Config object at runtime. If the object is invalid, the constructor will throw an error with a detailed message explaining the problem.

The ItemMap Type

The Entity and EntityMap types described above effectively help you define the structure of your data model.

Unfortunately, your Entity types are not suitable for representing actual data objects in your application because:

  • they use the convention of identifying generated properties with a never type, and

  • they do not include the hashKey and rangeKey properties identified elsewhere in your config.

The ItemMap type takes the same type arguments as your Config object (except for the final TranscodeMap argument, which is not relevant), and returns a type that looks just like your EntityMap except that:

  • all generated properties initially typed as never are replaced with string types, and

  • string-valued hash and range key properties are added with names specified in the HashKey and RangeKey type parameters, respectively.

Here is an example of how to exploit the ItemMap type within the context of the MyEntityMap type defined above:

type MyItemMap = ItemMap<MyEntityMap, "hashKey", "rangeKey">;

type EmailItem = MyItemMap["email"];
// {
//   created: number;
//   email: string;
//   userId: string;
//   hashKey: string;
//   rangeKey: string;
//   userHashKey: string;
// }

type UserItem = MyItemMap["user"];
// {
//   beneficiaryId: string;
//   created: number;
//   firstName: string;
//   firstNameCanonical: string;
//   lastName: string;
//   lastNameCanonical: string;
//   phone?: string;
//   updated: number;
//   userId: string;
//   firstNameRangeKey: string;
//   hashKey: string;
//   lastNameRangeKey: string;
//   rangeKey: string;
//   userBeneficiaryHashKey: string;
//   userHashKey: string;
// }
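For readers curious how such a transformation can be expressed, here is a simplified sketch for a single entity using a mapped type and a conditional type. SketchItem and SketchEmail are hypothetical names, the string values are placeholders, and this is not the actual definition of ItemMap.

```typescript
// Hypothetical, simplified stand-in for the per-entity transformation:
// never-typed properties become strings, and the hash & range key
// properties are added.
type SketchItem<E, HK extends string, RK extends string> = {
  [K in keyof E]: [E[K]] extends [never] ? string : E[K];
} & Record<HK | RK, string>;

// A plain interface standing in for the Email entity type defined above.
interface SketchEmail {
  created: number;
  email: string;
  userId: string;
  userHashKey: never; // generated property
}

type SketchEmailItem = SketchItem<SketchEmail, 'hashKey', 'rangeKey'>;

// All string values below are placeholders, not real generated values.
const emailItem: SketchEmailItem = {
  created: 1730617827000,
  email: 'me@example.com',
  userId: 'abc123',
  userHashKey: 'generated-value',
  hashKey: 'hash-key-value',
  rangeKey: 'range-key-value',
};
```

Wrapping E[K] in a one-element tuple keeps the conditional type from distributing, so only properties typed exactly as never are replaced.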

Javascript

If you are working in Javascript, you can still use Entity Manager! Just be aware that you will not benefit from the compile-time validation that Typescript provides.

When defining custom transcodes, you will do so without reference to types, so the example above would look like this:

import { defaultTranscodes } from '@karmaniverous/entity-manager';

const myTranscodes = {
  ...defaultTranscodes, // reuse default transcodes

  fix13: { /* same as above */ },
};

Your configuration object will also be identical to the Typescript version, but without the type annotations:

import { defaultTranscodes } from '@karmaniverous/entity-manager';

const config = { /* same as above */ };

The EntityManager constructor will still validate your configuration at runtime, so you can be confident that it is correct before proceeding.

Having said that: if you are working in Javascript, you should really consider switching to Typescript! The benefits are enormous, and the learning curve is not as steep as you might think.