Entity Manager is a highly generic tool that enables efficient put, get, and query operations across many entity relationships, indexes, and sharded partitions.
Entity Manager was designed to operate within the context of AWS DynamoDB, but should work equally well with any sufficiently similar NoSQL platform.
To accomplish this, Entity Manager needs to know:
- Which data types are indexable on your data platform, and how to represent those types within a compound index.
- Your entities, their properties and related types, and which properties will be generated by Entity Manager.
- The structures of your generated properties and indexes.
- Your partition sharding strategy for each entity.
This documentation takes a TypeScript-first approach! All discussions & code examples assume you are using TypeScript, and we will call out JavaScript-specific considerations where appropriate.
Generated Properties & Transcodes
As discussed in detail in Evolving a NoSQL Database Schema, Entity Manager indexes are supported by special generated properties.
Generated properties always have a `string` type. Within the Entity Manager config object, an entity’s generated property is defined chiefly by an array of its component property names. These components can be any non-generated property of the same entity, so long as that property is supported by a transcode.
Transcodes
A transcode is a pair of functions that convert a property value to and from a `string` type, such that the resulting strings are guaranteed to sort in the same order as the original values.
Transcodes and related mechanisms are actually defined in the `@karmaniverous/entity-tools` package, which is a dependency of both Entity Manager and the `@karmaniverous/mock-db` package used to test Entity Manager. For developer convenience these are re-exported from the `@karmaniverous/entity-manager` package.
For example, here is the definition of the `timestamp` transcode, one of the default transcodes provided by `entity-tools`:
import { isInt, isString } from 'radash';
import { type DefaultTranscodeMap, type Transcodes } from '@karmaniverous/entity-manager';
const defaultTranscodes: Transcodes<DefaultTranscodeMap> = {
  // ...other transcodes
  timestamp: {
    encode: (value) => {
      if (!isInt(value) || value < 0 || value > 9999999999999)
        throw new Error('invalid timestamp');
      return value.toString().padStart(13, '0');
    },
    decode: (value) => {
      if (!isString(value) || !/^[0-9]{13}$/.test(value))
        throw new Error('invalid encoded timestamp');
      return Number(value);
    },
  },
};
`radash` is a key Entity Manager dependency, which provides a set of type-safe utility functions for working with data.
The purpose of this transcode is to convert a millisecond Unix timestamp (a non-negative integer of at most 13 digits) into a 13-character numerical string and back. The transcode’s `encode` and `decode` functions contain some type validation to catch invalid values in either direction.
`timestamp` is a simple case: because the value is an unsigned integer and is zero-padded to a fixed length, its string representation will always sort properly.
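To make the sort-order property concrete, here is a minimal sketch that compares numeric order against string order. It uses a dependency-free copy of the encoder (with `Number.isInteger` standing in for radash’s `isInt`); the `encodeTimestamp` name and the sample values are illustrative assumptions, not part of the library.

```typescript
// Dependency-free restatement of the timestamp encoder, for illustration only.
function encodeTimestamp(value: number): string {
  if (!Number.isInteger(value) || value < 0 || value > 9999999999999)
    throw new Error('invalid timestamp');
  return value.toString().padStart(13, '0');
}

const timestamps = [1700000000000, 5, 999999999999, 42];

// Sort numerically first, then encode...
const numericOrder = [...timestamps].sort((a, b) => a - b).map(encodeTimestamp);

// ...versus encode first, then sort as plain strings.
const stringOrder = timestamps.map(encodeTimestamp).sort();

// Without padding, plain string sorting puts '42' ahead of '5'.
const unpaddedOrder = timestamps.map(String).sort();

console.log(numericOrder, stringOrder, unpaddedOrder);
```

The first two arrays come out identical, while the unpadded strings sort lexically rather than numerically, which is exactly the problem zero-padding solves.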
`fix6` is another default transcode that presents a more complex case: it handles a signed, fixed-point number with 6 decimal places.
import { isNumber, isString } from 'radash';
import { type DefaultTranscodeMap, type Transcodes } from '@karmaniverous/entity-manager';
const defaultTranscodes: Transcodes<DefaultTranscodeMap> = {
  // ...other transcodes
  fix6: {
    encode: (value) => {
      if (
        !isNumber(value) ||
        value > Number.MAX_SAFE_INTEGER / 1000000 ||
        value < Number.MIN_SAFE_INTEGER / 1000000
      )
        throw new Error('invalid fix6');
      const [prefix, abs] = value < 0 ? ['n', -value] : ['p', value];
      return `${prefix}${abs.toFixed(6).padStart(17, '0')}`;
    },
    decode: (value) => {
      if (!isString(value) || !/^[np][0-9]{10}\.[0-9]{6}$/.test(value))
        throw new Error('invalid encoded fix6');
      return (value.startsWith('n') ? -1 : 1) * Number(value.slice(1));
    },
  },
  // ...other transcodes
};
`fix6` uses the following techniques to meet transcode requirements:
- range checking (not required for `timestamp` because Unix timestamps are always positive and have a fixed length), and
- sign handling (prefixes positive numbers with `p` and negative numbers with `n` to ensure proper alpha sort), and
- zero-padding of small values (again to ensure proper alpha sort).
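As a quick illustration of these techniques, the sketch below restates the `fix6` logic without the radash dependency and runs a few values through it. The function names `encodeFix6` and `decodeFix6` are assumptions made for this example, and `Number.isFinite` stands in for radash’s `isNumber`.

```typescript
// Dependency-free restatement of the fix6 transcode, for illustration only.
function encodeFix6(value: number): string {
  if (
    !Number.isFinite(value) ||
    value > Number.MAX_SAFE_INTEGER / 1000000 ||
    value < Number.MIN_SAFE_INTEGER / 1000000
  )
    throw new Error('invalid fix6');
  const prefix = value < 0 ? 'n' : 'p'; // sign handling
  const abs = Math.abs(value);
  return `${prefix}${abs.toFixed(6).padStart(17, '0')}`; // zero-padding
}

function decodeFix6(value: string): number {
  if (!/^[np][0-9]{10}\.[0-9]{6}$/.test(value))
    throw new Error('invalid encoded fix6');
  return (value.startsWith('n') ? -1 : 1) * Number(value.slice(1));
}

const positive = encodeFix6(1.5); // 'p0000000001.500000'
const negative = encodeFix6(-1.5); // 'n0000000001.500000'
const roundTrip = decodeFix6(negative); // -1.5

console.log(positive, negative, roundTrip);
```

Because `n` sorts before `p`, every encoded negative value sorts ahead of every encoded positive value. Note that, in this restatement, two negative values compare by magnitude (the string for -5 sorts before the string for -10), so it is worth checking the library’s behavior if your queries depend on ordering among negatives.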
Entity Manager offers the following set of default transcodes:
Transcode | Description |
---|---|
bigint20 | BigInt value with a maximum of 20 digits |
boolean | boolean value |
fix6 | signed, fixed-point number with 6 decimal places |
int | signed integer |
string | string value |
timestamp | Unix timestamp |
Click here to review these default transcode definitions.
Note that, like the `defaultTranscodes` object (which defines multiple transcodes), each example transcode object above has a type of `Transcodes`. This special type ensures that the defined object…
- has the correct keys (the transcode names), and
- defines a correctly-typed `encode` and `decode` function for each key.
This is guaranteed by the `Transcodes` type’s single type parameter, which is a `TranscodeMap`.
The TranscodeMap Type
A `TranscodeMap` is a simple Record type that defines:
- the name of each transcode, and
- the type each transcode encodes into or decodes from a string value.
For example, here is the definition of the `DefaultTranscodeMap` type, which drives the `defaultTranscodes` object:
import { type TranscodeMap } from '@karmaniverous/entity-manager';
interface DefaultTranscodeMap extends TranscodeMap {
  bigint20: bigint;
  boolean: boolean;
  fix6: number;
  int: number;
  string: string;
  timestamp: number;
}
An object of type `Transcodes<DefaultTranscodeMap>` must…
- have the same keys as `DefaultTranscodeMap` (these are the transcode names), and
- provide an `encode` function that converts a value of the corresponding type to a string, and
- provide a `decode` function that converts a string to a value of the corresponding type.
If any of these conditions are not met, TypeScript will throw a type error.
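One way a type can enforce all three conditions is with a mapped type. The sketch below is a hypothetical stand-in written for this explanation, not the library’s actual definition of `Transcodes`; the names `SketchTranscodes`, `FlagMap`, and `sketchTranscodes` are all invented here.

```typescript
// Hypothetical stand-in, NOT the library's actual type definition.
type SketchTranscodes<T> = {
  [K in keyof T]: {
    encode: (value: T[K]) => string;
    decode: (value: string) => T[K];
  };
};

// A tiny transcode map: transcode name -> native type.
interface FlagMap {
  flag: boolean;
  count: number;
}

const sketchTranscodes: SketchTranscodes<FlagMap> = {
  flag: {
    encode: (value) => (value ? 't' : 'f'),
    decode: (value) => value === 't',
  },
  count: {
    encode: (value) => value.toString().padStart(4, '0'),
    decode: (value) => Number(value),
  },
  // Omitting `count`, adding an unknown key, or returning a string
  // from `flag.decode` would each fail type checking.
};

console.log(sketchTranscodes.count.encode(7));
```

The mapped type ties each key to its own value type, so the compiler can check every `encode` and `decode` signature independently.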
If you are only using Entity Manager’s default transcodes, you can skip the next section. But if you have reason to define your own transcodes, read on!
Custom Transcodes
Let’s say you are building a high-precision navigation application.
Latitude & longitude values require three digits to the left of the decimal point, so a 64-bit floating-point number (roughly 16 significant decimal digits) leaves room for 13 digits of decimal precision. You will want to define a custom transcode for this data type. Let’s extend the existing transcode naming convention and call this transcode `fix13`.
The first step in defining `fix13` will be to extend `DefaultTranscodeMap`:
import { type DefaultTranscodeMap } from '@karmaniverous/entity-manager';
interface MyTranscodeMap extends DefaultTranscodeMap {
  fix13: number;
}
Next, you can define a new transcodes object that includes `fix13`, using the `fix6` transcode definition as a template:
import { isNumber, isString } from 'radash';
import { defaultTranscodes, type Transcodes } from '@karmaniverous/entity-manager';
const myTranscodes: Transcodes<MyTranscodeMap> = {
  ...defaultTranscodes, // reuse default transcodes
  fix13: {
    encode: (value) => {
      if (
        !isNumber(value) ||
        value > Number.MAX_SAFE_INTEGER / 10000000000000 ||
        value < Number.MIN_SAFE_INTEGER / 10000000000000
      )
        throw new Error('invalid fix13');
      const [prefix, abs] = value < 0 ? ['n', -value] : ['p', value];
      return `${prefix}${abs.toFixed(13).padStart(17, '0')}`;
    },
    decode: (value) => {
      if (!isString(value) || !/^[np][0-9]{3}\.[0-9]{13}$/.test(value))
        throw new Error('invalid encoded fix13');
      return (value.startsWith('n') ? -1 : 1) * Number(value.slice(1));
    },
  },
};
The Entity Type
The `Entity` type is a simple Record type that defines the properties of an entity. A type extending the `Entity` type should follow these conventions:
- Each key is a property name. All Entity properties should be represented, including generated properties and those with complex types.
- All generated properties should have a type of `never`.
For example, here are the definitions of the `Email` and `User` types discussed in Evolving a NoSQL Database Schema:
import { type Entity } from '@karmaniverous/entity-manager';
interface Email extends Entity {
  created: number;
  email: string;
  userId: string;
  // generated properties
  userHashKey: never;
}
interface User extends Entity {
  beneficiaryId: string;
  created: number;
  firstName: string;
  firstNameCanonical: string;
  lastName: string;
  lastNameCanonical: string;
  phone?: string;
  updated: number;
  userId: string;
  // generated properties
  firstNameRangeKey: never;
  lastNameRangeKey: never;
  userBeneficiaryHashKey: never;
  userHashKey: never;
}
The EntityMap Type
The `EntityMap` type is a simple Record type that defines the entities in your data model and assigns their respective `Entity` types. An `EntityMap` type should follow these conventions:
- Each key is the token by which an Entity will be referenced throughout your configuration. All entities should be represented.
- Each property type is the corresponding `Entity` type.
For example, the following `MyEntityMap` type would support the User service table discussed in Evolving a NoSQL Database Schema, which includes the `Email` and `User` entities:
import { type EntityMap } from '@karmaniverous/entity-manager';
interface MyEntityMap extends EntityMap {
  email: Email;
  user: User;
}
The Config Type
The `EntityManager` class constructor takes a single argument of the `Config` type.
`Config` is a highly complex type, which encapsulates numerous rules whose net effect is to prevent the developer from creating an invalid Entity Manager configuration.
This section will cover each element of the `Config` type in depth. First, though, here is an example `Config` object that…
- implements the example summarized at the end of Evolving a NoSQL Database Schema, and
- for clarity, expresses and identifies all default values (normally default values can be omitted).
import {
  defaultTranscodes,
  type Config,
  type DefaultTranscodeMap,
} from '@karmaniverous/entity-manager';
const config: Config<
  MyEntityMap,
  'hashKey', // default value
  'rangeKey', // default value
  DefaultTranscodeMap // default value
> = {
  entities: {
    email: {
      defaultLimit: 10, // default value
      defaultPageSize: 10, // default value
      elementTranscodes: {
        created: 'timestamp',
        email: 'string',
        userId: 'string',
      },
      generated: {
        userHashKey: {
          atomic: true,
          components: ['userId'],
          sharded: true,
        },
      },
      indexes: {
        created: ['created', 'hashKey', 'rangeKey'],
        userCreated: ['created', 'hashKey', 'rangeKey', 'userHashKey'],
      },
      shardBumps: [ // default value
        { charBits: 1, chars: 0, timestamp: 0 },
      ],
      timestampProperty: 'created',
      uniqueProperty: 'email',
    },
    user: {
      defaultLimit: 10, // default value
      defaultPageSize: 10, // default value
      elementTranscodes: {
        beneficiaryId: 'string',
        created: 'timestamp',
        firstName: 'string',
        firstNameCanonical: 'string',
        lastName: 'string',
        lastNameCanonical: 'string',
        phone: 'string',
        updated: 'timestamp',
        userId: 'string',
      },
      indexes: {
        created: ['created', 'hashKey', 'rangeKey'],
        firstName: ['firstNameRangeKey', 'hashKey', 'rangeKey'],
        lastName: ['hashKey', 'lastNameRangeKey', 'rangeKey'],
        phone: ['hashKey', 'phone', 'rangeKey'],
        updated: ['hashKey', 'rangeKey', 'updated'],
        userBeneficiaryCreated: ['created', 'hashKey', 'rangeKey', 'userBeneficiaryHashKey'],
        userBeneficiaryFirstName: ['firstNameRangeKey', 'hashKey', 'rangeKey', 'userBeneficiaryHashKey'],
        userBeneficiaryLastName: ['hashKey', 'lastNameRangeKey', 'rangeKey', 'userBeneficiaryHashKey'],
        userBeneficiaryPhone: ['hashKey', 'phone', 'rangeKey', 'userBeneficiaryHashKey'],
        userBeneficiaryUpdated: ['hashKey', 'rangeKey', 'updated', 'userBeneficiaryHashKey'],
        userCreated: ['created', 'hashKey', 'rangeKey', 'userHashKey'],
        userUpdated: ['hashKey', 'rangeKey', 'updated', 'userHashKey'],
      },
      generated: {
        firstNameRangeKey: {
          atomic: true,
          components: ['firstNameCanonical', 'lastNameCanonical', 'created'],
          sharded: false,
        },
        lastNameRangeKey: {
          atomic: true,
          components: ['lastNameCanonical', 'firstNameCanonical', 'created'],
          sharded: false,
        },
        userBeneficiaryHashKey: {
          atomic: true,
          components: ['beneficiaryId'],
          sharded: true,
        },
        userHashKey: {
          atomic: true,
          components: ['userId'],
          sharded: true,
        },
      },
      timestampProperty: 'created',
      uniqueProperty: 'userId',
    },
  },
  generatedKeyDelimiter: '|', // default value
  generatedValueDelimiter: '#', // default value
  hashKey: 'hashKey', // default value
  rangeKey: 'rangeKey', // default value
  shardKeyDelimiter: '!', // default value
  throttle: 10, // default value
  transcodes: defaultTranscodes, // default value
};
Type Parameters & Global Keys
Together with its intrinsic shape, the `Config` type’s four type parameters work together to determine what constitutes a valid `Config` object.
We’ll go into some detail about each in the following sections, but here they are in brief:
Parameter | Type | Default | Description |
---|---|---|---|
M | EntityMap | - | The map of entities in your data model |
HashKey | string | 'hashKey' | The hash key property name shared across all entities |
RangeKey | string | 'rangeKey' | The range key property name shared across all entities |
T | TranscodeMap | DefaultTranscodeMap | The map of transcodes used in your configuration |
In a simple Entity Manager configuration restricted to default transcodes, only the first type parameter is required!
The Config object also contains the `hashKey` and `rangeKey` properties, whose values must exactly match those of the `HashKey` and `RangeKey` type parameters, respectively. This is an unavoidable redundancy: while the type parameters help ensure a valid Entity Manager configuration, the corresponding config properties play an important role at runtime.
The property names specified in `hashKey` and `rangeKey` must not conflict with any entity property name in the `M` type parameter. If they do, TypeScript will throw a type error.
Entity Configurations
The `entities` property of the `Config` object is a Record-type object whose keys exactly match the keys of the `M` type parameter, i.e. the configuration’s `EntityMap`. Missing or extra keys will cause a type error.
The value associated with each key is the configuration object for that entity. The following sections describe the properties of that object.
Query Limits
As described here, a key Entity Manager feature is that it simplifies complex, cross-shard, multi-index search operations by automatically decomposing these into a batch of single-shard, single-index queries that are conducted in parallel.
Within this context, `pageSize` indicates the maximum number of items per data page returned by one of these simplified internal queries, and `limit` indicates the minimum paging threshold of the combined result.
Subject to implementation requirements, `pageSize` and `limit` can be set on an individual query. Failing that, their default values for a given entity are set here with the `defaultPageSize` and `defaultLimit` properties. If one of these values is not set in the config object, its value defaults to `10`.
`throttle`, which is set at the top level of the config object rather than per entity, is the maximum number of queries that can be conducted in parallel. If not set in the config object, its value defaults to `10`.
Element Transcodes
The `elementTranscodes` configuration determines:
- which of an entity’s properties can be used as an element of an index or a generated property, and
- which transcode should be used to encode that property from its native type to a string, and decode it from a string back to its native type.
The `elementTranscodes` property is a Record-type object whose keys are each a property name of the entity, and whose values are the transcode to be associated with that property. It must follow these rules:
- Each key must be a property name of the entity.
- The entity property type of a given key (as expressed in the config’s `M` type parameter) must match the transcode type of the corresponding value (as expressed in the config’s `T` type parameter).
See above for examples.
Generated Properties
The `generated` configuration defines each of an entity’s generated properties, which are indicated in the entity’s entry in the config’s `M` (`EntityMap`) type parameter by a `never` type.
All such properties must be represented by a key in the `generated` object. If any are missing, or if any extra keys are present, TypeScript will throw a type error.
The value associated with each key is a configuration object that defines the generated property’s structure. This object has the following properties:
Property | Type | Default | Description |
---|---|---|---|
atomic | boolean | false | If true, any missing component results in an undefined generated value. If false, missing components are rendered as an empty string. |
components | string[] | - | An array of ungenerated property names that will be used to generate the property. Order matters, and only properties defined in Element Transcodes may be used! |
sharded | boolean | false | If true, the generated property will be sharded. If false, it will not. |
Indexes
The `indexes` configuration defines the indexes associated with the entity.
In the future, we will exploit this configuration at the command line to generate platform-specific data definitions, e.g. the CloudFormation specification of a DynamoDB table.
For now, these indexes are used to specify managed queries and to dehydrate & rehydrate the associated page key maps.
The `indexes` property is a Record-type object whose keys are the index names, and whose values are arrays of entity property names that act as index components.
To articulate a managed query, index names will be passed to a bespoke `ShardQueryFunction`. This function may translate the passed index name to a name used internally by the database, but as a matter of simplicity it makes sense to use the same name for both.
To be valid for inclusion, an index component must be:
- the global `hashKey` or `rangeKey` property (as in the example config above), or
- a generated property, or
- an ungenerated property included in the entity’s `elementTranscodes` configuration.
The order of index components is not significant. See here for a discussion of special index structure considerations within the DynamoDB context.
Sharding Strategy
These configuration properties define the sharding strategy for the entity. See here for an introduction to the rationale behind sharding and the structure of a shard key.
Every record created by Entity Manager will have a shard key embedded in its hash key. By default, this shard key is an empty string, resulting in a single data partition across the entity.
A record’s shard key is assigned at the time of record creation and does not change for the life of the record. Its value is determined by the following entity configurations:
- `shardBumps` is an array of objects that defines a sharding schedule: how the number of partition shards scales over time.
- `timestampProperty` is the name of the entity property that will be used to determine which specific shard bump applies to a given record. This in turn determines the number of available shards at that point in time and the length of the shard key. The record’s creation timestamp is the best choice for this property.
- `uniqueProperty` is the name of the entity property that will be hashed in order to determine the shard key for the record. It should be the record’s unique identifier.
The entity properties named in `timestampProperty` and `uniqueProperty` should be populated on all records for all entities represented by the configuration.
A shardBump object has the following properties:
Property | Type | Description |
---|---|---|
charBits | number | The number of bits represented by each shard key character. |
chars | number | The number of characters to use for the shard key. |
timestamp | number | The timestamp at which this shard bump takes effect. |
If no `shardBumps` are defined for an entity, its `shardBumps` property will default to the following value:
[{ charBits: 1, chars: 0, timestamp: 0 }];
The effect of this is an empty shard key and a single data partition for all records.
If `shardBumps` is defined but contains no record with a zero `timestamp` value, the above value will be prepended to the array. In any case, the array will be sorted by `timestamp` value on parsing.
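The defaulting behavior just described can be pictured with a short sketch. The `ShardBump` interface mirrors the property table above, while `normalizeShardBumps` and `DEFAULT_BUMP` are hypothetical names used only for this illustration; the library’s own parsing may differ in detail.

```typescript
interface ShardBump {
  charBits: number;
  chars: number;
  timestamp: number;
}

// The documented default: an empty shard key from the beginning of time.
const DEFAULT_BUMP: ShardBump = { charBits: 1, chars: 0, timestamp: 0 };

// Hypothetical helper: prepend the default bump when no zero-timestamp bump
// exists, then sort by timestamp.
function normalizeShardBumps(bumps: ShardBump[] = []): ShardBump[] {
  const hasZero = bumps.some((bump) => bump.timestamp === 0);
  const withDefault = hasZero ? [...bumps] : [{ ...DEFAULT_BUMP }, ...bumps];
  return withDefault.sort((a, b) => a.timestamp - b.timestamp);
}

const normalized = normalizeShardBumps([
  { charBits: 2, chars: 2, timestamp: 2000 },
  { charBits: 2, chars: 1, timestamp: 1000 },
]);

console.log(normalized.map((bump) => bump.timestamp));
```

Guaranteeing a zero-timestamp entry means every record, however old, has an applicable shard bump.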
`shardBumps` must obey the following rules:
- `chars` must be a non-negative integer valued from `0` to `40` inclusive. Its value must increase monotonically with `timestamp`.
- `charBits` must be an integer valued from `1` to `5` inclusive.
- `timestamp` must be a non-negative integer, and duplicate `timestamp` values are not allowed.
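These rules translate naturally into a runtime check. The following is a minimal sketch, not the library’s validation code: `validateShardBumps` is a hypothetical helper, and it reads “increase monotonically” as strictly increasing, which is an interpretation rather than something the text confirms.

```typescript
interface ShardBump {
  charBits: number;
  chars: number;
  timestamp: number;
}

// Hypothetical validator: returns a list of rule violations (empty if valid).
function validateShardBumps(bumps: ShardBump[]): string[] {
  const errors: string[] = [];
  const sorted = [...bumps].sort((a, b) => a.timestamp - b.timestamp);

  sorted.forEach((bump, i) => {
    if (!Number.isInteger(bump.chars) || bump.chars < 0 || bump.chars > 40)
      errors.push(`bump ${i}: chars must be an integer from 0 to 40`);
    if (!Number.isInteger(bump.charBits) || bump.charBits < 1 || bump.charBits > 5)
      errors.push(`bump ${i}: charBits must be an integer from 1 to 5`);
    if (!Number.isInteger(bump.timestamp) || bump.timestamp < 0)
      errors.push(`bump ${i}: timestamp must be a non-negative integer`);

    const previous = i > 0 ? sorted[i - 1] : undefined;
    if (previous) {
      if (previous.timestamp === bump.timestamp)
        errors.push(`bump ${i}: duplicate timestamp`);
      // Assumption: "increase monotonically" means strictly increasing.
      else if (bump.chars <= previous.chars)
        errors.push(`bump ${i}: chars must increase with timestamp`);
    }
  });

  return errors;
}

const validSchedule: ShardBump[] = [
  { charBits: 1, chars: 0, timestamp: 0 },
  { charBits: 2, chars: 1, timestamp: 1000 },
];

console.log(validateShardBumps(validSchedule)); // []
```

Returning a list of messages rather than throwing on the first failure makes it easier to report every problem in a schedule at once.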
For any `shardBump`, the number of available shards is given by `chars * (2 ** charBits)` (or a single shard when `chars` is `0`, as in the default case). At the outer limit, this works out to 26,241 shards across all time, or a total of over 656 million records even at the maximum DynamoDB record size of 400 KB. Most implementations should not approach this limit, as querying this many shards in parallel is bound to impact performance, but it should be plain that the design is scalable enough for any application.
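The text does not spell out how the 26,241 figure is derived. One reading that reproduces it exactly is a schedule with one bump for every `chars` value from 0 to 40, each at the maximum `charBits` of 5, with the shard counts summed across all bumps. The record total then follows if each shard holds at most 10 GB; that per-shard ceiling is an assumption made here, not a figure stated in the text.

```typescript
const MAX_CHARS = 40;
const MAX_CHAR_BITS = 5;

// Shards for one bump, per the formula above; chars = 0 is the single
// unsharded partition.
const shardsForBump = (chars: number, charBits: number): number =>
  chars === 0 ? 1 : chars * 2 ** charBits;

// Sum over a hypothetical schedule with one bump per chars value.
let totalShards = 0;
for (let chars = 0; chars <= MAX_CHARS; chars++)
  totalShards += shardsForBump(chars, MAX_CHAR_BITS);

// Assumption: 10 GB per shard (decimal units) and 400 KB per record.
const recordsPerShard = 10000000 / 400; // 25,000 records
const totalRecords = totalShards * recordsPerShard;

console.log(totalShards, totalRecords); // 26241 656025000
```

Written out, the sum is 1 + 32 × (1 + 2 + … + 40) = 1 + 32 × 820 = 26,241.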
Queries that include a date range may cross `shardBump` boundaries, in which case all relevant shards will be searched. The default case is no timestamp constraint, meaning that all shards up to and including the current `shardBump` will be searched.
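To illustrate what crossing a boundary means, the sketch below picks out the shard bumps whose active window overlaps a query’s date range. `bumpsForRange` is a hypothetical helper, and it assumes each bump stays in force from its own `timestamp` until the next bump’s `timestamp`; the library’s actual selection logic is not shown in this document.

```typescript
interface ShardBump {
  charBits: number;
  chars: number;
  timestamp: number;
}

// Hypothetical helper. Expects bumps sorted ascending by timestamp.
// Defaults model "no timestamp constraint": everything up to the present.
function bumpsForRange(
  bumps: ShardBump[],
  from: number = 0,
  to: number = Date.now(),
): ShardBump[] {
  return bumps.filter((bump, i) => {
    const next = i + 1 < bumps.length ? bumps[i + 1] : undefined;
    // Assumed window: [bump.timestamp, next.timestamp)
    const windowEnd = next ? next.timestamp : Infinity;
    return bump.timestamp <= to && windowEnd > from;
  });
}

const schedule: ShardBump[] = [
  { charBits: 1, chars: 0, timestamp: 0 },
  { charBits: 2, chars: 1, timestamp: 1000 },
  { charBits: 2, chars: 2, timestamp: 2000 },
];

// A range spanning the first boundary touches two bumps.
const crossing = bumpsForRange(schedule, 500, 1500);

console.log(crossing.map((bump) => bump.chars)); // [0, 1]
```

Every shard belonging to each selected bump would then need to be queried, which is why wide date ranges over a heavily bumped entity cost more.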
This design permits a staged scaling strategy. When the number of records for a given entity is low, it makes sense to have only one shard. As the number of records increases, the number of shards can be increased in a controlled manner.
So long as a new `shardBump` is only added with a future timestamp, your sharding strategy can be scaled gracefully, with no interruptions and without any need to migrate data following shard bumps.
Delimiters
The Entity Manager config defines the following special characters as delimiters, to be used when composing generated property values, including the global `hashKey` and `rangeKey`:
Property | Type | Default | Description |
---|---|---|---|
generatedKeyDelimiter | string | '\|' | Separates key-value pairs in a generated property. |
generatedValueDelimiter | string | '#' | Separates key from value in a generated property key-value pair. |
shardKeyDelimiter | string | '!' | Separates entity token from shard key in a sharded generated property. |
An entity property used as a generated property element or an index component should never contain any of these delimiter characters! If this is unavoidable, use these configurations to define an alternate delimiter that will not collide with your entity data.
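To show how these delimiters and the `atomic` flag might interact, here is a minimal sketch that composes an unsharded generated property value. The exact string layout is inferred from the delimiter descriptions above and is not confirmed by this document; `composeGenerated` and its options are hypothetical names, and shard key handling is deliberately left out.

```typescript
// Hypothetical helper for illustration only; not part of Entity Manager's API.
interface ComposeOptions {
  atomic?: boolean;
  keyDelimiter?: string; // stands in for generatedKeyDelimiter
  valueDelimiter?: string; // stands in for generatedValueDelimiter
}

function composeGenerated(
  item: Record<string, string | undefined>, // values assumed already transcoded
  components: string[],
  options: ComposeOptions = {},
): string | undefined {
  const { atomic = false, keyDelimiter = '|', valueDelimiter = '#' } = options;

  // Atomic: any missing component makes the whole value undefined.
  if (atomic && components.some((name) => item[name] === undefined))
    return undefined;

  // Otherwise, missing components render as empty strings.
  return components
    .map((name) => `${name}${valueDelimiter}${item[name] ?? ''}`)
    .join(keyDelimiter);
}

const encodedUser = {
  firstNameCanonical: 'ada',
  lastNameCanonical: 'lovelace',
  created: '1700000000000',
};

const firstNameRangeKey = composeGenerated(
  encodedUser,
  ['firstNameCanonical', 'lastNameCanonical', 'created'],
  { atomic: true },
);

console.log(firstNameRangeKey);
// firstNameCanonical#ada|lastNameCanonical#lovelace|created#1700000000000
```

The sketch also makes the collision risk visible: a property value containing `|` or `#` would make the composed string ambiguous to parse.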
Config Transcodes
The `transcodes` configuration property has a type of `Transcodes<T>`, where `T` is the `TranscodeMap` type parameter of the `Config` type.
By default, `T` is `DefaultTranscodeMap` and `transcodes` is `defaultTranscodes`. See Custom Transcodes to learn how to define custom transcodes for your configuration.
`T` and the `transcodes` property must be compatible or TypeScript will throw a type error! In general this means that if you customize one, you must customize the other. The exception is that you are free to override existing default transcodes so long as you maintain the existing type signatures.
Runtime Validation
TypeScript will enforce the structure of your `Config` object at compile time. This will largely keep you out of trouble, but there are some validity checks that can only be performed at runtime (e.g. validating that an entity’s `shardBumps.chars` value increases monotonically with `timestamp`).
Also, some developers will choose to write their code in JavaScript and will not be able to leverage compile-time validation at all.
To satisfy both of these cases, the `EntityManager` constructor leverages `zod` to validate the `Config` object at runtime. If the object is invalid, the constructor will throw an error with a detailed message explaining the problem.
The ItemMap Type
The `Entity` and `EntityMap` types described above effectively help you define the structure of your data model.
Unfortunately, your `Entity` types are not suitable for representing actual data objects in your application because:
- they use the convention of identifying generated properties with a `never` type, and
- they do not include the `hashKey` and `rangeKey` properties identified elsewhere in your config.
The `ItemMap` type takes the same type arguments as your `Config` object (except for the final `TranscodeMap` argument, which is not relevant), and returns a type that looks just like your `EntityMap` except that:
- all generated properties initially typed as `never` are replaced with `string` types, and
- `string`-valued hash and range key properties are added with names specified in the `HashKey` and `RangeKey` type parameters, respectively.
Here is an example of how to exploit the `ItemMap` type within the context of the `MyEntityMap` type defined above:
type MyItemMap = ItemMap<MyEntityMap, 'hashKey', 'rangeKey'>;
type EmailItem = MyItemMap['email'];
// {
//   created: number;
//   email: string;
//   userId: string;
//   hashKey: string;
//   rangeKey: string;
//   userHashKey: string;
// }
type UserItem = MyItemMap['user'];
// {
//   beneficiaryId: string;
//   created: number;
//   firstName: string;
//   firstNameCanonical: string;
//   lastName: string;
//   lastNameCanonical: string;
//   phone?: string;
//   updated: number;
//   userId: string;
//   firstNameRangeKey: string;
//   hashKey: string;
//   lastNameRangeKey: string;
//   rangeKey: string;
//   userBeneficiaryHashKey: string;
//   userHashKey: string;
// }
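For readers curious how a transformation like this can be expressed, here is a minimal sketch using a mapped type and a conditional type. `SketchItem` is a hypothetical stand-in written for this explanation and should not be taken as the library’s `ItemMap` implementation; the entity interface and the key values are likewise illustrative.

```typescript
// Hypothetical stand-in, NOT the library's ItemMap implementation.
type SketchItem<E, HK extends string, RK extends string> = {
  // Replace each `never`-typed (generated) property with `string`.
  [K in keyof E]: [E[K]] extends [never] ? string : E[K];
} & Record<HK | RK, string>; // add the hash and range key properties

interface EmailEntity {
  created: number;
  email: string;
  userId: string;
  userHashKey: never; // generated property
}

type EmailItemSketch = SketchItem<EmailEntity, 'hashKey', 'rangeKey'>;

// Placeholder key values, for illustration only.
const emailItem: EmailItemSketch = {
  created: 1700000000000,
  email: 'ada@example.com',
  userId: 'u1',
  userHashKey: 'generated-value',
  hashKey: 'hash-key-value',
  rangeKey: 'range-key-value',
};

console.log(Object.keys(emailItem).length); // 6
```

Wrapping `E[K]` in a tuple before the `extends never` test keeps the conditional type from distributing over `never`, which would otherwise collapse the result.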
JavaScript
If you are working in JavaScript, you can still use Entity Manager! Just be aware that you will not benefit from the compile-time validation that TypeScript provides.
When defining custom transcodes, you will do so without reference to types, so the example above would look like this:
import { defaultTranscodes } from '@karmaniverous/entity-manager';
const myTranscodes = {
  ...defaultTranscodes, // reuse default transcodes
  fix13: { /* same as above */ },
};
Your configuration object will also be identical to the TypeScript version, but without the type annotations:
import { defaultTranscodes } from '@karmaniverous/entity-manager';
const config = { /* same as above */ };
The `EntityManager` constructor will still validate your configuration at runtime, so you can be confident that it is correct before proceeding.
Having said that: if you are working in JavaScript, you should really consider switching to TypeScript! The benefits are enormous, and the learning curve is not as steep as you might think.