Stuff to know before getting started
The collins data model
Collins was designed from the beginning to represent assets in the simplest way possible. This simplicity makes for an efficient data-model and allows for a large degree of flexibility. Collins really only knows about a few different kinds of things. Collins has:
That's it. Everything you need to know about collins has to do with one of these types of things.
An asset in collins describes a thing, usually a piece of hardware or a configuration.
Assets themselves are very simple internally and only have a handful of meaningful attributes. An asset consists of a:
New asset types are fairly easy to create but cannot be created via the API (although this is on the roadmap). Status values are not generally added or changed; typically a new state is used instead.
Status and state describe the current phase of the lifecycle of an asset.
The lifecycle (from birth to death) of an asset is described in terms of its status. The possible status values are fixed and cannot be managed via the API. While all available status values are listed below, the descriptions given are primarily indicative of their meanings for a server. A non-server asset type such as a configuration may only ever be allocated or decommissioned, for instance. The status values described below also give some specific insight into the Tumblr intake process for hardware.
Status transitions should not generally happen by hand. Automated processes should drive status changes, not people. In fact, the collins UI only allows you to change the status by taking an action (e.g. cancelling an asset or putting it into maintenance).
While the status of an asset describes where it is in a discrete lifecycle, the state describes a lifecycle specific to a status. For example, a server that is in maintenance may have a state of HARDWARE_PROBLEM or HARDWARE_UPGRADE. Also note that those states are not appropriate for healthy (non-maintenance) assets, and so these states are restricted to assets with a status of Maintenance.
A state can be either a system state or a non-system state. System states cannot be modified or destroyed. Non-system states can be modified and destroyed. Via the API you can only create non-system states, although support for adding system states may be added in the future. A state can be bound to a status (as in the case of HARDWARE_PROBLEM), or can be used with any status (as in the case of RUNNING). The states available out of the box are described below.
Status | State Label | State Name | State Description |
Any | Failed | FAILED | A service in this state has encountered a problem and may not be operational. It cannot be started nor stopped. |
Any | New | NEW | A service in this state is inactive. It does minimal work and consumes minimal resources. |
Any | Running | RUNNING | A service in this state is operational. |
Any | Starting | STARTING | A service in this state is transitioning to Running. |
Any | Stopping | STOPPING | A service in this state is transitioning to Terminated. |
Any | Terminated | TERMINATED | A service in this state has completed execution normally. It does minimal work and consumes minimal resources. |
Maintenance | Hardware Problem | HARDWARE_PROBLEM | An asset is experiencing a non-IPMI issue and needs to be examined. It needs investigation. |
Maintenance | Hardware Testing | HW_TESTING | Performing some testing that requires putting the asset into a maintenance state. |
Maintenance | Hardware Upgrade | HARDWARE_UPGRADE | An asset is in need or in process of having hardware upgraded. |
Maintenance | IPMI Problem | IPMI_PROBLEM | An asset is experiencing IPMI issues and needs to be examined. It needs investigation. |
Maintenance | Maintenance NOOP | MAINT_NOOP | Doing nothing, bouncing this through maintenance for my own selfish reasons. |
Maintenance | Network Problem | NETWORK_PROBLEM | An asset is experiencing a network problem that may or may not be hardware related. It needs investigation. |
Maintenance | Relocation | RELOCATION | An asset is being physically relocated. |
More information about the states available in your collins instance can be found in the collins help.
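The binding between states and statuses described above can be sketched in a few lines of Python. This is only an illustration of the rule, not how collins implements it: the `State` class, the `STATES` registry and the `state_allowed` helper are hypothetical names, and only a handful of the states from the table are included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    name: str               # e.g. "HARDWARE_PROBLEM"
    label: str              # e.g. "Hardware Problem"
    status: Optional[str]   # bound status, or None for "Any"


# A few states taken from the table above (hypothetical in-memory registry).
STATES = {
    s.name: s
    for s in [
        State("RUNNING", "Running", None),
        State("FAILED", "Failed", None),
        State("HARDWARE_PROBLEM", "Hardware Problem", "Maintenance"),
        State("RELOCATION", "Relocation", "Maintenance"),
    ]
}


def state_allowed(state_name: str, status: str) -> bool:
    """Return True if the named state may be used with the given status."""
    state = STATES[state_name]
    # A state with no bound status can be used with any status.
    return state.status is None or state.status == status
```

With this sketch, `state_allowed("RUNNING", "Allocated")` is true because RUNNING is usable with any status, while `state_allowed("HARDWARE_PROBLEM", "Allocated")` is false because that state is bound to Maintenance.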
An audit trail with an API
Every modification or lifecycle event that occurs with an asset is logged, along with who made the change and the time of the change. If a tag is modified (and not encrypted), the previous and new values are both stored. Logs can be searched via the API and can be viewed on the web as well. Logs are immutable but can be created via the API. Below is a list of log levels (based on syslog) and descriptions.
Level | Description |
EMERGENCY | A "panic" condition - notify all tech staff on call? (earthquake? tornado?) - affects multiple apps/servers/sites... |
ALERT | Should be corrected immediately - notify staff who can fix the problem - example is loss of backup ISP connection |
CRITICAL | Should be corrected immediately, but indicates failure in a primary system - fix CRITICAL problems before ALERT - example is loss of primary ISP connection |
ERROR | Non-urgent failures - these should be relayed to developers or admins; each item must be resolved within a given time |
WARNING | Warning messages - not an error, but indication that an error will occur if action is not taken, e.g. file system 85% full - each item must be resolved within a given time |
NOTICE | Events that are unusual but not error conditions - might be summarized in an email to developers or admins to spot potential problems - no immediate action required |
INFORMATIONAL | Normal operational messages - may be harvested for reporting, measuring throughput, etc - no action required |
DEBUG | Info useful to developers for debugging the application, not useful during operations |
NOTE | Created by users via the web UI; can be any kind of message |
System logs (messages that aren't specific to any particular kind of asset) can only be logged internally by collins. Collins still needs an asset to log these kinds of messages against. By default the system asset is the multicollins.thisInstance value, or tumblrtag1. You can specify the system asset via the features.syslogAsset configuration.
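To make the shape of an audit log entry concrete, here is a minimal Python sketch of an immutable entry that records who changed a tag, when, and the previous and new values. It is not the collins data model or API: `Level`, `LogEntry` and `log_tag_change` are hypothetical names, the numeric values 0 through 7 follow the standard syslog severities, and both the value chosen for NOTE and the default level of INFORMATIONAL are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import IntEnum


class Level(IntEnum):
    # 0-7 follow the standard syslog severity numbering.
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7
    NOTE = 8  # not a syslog level; the value 8 is an arbitrary choice here


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class LogEntry:
    asset_tag: str
    level: Level
    message: str
    created_by: str
    created_at: datetime


def log_tag_change(asset_tag, user, key, old, new, level=Level.INFORMATIONAL):
    """Build a log entry that keeps both the previous and the new value."""
    message = f"{key} changed from {old!r} to {new!r}"
    return LogEntry(asset_tag, level, message, user, datetime.now(timezone.utc))
```

Calling `log_tag_change("web-001", "alice", "NODECLASS", "web", "db")` yields an entry whose message contains both values, and any later attempt to assign to one of its fields raises an error, mirroring the "immutable but can be created" behaviour described above.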
IPAM for engineers, API included
Collins has an IP Address Management (IPAM) system built into it. The IPAM system is used for allocating both IPMI addresses and typical addresses. Addresses are configured in pools (which typically correspond to a VLAN), but can also be configured to be pool-less in the case where you don't manage your own IP Address space.
Collins will prevent duplicate IP address allocation, and will almost always use the smallest available address in a range. It is possible to allocate an address against any kind of asset. This is sometimes useful, for instance, when managing a VIP (virtual or floating IP address). You can create a configuration asset that holds the VIP for a service, then link that asset to others that will actually share the address.
In addition to address allocation, collins provides the ability to do other things you would expect from a typical IPAM system such as querying used addresses, understanding what an IP space looks like, finding assets in a pool or by address, etc.
At Tumblr we combine the IPAM functionality of Collins with the per asset LLDP data to automatically manage switch provisioning. We also use this data for generating kickstart files with the correct address information.
The fundamental idea with Collins IPAM is that of a pool. A pool is a named group of addresses. A pool definition will specify the network address range (specified in CIDR notation), an optional start address (i.e. the IP to start allocating from in the specified range), an optional gateway (if it's not the one you would infer from the CIDR range), and a name. Once a pool is configured it is possible to allocate addresses in that pool. If you don't manage your own address space, no worries. You can operate in a 'pool-less' mode where you can specify any address.
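The pool concept can be sketched with Python's standard `ipaddress` module. This is an illustration under stated assumptions, not the collins implementation: the `Pool` class and its methods are hypothetical, the inferred gateway is assumed to be the first usable host in the range, the start address is assumed to default to that same first host, and allocation always picks the smallest free address (collins itself is only described as doing so "almost always").

```python
import ipaddress


class Pool:
    """A named group of addresses with smallest-first allocation."""

    def __init__(self, name, network, start_address=None, gateway=None):
        self.name = name
        self.network = ipaddress.ip_network(network)
        first_host = next(iter(self.network.hosts()))
        # Assumption: the inferred gateway is the first usable host.
        self.gateway = ipaddress.ip_address(gateway) if gateway else first_host
        # Assumption: allocation starts at the first usable host by default.
        self.start = ipaddress.ip_address(start_address) if start_address else first_host
        self.used = set()

    def allocate(self):
        """Return the smallest free address at or after the start address."""
        for addr in self.network.hosts():
            if addr < self.start or addr == self.gateway or addr in self.used:
                continue
            self.used.add(addr)  # recording it prevents duplicate allocation
            return addr
        raise RuntimeError(f"pool {self.name} is exhausted")

    def release(self, addr):
        """Mark an address as free again."""
        self.used.discard(ipaddress.ip_address(addr))
```

For example, a pool created as `Pool("PROD", "10.0.0.0/24", start_address="10.0.0.10")` hands out 10.0.0.10, then 10.0.0.11, and reuses 10.0.0.10 once it has been released, since it is again the smallest available address. The pool name "PROD" is made up for this example.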
There is more information available in the API section as well as in the configuration section.