I’ve been asked this a few times now, so I thought it would be a good idea to publish why I prefer to use the API-based workflow instead of the VCS-backed workflow in our Hashicorp Sentinel SDLC.
TL;DR: Want to do lots of automated testing and validation before deploying? Want exceptions? You need the API-driven workflow.
Background
We have several teams of globally distributed people that need to be able to quickly write/test/deploy Sentinel policies. This means that our process for getting policies tested and deployed needs to:
- Be Repeatable
- Be Intuitive
- Be Scalable
- Provide Rigorous Testing
- Provide Confidence in a Successful Deploy
We therefore need a great CI/CD process + pipeline to be successful. There are two options for interacting with Terraform Cloud / Terraform Enterprise (TFC/TFE) when it comes to policy set creation – the API-driven workflow and the VCS-backed workflow.
The API-based workflow uses the TFC/TFE API to create a new policy set version, upload the policy set, and apply it to whatever workspaces/orgs you wish. It’s all API-driven and pretty straightforward.
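The create-version-then-upload flow can be sketched with nothing but the standard library. This is a minimal sketch following the documented TFC policy set versions endpoints; the policy set ID and token values below are placeholders, and in a real pipeline you’d pull them from your secrets store.

```python
import urllib.request

API_BASE = "https://app.terraform.io/api/v2"

def version_request(policy_set_id: str, token: str) -> urllib.request.Request:
    """Build the POST that creates a new policy set version.

    The JSON response includes an upload link under data.links.upload.
    """
    return urllib.request.Request(
        url=f"{API_BASE}/policy-sets/{policy_set_id}/versions",
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
    )

def upload_request(upload_url: str, tarball: bytes) -> urllib.request.Request:
    """Build the PUT that uploads the tar.gz of policies + sentinel.hcl."""
    return urllib.request.Request(
        url=upload_url,
        data=tarball,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )

# In a pipeline, roughly:
#   resp = urllib.request.urlopen(version_request("polset-abc123", token))
#   upload_url = json.load(resp)["data"]["links"]["upload"]
#   urllib.request.urlopen(upload_request(upload_url, tarball_bytes))
```

Building the requests as plain objects keeps the pure parts (URLs, headers, payloads) testable without touching the network.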
The Version Control System (VCS) backed workflow is likely a nice fit for many Sentinel users: keep your policies version controlled in Git, do some testing/validation, then promote to different environments accordingly. You could even do something like slow-rolling new policy sets based on branch names.
Where VCS-backed fell short
A few things stood out to me:
- Long-standing branches are an antipattern in Git, so I didn’t want to build our CI/CD processes around them.
- No possibility for exceptions: what’s in the repo is what gets applied, so having an exception means adding another repo (or another branch, to which I say, “see above”).
- Lacking environment differentiation: different environments need different policies, and a ton of repos wasn’t my answer. To me, that sounds like the fast path to drift. I’d rather start from a common foundation and remove excepted policies than maintain numerous policy sets.
- Difficulty comparing different policy sets: if there’s an outage caused by a wayward policy, or someone suddenly wants to use an unauthorized resource, 2am me doesn’t want to run git diff.
- Audits … shoutout to my Financial Services buddies who are all-too-aware. The typical line of questioning for something like Sentinel might be:
  - What are the policies you have? Easy, show them sentinel.hcl
  - Where are these applied? Easy, show them either the console or the /organizations/{orgName}/policy-sets GET API response
  - What policies were applied, and where, on $RandomDate? Sure, here’s the git history, see that we’re configured for VCS
  - Auditor: And can you prove that the things shown in the Git history were actually applied, and where? Also, you uploaded a tarball; please give me a copy of that tarball. No, downloading that commit and re-tar’ing doesn’t count.
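On the comparison point: once each deployed version is a stored artifact, “what changed between these two policy sets?” becomes a dictionary diff rather than archaeology. Here’s a hypothetical helper, assuming you’ve already recovered a policy-name → enforcement-level map from each version’s sentinel.hcl:

```python
def diff_policy_sets(old: dict, new: dict) -> dict:
    """Return policies added, removed, or changed between two versions.

    Inputs are {policy_name: enforcement_level} maps parsed from each
    version's sentinel.hcl.
    """
    return {
        "added":   sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }

# Example: one policy downgraded, one removed, one added.
v1 = {"prevent-ingress-rdp": "hard-mandatory",
      "restrict-eks-node-group-size": "soft-mandatory"}
v2 = {"prevent-ingress-rdp": "advisory",
      "limit-instance-types": "hard-mandatory"}
```

At 2am, `diff_policy_sets(v1, v2)` answering “prevent-ingress-rdp was downgraded in this version” beats reconstructing that from commit history.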
Overall, none of these on its own would have moved me away from the VCS-backed workflow, but taken together they paint a picture: perhaps the API-driven workflow is a better option.
What I envisioned
I’ll break this up into a few sections to group like items. In general, I wanted granular capabilities across the policy developer experience and into the production environment.
Pre-Deployment Testing
I plan to write another article about this, but for now suffice it to say that it’s important to have good pre-deployment checks. Note that at this stage, you can and should have similar tests even if you’re running VCS-backed. Here’s what I wrote for this:
- File structure validation: are all the requisite files/directories where I expect?
- Test case validation: each policy must have at least one passing test case, at least one failing test case, and at least two *.sentinel mocks – like the above, this ensures all policies have tests defined
- Linting: It’s not great, but I wrote a very basic linting tool for some common errors (duplicate policies, git artifacts left in policy, etc.)
- Code Formatting: `sentinel fmt *.sentinel`
- Unit Testing: `sentinel test *.sentinel`
- etc.
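The structure and test-case checks above reduce to a few filesystem assertions. A minimal sketch, assuming the standard Sentinel layout of `<policy>.sentinel` alongside a `test/<policy>/` directory; the `pass*`/`fail*` filename convention and the two-mock minimum are this team’s rules from the list above, not a Sentinel requirement:

```python
from pathlib import Path

def validate_policy(repo: Path, policy: str) -> list:
    """Return a list of validation errors for one policy (empty = OK)."""
    errors = []
    tests = repo / "test" / policy
    if not (repo / f"{policy}.sentinel").is_file():
        errors.append(f"{policy}: missing {policy}.sentinel")
    if not any(tests.glob("pass*")):
        errors.append(f"{policy}: no passing test case")
    if not any(tests.glob("fail*")):
        errors.append(f"{policy}: no failing test case")
    if len(list(tests.glob("*.sentinel"))) < 2:
        errors.append(f"{policy}: fewer than 2 mock files")
    return errors
```

Run it over every `*.sentinel` file in the repo and fail the pipeline stage if any list comes back non-empty.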
Exceptions Management
Surprising nobody, security teams end up creating exceptions for policies when a business need arises. In this context, let’s look at how an approved exception might be applied.
I wrote up a little JSON-formatted file containing the record items we’d need and want – here’s a similar example:
```json
[
  {
    "_id": "603fe2c8c7ab31c5ec883297",
    "isActive": true,
    "environment": "prod",
    "organization": "ProductionOrgName",
    "polset_name": "CloudGovernance",
    "workspace": "api-team-123",
    "risk_approval": "https://your-risk-tracking-site/thisRiskID",
    "poc_email": "risky.user@example.com",
    "description": "This team had a good reason for it (but really, write better descriptions)",
    "created": "2021-04-27",
    "expires": "2021-05-27",
    "exception_details": [
      {
        "restrict-eks-node-group-size": "advisory",
        "prevent-ingress-rdp": "advisory"
      }
    ]
  },
  {
    "_id": "603fe2c4347fd5fb0ce97842",
    ... <etc>
  }
]
```
So now, I get several key things:
- I can maintain a single sentinel.hcl file
- The pipeline can handle several steps for me:
- Create directories for each deployment needed on the fly
- Parse the exceptions JSON to find where rewrites need to happen
- Edit the sentinel.hcl file in the requisite environment directories
- Upload each version to the artifact repository, including the exceptions file itself for posterity
- Deploy the policy sets to TFC/TFE
- Exceptions automatically age out
- Aging out exceptions can notify the owner of the risk before the item is removed
Thanks to the dates in these records, plus the extra “isActive” flag that can turn an exception off early, parsing them and deciding whether to act is pretty straightforward. Bonus: the exceptions file and the process behind it are easy for a non-technical person to understand, which makes audits that much easier.
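The two core steps – deciding whether a record is live, and rewriting enforcement levels in a copy of sentinel.hcl – might look like this. It’s a sketch: the regex assumes the simple one-`enforcement_level`-per-policy-block layout that `sentinel fmt` produces, and the field names match the example exceptions file above.

```python
import re
from datetime import date

def is_live(record: dict, today: date) -> bool:
    """An exception applies only while active and inside its date window."""
    return (record["isActive"]
            and date.fromisoformat(record["created"])
            <= today
            <= date.fromisoformat(record["expires"]))

def apply_exceptions(hcl: str, details: dict) -> str:
    """Rewrite enforcement levels for the excepted policies in sentinel.hcl text."""
    for policy, level in details.items():
        # Match the named policy block up to its enforcement_level value,
        # then swap in the excepted level.
        pattern = (rf'(policy\s+"{re.escape(policy)}"\s*{{[^}}]*?'
                   rf'enforcement_level\s*=\s*")[^"]+(")')
        hcl = re.sub(pattern, rf"\g<1>{level}\g<2>", hcl)
    return hcl
```

The pipeline would run this per environment directory, writing the edited sentinel.hcl into the deployment tarball while leaving the canonical copy untouched.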
Package and Ship
As part of the upload process, you already tar the policies and the sentinel.hcl file – why not save that artifact? Developers shipping anything else publish their build artifacts somewhere, whether that’s JFrog Artifactory, GitHub Package Registry, Nexus Repository, a Maven repository, or some other flavor; build -> store -> deploy from that storage is a common pattern.
This gives a few advantages:
- Guarantees that what you’re deploying is what’s stored/downloadable
- Offers a snapshot in time that’s easily consumable
- Binary validation provides a provable/attestable record
- Evidence can be stored more easily & automatically
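The packaging step itself is small. One way to sketch it: tar the policy set directory, record the archive’s SHA-256, and keep that digest alongside the artifact as the attestable link between what was reviewed and what was uploaded to TFC/TFE.

```python
import hashlib
import tarfile
from pathlib import Path

def package_policy_set(src: Path, out: Path) -> str:
    """Write out (a .tar.gz of src) and return its SHA-256 digest."""
    with tarfile.open(out, "w:gz") as tar:
        # Sort for a stable member order; add entries one at a time so
        # directories aren't recursed into twice.
        for f in sorted(src.rglob("*")):
            tar.add(f, arcname=f.relative_to(src), recursive=False)
    return hashlib.sha256(out.read_bytes()).hexdigest()
```

Publish both the tarball and the digest to your artifact repository; when the auditor asks for “the tarball that was actually applied,” you hand over the stored artifact and its hash, not a re-tar of a commit.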
Git things
This workflow keeps people using good practices and lets them navigate branches/commits easily without worrying about some of the long-standing branch merge fears. It also lets us keep branches labeled with their associated ticket number rather than having to follow a sort of deploy-inspired-naming-scheme. By having the branches follow ticket number names, we can take advantage of pre-commit hooks & some other internal automated workflows to simplify the policy development experience.
Post-deployment testing
With the API-driven workflow I can trigger a test run in a workspace that I and other automation monitor. Does the count of policies applied match the count of policies in sentinel.hcl? Did the policy check fail to run? Did the policy check fail due to some other error? The VCS-backed workflow doesn’t have a great answer to this, as it’s implied that users would test before promoting and that they’d be confident enough. I wanted additional testing in the same org/under the same applied policy set.
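The count check from that monitoring can be a one-liner against the shipped sentinel.hcl plus the policy set’s API response. A sketch, assuming the `policy-count` attribute reported by the policy sets API (verify the attribute name against your TFE version):

```python
import re

def policy_count_from_hcl(hcl: str) -> int:
    """Count policy blocks declared in a sentinel.hcl body."""
    return len(re.findall(r'^\s*policy\s+"', hcl, flags=re.M))

def counts_match(hcl: str, api_response: dict) -> bool:
    """Compare declared policies against what TFC/TFE says is applied."""
    applied = api_response["data"]["attributes"]["policy-count"]
    return policy_count_from_hcl(hcl) == applied
```

A mismatch here, or a policy check that errors rather than passes/fails, pages a human before the next real plan hits the gate.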
Summary
I hope that’s made sense, and sorry about the long-winded post. Reach out if you’ve got questions/comments/concerns, or want to hire me to consult on your project. I’d love to chat about Sentinel, the Sentinel developer experience, and integrating Sentinel into your business.
Coming soon…
I’m working on testing that checks against the whole org’s Sentinel mocks to produce a “what if” sort of report before deployment. Check back soon!