A Corporate Anthropologist’s Guide to Product Security
Building lasting cultural change
If you are looking for a TL;DR, just read the bold headings.
I was recently asked to share some lessons learned deploying a Security Development Life-Cycle (SDLC) in a large engineering organization. That gave me an excuse to write down the various thoughts that had accumulated over the years. I hope this write-up serves two purposes: to be a guide to those who are building out an SDLC and to provide some context to those who have joined an organization with a mature SDLC already in place.
This is not a description of the basics of SDLCs. There is a lot of good literature on what goes into an SDLC. Michael Howard and Steve Lipner’s The Security Development Lifecycle and Gary McGraw’s Software Security: Building Security In are two seminal books on the subject. The Building Security In Maturity Model (BSIMM) offers a concise reference based on data from dozens of companies. They are all great resources if you would like to learn about the SDLC and what it encompasses.
Here I assume that the audience is already familiar with what constitutes an SDLC and instead focus on what I refer to as the corporate anthropology involved in deploying an SDLC at a large company. This is not “corporate anthropology” in its traditional business sense — anthropology by corporations, where companies hire anthropologists to study their customers. Rather, it is anthropology of corporations.
Companies have their own cultures, with customs, rituals, social hierarchies, power structures, etc. Any initiative that hopes to succeed has to adapt to that culture. What’s more, deploying an SDLC will require altering that culture and those changes themselves have to be carried out in a culturally acceptable way. Lest your head end up on a pike as a warning to others.
It’s not uncommon in our field to see this type of work described as evangelism. But I have always had a strong negative reaction to that label in this context. In my mind it conjures up images of someone coming in with disdain for local customs and with an air of righteous superiority preaching “the Truth” to heathens. That’s precisely how you end up on a pike. My job is not to preach. Nor do I aspire to martyrdom. My job is to help ship successful, secure products.
Caveat emptor: My first-hand experiences are based on a small sample size (one large company), over a considerable time span (20+ years). The extended tenure allowed me to observe long term outcomes — what worked, what did not, what backfired, what stuck, and what faded. I am obviously biased, but I don’t think the size of the sample is much of a factor. Based on interactions with peers in the industry, my experiences are not unique for a large company. Moreover, as Dan Kaminsky likes to point out, “anyone who thinks a large company is just one company has never worked for a large company.”
1. Define your measure of progress.
You probably already know what needs to be done, what activities will form the core of your SDLC. The playbook is pretty standard: design reviews, threat modeling, risk assessments, code reviews, automated analysis, adversarial testing, patching, etc. You know the state of the art.
But how will you know if what you are doing is working? Sooner or later you will need to answer — if not to others, then at least to yourself — how much more secure the products have become as a result of all the invested time and effort.
Focus on impact, not effort.
You may be tempted to look at all the work that is being performed as a measure of progress. But effort is not the same as impact. Nor are the two necessarily correlated. If you doubled your efforts and didn’t tell anyone, how would your customers notice? Could you quantify the effect?
This is a difficult open question. There is no good measure of “security.” No standardized benchmark to run. But there’s still a range of possible approaches you can take, listed below in order of decreasing correlation to actual impact.
Loss statistics.
The essential goal of product security work is to minimize harm to customers that might arise out of any latent insecurity of the product. The ideal measure of success, therefore, would be some quantification of reduction in actualized harm over time. If the overall losses due to insecurity of a product are going down, then something is being done right. And vice versa.
Consider NHTSA vehicle fatality statistics. The change in number of fatalities per mile traveled indicates whether car travel is getting more or less safe. Similarly, CDC data can tell us if we are getting better or worse at dealing with different types of diseases. Unfortunately, there is no comparable data source to tell us whether computer use has gotten safer over time. At its core, product security is crime prevention. But there’s precious little actual crime data to guide us.
Exploitation data.
Instead of observing losses, you can try to directly monitor for incidence of attacks. This requires extensive detection capabilities, with monitoring and telemetry from deployed products.
The main limitation of this approach is the inherent survivorship bias. You never know what percentage of actual attacks you are detecting. Additionally, depending on your market vertical and position within the supply chain, such monitoring capability may not be available to you at all.
Cost of attack.
In the absence of reliable statistics on actualized losses or attack incidence, you may be able to use cost of exploitation as a proxy for measure of security. For some set of attackers, cost of exploitation will be inversely proportional to probability of attack. This is likely to be true for most financially-motivated attackers and a subset of hobbyists. Though defense and intelligence agencies (if your threat model includes them) will be far less responsive to variation in exploit costs.
Getting reliable exploit prices for your product may prove challenging. It’s not the most transparent of markets. Although there are some well-publicized price lists, it is not clear how accurately they reflect reality. Furthermore, prices can fluctuate due to market conditions that are independent of the security of your product. If the exploit price is low, it may be because the target is insecure, or because it is irrelevant to the attackers.
Beware that market dynamics may result in an ironically symbiotic relationship between you and exploit developers. The more secure you make your product, the higher the cost of exploitation. That, in turn, raises the barrier to entry, which further entrenches the incumbent providers of offensive capabilities and likely raises their profit margins. Keep in mind that your goal is not to make the exploit developers suffer. Your goal is to protect the consumer. The two may not always be correlated. Stay focused on what matters.
Adherence to best practices.
When you can’t measure outcomes, you are left to measure compliance with industry norms. You don’t actually know if what you are doing is working, but at least you are doing the same thing as everyone else.
The dearth of reliable outcome metrics in security leads to a Market for Silver Bullets — a condition where neither the producer nor consumer know if the product is any good. The inability to distinguish which inputs lead to stronger outputs causes the market to herd around a set of “industry best practices.” Initiatives like BSIMM, Cyber-ITL, and a growing list of regulation, legislation, and security labeling schemes all measure adherence to best practices.
The principal drawback of this approach is the lack of empirically proven causal relationship between best practices and stronger security (as reflected in harm reduction).
Another risk of best practices is turning security into a “checkbox” requirement. Absent countervailing forces, organizations will be incentivized to do the absolute minimum to meet the letter of the requirement — just enough to check the box. There is no incentive to outperform the minimum requirements. This is the herding effect of the market for silver bullets. Deviation from the norm, in either direction, is strongly discouraged by the market. Every floor becomes a ceiling. Whenever a bar is set with the intent of it becoming the minimum that everyone is supposed to meet, it inevitably becomes the maximum that no one is incentivized to exceed.
Finally, as you embrace best practices, beware of the trap you may be setting for your future self. As you are ramping up your SDLC, often the easiest path to building organizational buy-in for an activity is to point to its status as best practice and the long list of peers that have adopted it. “But, mom, everybody else is doing it” is a timeless and effective technique. Eventually, though, you will catch up with the industry norm and be in a position to innovate and to lead yourself. Except that by then you have spent years training your company’s engineering, legal, and finance teams to equate “right thing to do” with “everyone else is doing it.”
Qualitative assessment.
When all else fails, you are left with your own subjective assessment. You probably have some internal sense of how strong the product is based on what you know about the system architecture, ease of finding vulnerabilities, difficulty of getting them fixed, number of public exploits, conference talks, community sentiment, etc. While vague, highly subjective, and not really quantifiable, your intuition will always be essential in how you assess progress. This will likely be your starting point if you are just beginning to build your SDLC. The goal is to keep moving up this list to measures of progress that more meaningfully reflect the positive impact on your customers.
2. Align Authority, Incentives, and Accountability.
I entered the security profession around the turn of the century. Common thinking at the time placed much of the blame for insecure software on developers’ incompetence or, at best, ignorance. The solutions were as simplistic as the diagnosis — shame and education would reform the developers. Though this attitude still persists in pockets, including among some very influential members of the profession, fortunately it has considerably faded in popularity. My experience has been that this approach is not constructive and is rooted in a fundamentally flawed analysis.
“There is always a well-known solution to every human problem — neat, plausible, and wrong.”
— H. L. Mencken
Expect people to do what they are paid to do.
An underappreciated truth of the corporate world is that people do what they are paid to do. And in a development organization people are paid to ship products. In successful organizations (as determined by the market) developers are implicitly conscious of this goal and highly competent at executing on it. That is what makes their teams successful.
A pivotal moment in my security career came after reading Philip Zimbardo’s The Lucifer Effect. In spite of the controversy surrounding the work, I still highly recommend this book to security professionals, especially team leads. A central thesis of the book is that, overwhelmingly, people behave as the system directs them to behave. We ascribe too much of a person’s actions to intent and severely underestimate the degree to which they are driven by systemic forces. As a corollary, we consistently misjudge what we would do in someone else’s place. More likely than not, put into their shoes we would act just as they do.
Applying this to myself brought on the realization that if I suddenly found myself leading a product team, my code would be on time more often than it would be secure. If that is what my manager demands and what my bonus is based on, then that is what I will deliver. As a colleague once noted, “your code will never be 100% secure, but it can still be 100% on time.”
It is not a matter of technical security expertise, or understanding the importance of security, or caring, or any moral virtue. No amount of training or preaching will succeed in making people do something that is orthogonal, or worse yet, counter to their job function and incentive structure.
This is not meant to take away personal responsibility for one’s choices and actions. Rather, recognizing the role of systemic forces allows us to make more accurate predictions of how people will act. More importantly, it gives us a strategy for meaningfully influencing those actions.
This realization had a profound effect on how I approached security work and the security team’s interaction with the product teams. Developers’ behavior reflects the corporate incentive structures (both formal and informal). If you want to alter behaviors and create lasting, self-sustaining culture change, you will need to adjust the systemic forces and the incentive structures that drive them.
This is not something that is accomplished with evangelism. Instead, focus your efforts on aligning authority, incentives, and accountability in a way that results in secure products. The person responsible for delivering the product has to be the one responsible for that product’s security. Just as with functionality, quality, and on-time delivery, the ultimate responsibility for the security of the product rests with the one who has the power to make priority calls, allocate resources, set schedules, etc.
This may well not be the situation when you start. It may require concerted and continuous effort on your part to put this structure in place and to maintain it over time.
Resolve externalities.
Wherever you see misalignment of authority and accountability — one person responsible for shipping the product on time, and another answerable for its security flaws — you have an externality. As long as security remains an externality, it will not get addressed. It is therefore crucial to identify and resolve such instances.
“An externality is the cost or benefit that affects a third party who did not choose to incur that cost or benefit.”
— https://en.wikipedia.org/wiki/Externality
To detect externalities, pay attention to the risk acceptance process. When a risk is identified, consider who ends up making the decision whether to remediate or accept it. Is that the same person who will bear the consequences? Will they still be around? In environments with long production cycles, years can pass between a decision being made and its consequences becoming apparent. Is the team working on the initial design and implementation also the one that will handle post-release bug fixes? Are leadership responsibilities stable, or do people frequently change roles?
To resolve externalities in a corporate setting¹ I’ve relied on a fairly straightforward process. Looking at the company organizational structure, find the node(s) that makes the relevant decisions and the node(s) that pays the price for those decisions. Now find their lowest common ancestor. That’s the person where authority and accountability come together and the externality is resolved. The hierarchical nature of corporate structures ensures that a solution exists. In the extreme case, you have to go all the way up to the top. A decision that puts the project at risk should be made by the project lead. One that puts the business unit at risk should be made by the business unit head. And if something puts the entire company at risk, it should be escalated to the CEO.
This doesn’t guarantee that they will make the decision you hope for. Or even the right one. But it will ensure that the right person makes the decision — that the person accepting the risk is the one who is accountable for the consequences.
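If it helps to make the mechanics concrete, here is a minimal sketch of that lowest-common-ancestor lookup in Python, assuming the org chart is just a dictionary mapping each person to their manager (all names and the data structure are hypothetical).

```python
# Minimal sketch: resolve an externality by finding the lowest common ancestor
# in an org chart. The org chart is assumed to be a dict mapping each person to
# their manager; the root (the CEO) maps to None. All names are hypothetical.

def chain_of_command(org, person):
    """Return the path from a person up to the root of the org chart."""
    path = []
    while person is not None:
        path.append(person)
        person = org.get(person)
    return path

def accountable_owner(org, decision_maker, risk_bearer):
    """Lowest common manager of whoever makes the call and whoever bears the risk."""
    ancestors = set(chain_of_command(org, decision_maker))
    for manager in chain_of_command(org, risk_bearer):
        if manager in ancestors:
            return manager
    return None  # disjoint hierarchies; shouldn't happen within one company

org = {
    "dev_lead": "project_lead",
    "project_lead": "bu_head",
    "incident_response": "ops_director",
    "ops_director": "bu_head",
    "bu_head": "ceo",
    "ceo": None,
}

# The dev lead accepts the risk, but incident response pays the price:
print(accountable_owner(org, "dev_lead", "incident_response"))  # -> bu_head
```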
Enable accountability by establishing measurable goals.
The person you identify may choose to keep the responsibility themselves, or to delegate it down. In either case, merely assigning accountability to the right person is not enough. You will need to enable them to make informed decisions and to effectively manage risks, and to help them build mechanisms to enforce accountability for security within their organization.
How will they know what to do? More importantly, how will they know if they are doing it right, whether they are moving in the right direction and at a sufficient pace? They will need a feedback function — a set of metrics to guide them. This is closely related to the earlier discussion on how you choose to measure progress, but not quite the same.
Reservations about the immeasurable nature of security and about business and software metrics in general notwithstanding, leadership will need some way of knowing whether or not business objectives are being met. This is especially true for large organizations. You can’t just rely on tens of thousands of developers “doing the right thing.” You will need metrics and concrete goals.
Metrics can be categorized into two broad groups: predictive metrics and outcome metrics (alternatively referred to as leading and trailing, or bottom-up and top-down). As the name implies, outcome metrics measure the actual results — the outcomes. They are necessarily trailing. The results have to materialize to be measured. Outcome metrics are also the ones that ultimately matter to the business. In security that’s actualized harm — the extent of sustained losses. This is what was discussed in section 1.
Unfortunately, outcome metrics can have very long feedback delays. Years may pass between a risk being accepted (knowingly or not) and the resulting fallout. This makes them unsuitable for assessing progress over shorter time scales.
Because outcome metrics are only available after the fact, businesses develop predictive metrics that attempt to assess progress towards and likelihood of achieving intended results. Predictive metrics look at immediately measurable artifacts that are believed to be indicative of, or at least highly correlated with, desired outcomes. Predictive metrics in product security can include things like defect density, fuzzing coverage, time to patch, bug half-life, etc.² It will be up to you to figure out which predictive metrics best align with your definition of progress and what specific goals to set.
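As a rough illustration, here is a minimal sketch of computing two such predictive metrics from bug-tracker exports; the record layout and field names are hypothetical and would have to match whatever your tracker actually produces.

```python
# Minimal sketch: two predictive metrics computed from bug-tracker records.
# The record layout (opened/fixed dates, severity) is hypothetical.
from datetime import date
from statistics import median

bugs = [
    {"opened": date(2024, 1, 10), "fixed": date(2024, 2, 1),  "severity": "high"},
    {"opened": date(2024, 1, 20), "fixed": date(2024, 4, 15), "severity": "high"},
    {"opened": date(2024, 3, 5),  "fixed": None,              "severity": "medium"},
]

def median_time_to_patch(bugs, severity):
    """Median days from report to fix for closed bugs of a given severity."""
    days = [(b["fixed"] - b["opened"]).days
            for b in bugs if b["severity"] == severity and b["fixed"]]
    return median(days) if days else None

def open_bug_ages(bugs, as_of):
    """Age in days of every still-open bug, a rough input to a bug half-life figure."""
    return [(as_of - b["opened"]).days for b in bugs if b["fixed"] is None]

print(median_time_to_patch(bugs, "high"))     # -> 54.0
print(open_bug_ages(bugs, date(2024, 6, 1)))  # -> [88]
```

The particular numbers matter less than tracking the same small set of figures consistently over time.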
You will need to strike a balance in defining objectives that are accepted as achievable by the development organization but are also sufficiently ambitious in advancing the state of security. Make sure that progress is regularly reviewed by responsible parties. Be generous with acknowledging success, but also have a mechanism for drawing attention to areas of slower-than-expected progress and a way of taking corrective action. Even for predictive metrics, maintain the focus on impact and avoid measuring effort.
You won’t be able to do everything at once and your initial goals may be modest. Which is why you must build in an expectation that objectives will be re-evaluated and the bar raised with some regularity. This can be as frequently as quarterly at the start, leveling out at annual in steady state.
Finally, your job is not just to define and track these goals. You must share the responsibility and work with the development organization to execute on them.
Align the security team’s goals with those of the product teams.
An adversarial mindset is essential in the technical security domain, but it has no place in how the security team treats developers. If not actively managed, the relationship between security and development teams can become charged and turn antagonistic. The security team begins to view developers as the source of vulnerabilities, fundamentally uninterested in (or worse yet, incapable of) improving security. Developers begin to see security teams as bearers of bad news who impose extra work and jeopardize product delivery. An “us vs. them” mentality begins to set in.
You must not allow this. Fight it whenever you see it. Do not allow it to take root. Burn it with fire. Do not tolerate behind-developers’-backs commentary even in private. Pay attention to performance review feedback coming from product teams.
Make sure the security team sees themselves as being on the same side as developers. The two sides’ goals are fundamentally aligned. Both are optimizing for the price of the same stock. Both share the goal of making a successful product. Both fail if security requirements make the product too late or too expensive to succeed in the market.
3. Understand the Product Development and Deployment Processes.
If your work doesn’t end up in the product, it was wasted effort. Successful deployment of an SDLC requires a thorough understanding of the end-to-end product development and distribution process underlying it — from the twinkle in a product manager’s eye to the landfill.
You will need to figure out the processes for getting changes into the product. Where is the intake — how to get a new requirement submitted, how to get a bug filed? Who are the gatekeepers — who approves changes? What are the associated latencies — when are the requirement, design, and code changes locked down? Etc.
You will also need to know which code ends up in which product. This seemingly simple task can be much harder than it appears.
Develop software supply chain traceability.
Finding and fixing vulnerabilities constitutes a significant portion of any SDLC. To know which code to analyze and where to apply the fixes, you will need end-to-end traceability of the software development flow.
You will need to know precisely how the code flows from developers’ keyboards to deployed products: IDE, revision control, build toolchain, test infrastructure, release processes, patching, and end-of-life decommissioning. What systems are used? How much variation is there between teams? How can you get access? Are you sure you are seeing everything? There is always a bypass mechanism, an exception route. The product must flow.
Documentation is a great starting point, but you will have to break through the PowerPoint abstraction layer. Go “native” to see for yourself. Access the source. File a bug. Trace it through its lifecycle. Check in a change. Download the release as a customer would.
All of your prior assumptions will be proven wrong under some set of conditions. Every single one. Even the ones you don’t think can possibly be wrong without violating basic laws of time and space. They are wrong, too. You will always underestimate the determination of the organization to ship product, whatever it takes³.
In the same way that fixing a software vulnerability requires deep technical understanding of how computers work below multiple layers of abstraction, fixing a software development process requires a similarly thorough understanding of the development and release flow. If you don’t fully understand the system, your solution is unlikely to be complete. You don’t need to know everything to begin, but you should strive to build out monitoring and auditing capability that covers everything as you progress.
As an illustrative example, I often turn to the case study of Mattel’s 2007 recall due to lead paint on its toys.
“In July 2007,[…] a European retailer had found evidence of toxic lead paint in Mattel toys. […] Lead paint, which had been outlawed in the US and Europe for decades, had found its way into an unknown number of Mattel toys made in China.”
Press coverage at the time portrayed a lightning response. In retrospect, the narrative was likely heavily shaped by Mattel. Supposedly, Mattel was able to identify the culpable manufacturer within days of being alerted to the problem and issue a recall for all potentially tainted toys shortly after. I was awestruck by the speed and efficiency of the response, especially considering that it involved physical goods and a globally distributed supply chain.
“Within a few days [Mattel] had identified the factory that produced the tainted toys, stopped production and launched an investigation to determine the scope of the problem. That investigation concluded by the end of July and by August 2 the company had alerted the public and begun taking back about 1.5m toys. Mattel then voluntarily expanded the scope of its investigation and issued two more recalls, one on August 14 and another on September 5.
To prevent any future lead paint issue, Mattel adopted a new test procedure where every production batch of every toy had to be tested before it could be released to go on sale.”
Subsequent analysis revealed a less aggressive response timeline. Turns out, it took Mattel about a month to trace back through their supply chain to identify the offending manufacturer and another month to trace forward to identify all tainted products and affected retailers to issue a recall.
But even this timeline is still remarkably impressive. How many technology companies could match this response time for a software issue back in 2007? How many can do it today?
Given a vulnerability report for a commercial product in the field, can you efficiently trace it back to the offending line of code in your source code repository? Can you trace it further back to the first revision where this issue was introduced? Accounting for copying and pasting, for variable renaming, for refactoring? Can you then trace it forward to identify all impacted branches and all impacted product variants? Accounting for that team that is reusing the binary they compiled years ago? Can you track that proper fixes are applied everywhere? Accounting for variation? Can you prevent regressions? Do you have a way of enforcing this? Do you have a way of identifying and notifying impacted downstream consumers? Can you do all of this before you retire? Can you scale to do it repeatedly, for dozens of issues every month?
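For a single issue in a single repository, the trace-back and trace-forward steps can be sketched with plain git. The file, line range, and search string below are placeholders, and a real supply chain needs build and release metadata far beyond what one repository can tell you.

```python
# Rough sketch of per-issue traceability using plain git. The file, line range,
# and search string are placeholders; a real pipeline would also consume build
# and release metadata, not just a single source repository.
import subprocess

def git(*args, repo="."):
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout

# Trace back: which commit is responsible for the first line of the vulnerable range?
blame = git("blame", "-L", "120,140", "--porcelain", "src/parser.c")
introducing_commit = blame.split()[0]

# Trace further back: every commit that added or removed the risky pattern.
history = git("log", "--all", "--oneline", "-S", "strcpy(", "--", "src/parser.c")

# Trace forward: which branches (and thus which product variants) contain it?
branches = git("branch", "-a", "--contains", introducing_commit)

print(introducing_commit)
print(history)
print(branches)
```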
Much of our team’s early work was focused on mapping out the product development process and building compensating controls on top of it. We used to have a multi-hour presentation on it, called “Life of a Bug,” that traced a single vulnerability through the codebase, from original introduction to final patch. It was like stepping into a Total Perspective Vortex.
For when you are put into the Vortex you are given just one momentary glimpse of the entire unimaginable infinity of creation, and somewhere in it a tiny little marker, a microscopic dot on a microscopic dot, which says “You are here.”
― Douglas Adams, The Restaurant at the End of the Universe
Leverage existing organizational machinery (processes, gates, etc.).
As you learn more about the product development flow and continue deploying your SDLC over it, you will try to add new process requirements, checkpoints, and gates. Introduce new gates gradually. Leave them open at first, just to enable visibility. Over time start enforcing invariants — clean static analysis, security bug SLAs, etc.
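As a minimal sketch of that progression, here is a hypothetical gate that starts in report-only mode and is later flipped to enforcing; the finding format and threshold are made up, and in practice this would plug into whatever CI machinery the teams already use.

```python
# Minimal sketch of a security gate that starts open ("report-only") and is
# later flipped to enforcing. The finding structure and threshold are hypothetical.
import sys

ENFORCING = False  # start with visibility only; flip to True once teams are ready

def security_gate(findings, max_high_severity=0):
    """Return a process exit code: 0 to pass the build, 1 to fail it."""
    high = [f for f in findings if f["severity"] == "high"]
    if len(high) <= max_high_severity:
        return 0
    for f in high:
        print(f"[security-gate] high-severity finding: {f['title']}")
    if ENFORCING:
        print("[security-gate] failing the build")
        return 1
    print("[security-gate] report-only mode: not failing the build (yet)")
    return 0

if __name__ == "__main__":
    findings = [{"severity": "high", "title": "command injection in build script"}]
    sys.exit(security_gate(findings))
```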
Done carelessly, new processes can trigger a strong autoimmune response from the organization. If you are working with mature development teams, then they already have existing processes for getting products out the door. Leveraging the system already in place is both more effective and more efficient than attempting to build something to run alongside it. It is more effective because the existing system is already familiar to the organization (built-in acceptance) and it already intersects the product development flow at all the critical points. It is more efficient because a lot of the logistics of maintaining and enforcing your process requirements can be offloaded to existing support structures. There are already established quality gates, reviews, process enforcement, etc. Your best bet is to plug into and leverage all of that machinery.
Beware of process fatigue.
Control is a hell of a drug. Once you discover the ability to instrument the development process you will be constantly tempted to add more restrictions, more gates, more process. There must be some sort of process gravity at work. The more steps a process contains, the faster it accumulates additional steps. “You already have to do all this other stuff, what’s one more for a good cause.”
But process can’t make up for lack of responsibility (see section 2). People do what they are paid to do. And odds are that they are paid to ship product, not to follow process. So process will be bypassed if it stands in the way.
More importantly, process is overhead. It adds real cost (and pain) to developers’ work. With the right balance, the benefits far outweigh the costs. But keeping that balance requires ongoing vigilance. Process fatigue is real. If the pain becomes too great, developers will find ways around your processes. The cost of the process is itself an externality. Only this time you are the beneficiary while someone else pays the price.
Don’t forget that the full form of the question “is it worth doing” is really “is it worth doing given the cost.” In my experience, process cost grows as a superlinear function of its complexity. Somewhat counterintuitively, the n-th step added to a process costs a lot more than the first one, even if they appear to take the same amount of time in isolation. As a result, all the little things — an extra field here, an extra approval step there — end up costing more than the sum of their parts. Unfortunately, the benefit curve rarely follows the same pattern and quickly falls behind.
Work to minimize the security process burden. Regularly review established procedures and trim away anything that is no longer providing sufficient returns to justify continued pain.
4. Be Patient.
Culture change can take a long time. A really long time. Be patient.
Build your network.
As I have stressed throughout this guide, deploying an SDLC is a cultural activity. You need to get to know the people whose culture you are attempting to influence. You have to understand the power structures and information flows within the organization.
There is no substitute for ethnographic field work when learning about a culture. Join product teams’ mailing lists (or Discord, or Slack, or whatever else kids are using these days). Talk to people. Lots of people. Build real connections. Be genuinely honest with them and earn their trust and respect so that they can be honest with you. Interpersonal connections and organizational reputation will be essential to your success.
During the early years of our product security initiative, whenever I had a meeting in a different building⁴, I would arrive early and stay late just so that I could roam the halls and chat with developers that I knew there. These conversations were invaluable in helping me understand the inner workings of the organization and learn the lessons outlined above.
Look for paths of least resistance.
Your ambitions will likely exceed the organization’s willingness (or capacity) to act. You probably won’t be able to do everything you want to do. At least not all at once. Don’t get discouraged.
Not everything you attempt will succeed, so try a variety of approaches and look for paths of least resistance. Leverage your network to run pilots with the friendlier teams to see how your ideas fare in practice. Many initiatives look brilliant on paper but quickly go sideways once real people are introduced.
“No plan survives contact with the enemy.” — Helmuth von Moltke
Some initiatives will go fast, some slow. Some will succeed, but many will fail. Be prepared to adapt. There’s always more than one path forward.
Don’t let crises go to waste.
One day “The Incident” will happen. That bad thing you kept saying would happen has now happened, and the organization was not prepared. Don’t let such crises go to waste.
You can gloat or you can get results, but not both. Choose wisely. Don’t point fingers. Don’t say “I told you so.” Leverage the opportunity to make as much progress as you can. The organization is suddenly motivated to act and more willing to commit resources to improving security. Help them do it.
Take the long view.
I occasionally joke that in our line of work we must be prepared to fail repeatedly, but maintain the d̶e̶l̶u̶s̶i̶o̶n hope that some day we will succeed. This, of course, is just a bit of gallows humor. If it rings true, it’s because day-to-day challenges may feel never-ending. But such focus on the immediate can obscure longer-term trends. It’s important to periodically pause and take the long view back to see how far you have come. The steady improvement decade over decade is unmistakable. Yes, decade. Like I said, culture change takes a long time. Be patient.
[1] This only covers “internal” externalities, i.e. ones where both the cost and the benefit are borne by people within the organization. Addressing externalities between the corporation and the outside world is a separate topic and one that is unlikely to be resolved from within. Getting your employer to act against the best interests of its shareholders is arguably not part of your job description.
[2] It can be easy to mistake these for outcome metrics. After all, they reflect the outcome of many SDLC activities. But they are not goals in and of themselves. Ask yourself the question: if that’s the only thing that changes, who outside of the security team would be able to observe the effect and how? These may all be great things to improve, but none of them matter if they don’t lead to harm reduction in the end. The real outcome metrics are observable to your customers and shareholders.
[3] I thought of sharing some stories, but I don’t have enough liquor in the house.
[4] This is obviously easier to do when everyone is physically co-located on the same campus.
Cross-posted on LinkedIn.