Saturday, July 13, 2013

Book ends

Update: This project was listed on two of my favorite blogs: 

This weekend I wanted to make some book ends so that I don't have to stack books on my desk. Here's the result, a few hours later. This article eventually has a tie-in to additive manufacturing and 3D printing. Skip to the end to avoid musings about subtractive manufacturing, square holes, shades of orange, and early information exchange.

Six of the twelve hours making this were spent trying and failing to cut square holes accurately for the mortise and tenon joints. The back of the tenon shows with my design, so any gaps would be visible. The trick ended up being to mark the hole out with a razor blade, then patiently drilling large straight holes. I learned this by marking the hole with a broken pencil and impatiently drilling small holes. That didn't turn out so well. I then used a coping saw to remove the bulk of the remaining material and then did final fitting by carefully pushing a sharp chisel into the hole. There are attachments that convert a drill press into a mortise machine, but I've heard mixed reviews about them.

The wood is Osage Orange. It's a domestic wood that originated in Texas and was transplanted throughout the rest of the United States, usually as a wind break. The fruit it produces is spherical, about the size of an orange, and green. It smells like citrus but isn't edible except by some animals. The juice produced by the fruit is a natural bug repellent. For this reason, when I see them in the woods I'll usually throw one in my pack. This study even claims it's more effective than DEET. This was also the wood of choice for Native American bowyers. It has excellent compressive and tensile strength. The downside is that it's difficult to work with, and sections with enough straight grain to make a bow are relatively rare. I found this board at a local hardwoods retailer. When I saw the knot hole I thought it would make an interesting project and bought it on a whim. I think the knot hole looks like a leaf. This project is a lighter color now, but after a few years of exposure to sunlight from a window it will mature into a dark burnt orange. On that journey it will pass through the electromagnetic spectrum of a few of my favorite colors.

The joints are mechanical; they don't need glue to hold together. I cut into the end of the tenon before placing it in the hole and then pounded a shim that I made from a darker wood (zebra wood) into the cut. This type of joinery can sometimes be found on very old furniture. I'm not sure if that's because it was a common technique or because furniture made this way lasted longer. Probably both. Natural selection would tend to favor furniture that holds together after the glue ages enough to be ineffective, especially given the performance characteristics of early glues. Most glues used to be made from boiled-down animal skins, horses being the most famous example and salmon skins probably being the most effective. Joinery that depended on these glues could be taken apart by heating or immersion in water.

Mechanical joinery done by early furniture makers has been interesting to me for a while. I love seeing old pieces of furniture that employ this and I try to imagine if the creator re-invented the technique on their own, studied under a master, or learned of it while chatting somebody up at the local general store, church, or saloon. We take information for granted now, but the finer points of engineering weren't always so available to us.   

Here is a closer view of a joint.

I like that the detailed joinery and knot hole differentiate this piece from something that could just be purchased. There's no mistaking this for anything but hand-made.

The book supports slide independently and rely on friction alone to stay in place. The gap between them is 3 inches and the platform has an overall width of 6 inches. This is small enough to hold the smallest book in my desktop library, my regular expressions reference, and wide enough to stabilize the biggest book that would sit on my desk, Edward Tufte's Beautiful Evidence. The design was inspired by this video by Steve Ramsey, who posts weekly on YouTube as Woodworking for Mere Mortals. The various curves were traced from French curve templates I bought at a local art store. Marc Spagnuolo on The Wood Whisperer uses this trick to make some nice-looking projects.

Here is the 3D printing / additive manufacturing tie-in I promised: additive manufacturing enthusiasts often dream of printing objects larger than the dimensional capabilities of their tools. The trend has been to print pieces of the end result and assemble them; those pieces can then be glued, screwed, or bolted together. I think there's room to use some of the joinery techniques from woodworking to create even stronger and more robust projects. We have a few millennia of designs to pull ideas from, including one carpenter even more famous than Norm Abram.

The capture of this knowledge is taking place in our current generation. We are mining the past, dreaming in the present, and capturing it all in ones and zeros that will persist in machine-readable format until the eventual collapse of the universe. With that capture and optimization of knowledge, I think we'll eventually see better software tools that help us design (or design for us) when given an intended use case for a physical object. But that's a rabbit hole for a future post: where do innovation and craftsmanship go when computers can tell us what an optimal design is? If you've ever seen the movie Idiocracy you understand why this thought gives me pause.

Monetarily wealthy societies trend toward vanity (i.e. the "first world problems" meme). Information wealth, more often than not, is evenly distributed. Information is most valuable when it is shared freely. I don't say this lightly or on a whim. Any information that I can't search and make use of is essentially useless. If I need to go through some sort of gatekeeper arbitrating ideas, it raises the bar to finding useful knowledge, often to the point that I won't even bother. While exploring alternative ideas on a project I'll keep my inquiries to freely available information and trust that the best solution has been made free at some point, in a form that I can evaluate and verify to make my decision. See my post on Secret Sauce Algorithms for more discussion of that.

I think engineering will evolve in the context of widely available automated solution design. It will become something we can't anticipate right now, creating things we can't imagine yet. In any case, it's an interesting time to be alive. Maybe this went too deep for a general audience... but it's my personal blog, so deal with it.

Wednesday, July 10, 2013

How good schemas go bad

I've advocated in the past on this blog for extensible formats such as JSON, XML, and even CSV when exchanging data for analysis. As a contrast to that, I know it's sometimes nice to have data loaded into a database management system; it's a convenient way to run generic queries and interact with the data responsively. Because it's convenient and functionally flexible, it would seem like a great idea to use that database management system to handle a cross-organization exchange of data. However, using exported databases for data exchange can cause some unexpected problems. They aren't necessarily problems with the technology alone; most of them arise when human nature and project time constraints collide with exchanging data in those formats.

Here's a scenario: you are a data analyst at Organization A, which has invested money in Oracle or MSSQL or SAS or whatever. There is investment in both licenses and the time spent training people to be good with that particular database technology; it has probably even driven hiring practices for a few years. You're required to receive a data set from Organization B, provided as an export from a different database technology. The first thing you'll have to do is import that data, and you'll be doing it with personnel who are not intimately familiar with the native data types or the functionality of the original database. That can be a quagmire of problems in itself, but here's the real kicker: different database technologies have similar but sometimes different data types. You'll have to adjust schemas to create the tables that ingest the data. This is usually a smooth process the first time it happens with a data set, but the adjustments always seem to trend toward selecting larger and larger data types. By the second or third sharing of the same information, the slight schema adjustments made between databases will have degraded storage and access efficiency.
To describe how and why this happens, first a high-level sidebar on how data is stored in a database. To ensure efficient access, database management systems will typically ask you to specify the amount of space each bit of data can potentially take up. This ensures everything can be stored in-line and accessed quickly, since the data on disk can be traversed in known increments rather than having to determine the size of every single record along the way. It also enables fast calculation of arbitrary locations: want to get to the middle of a table (the first probe in a binary search)? Just take the number of rows, divide by two, multiply by the known cell sizes (thanks to the defined schema), and you know where to start. There are circumstances in which databases can store arbitrarily large data not confined to a limit defined in the table schema: the CLOB and BLOB data types. Because that data is stored separately from the tables and indexes, it can be as big as needed, but it is very inefficient when returning multiple rows. Conducting operations on arbitrarily large data in a database is much more time consuming and best avoided whenever possible.
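That offset arithmetic can be sketched in a few lines of Python. This toy builds a fixed-width "table" in memory; the record layout (a 4-byte integer key plus a 16-byte name field) is hypothetical, chosen only to show why known cell sizes let you jump straight to any row.

```python
import io
import struct

# Hypothetical fixed-width record: 4-byte little-endian int key
# plus a 16-byte padded name field, so every row is exactly 20 bytes.
RECORD_SIZE = 4 + 16

def read_record(f, index):
    """Jump straight to record `index` using offset arithmetic --
    no need to scan any of the preceding rows."""
    f.seek(index * RECORD_SIZE)
    raw = f.read(RECORD_SIZE)
    key = struct.unpack("<i", raw[:4])[0]
    name = raw[4:].rstrip(b"\x00").decode("ascii")
    return key, name

# Build a toy "table" of 1000 sorted records in memory.
buf = io.BytesIO()
for i in range(1000):
    buf.write(struct.pack("<i", i * 2) + f"row{i}".encode().ljust(16, b"\x00"))

# First probe of a binary search: total bytes / record size gives the
# row count; the middle row sits at (row_count // 2) * RECORD_SIZE.
row_count = buf.seek(0, 2) // RECORD_SIZE
key, name = read_record(buf, row_count // 2)
```

With variable-sized records (or CLOB/BLOB data stored out of line) that single `seek` becomes impossible, which is exactly the efficiency loss described above.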

Back to the problems associated with sharing data. Let's say that database A has a varchar data type (short for variable-length character data) and the table being shared has a varchar cell size larger than any offered by the database you're importing into. In this case, the easy and lazy thing to do is pick whatever data type on the new database will fit the largest possible value as defined by the schema. If it's a varchar(4000), the next size up on most databases pushes you into CLOBs, and that transition is the most problematic for efficiency. Through these kinds of compromises, as data is shared between different database technologies, schemas become skewed as data types are changed to accommodate schemas from other databases.
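A rough sketch of how that ratcheting plays out: assume two hypothetical database products with different varchar size ladders (the sizes below are invented for illustration) and the "smallest type that fits the schema" import strategy. Each hop between products rounds the column size up.

```python
# Hypothetical varchar size ladders for two database products.
DB_A = [32, 255, 4000]
DB_B = [64, 512, 2000, 8000]

def smallest_fit(ladder, needed):
    """The 'easy and lazy' import strategy: pick the smallest type
    in the target's ladder that holds the *schema's* declared size."""
    for size in ladder:
        if size >= needed:
            return size
    return None  # nothing fits: spill over into CLOB territory

size = 40  # the data's true maximum width
for ladder in (DB_B, DB_A, DB_B, DB_A):
    size = smallest_fit(ladder, size)
# The declared size ratchets 40 -> 64 -> 255 -> 512 -> 4000, even
# though no record ever grew past 40 characters.
```

One more hop from there and the column falls off DB_A's ladder entirely, forcing the CLOB transition described above.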

The way to avoid this is to use schema-neutral, extensible data-sharing formats. Aside from being schema neutral during transfer, what they really do is require the humans doing the ingest to conduct a study of the data before loading it into their own database technology. There's no 4000-character limit assigned by a schema that may or may not reflect the data. If you want to know the maximum record size, you need to look at the records, find the biggest one, and create your tables with that in mind. You're forced to find the optimal storage schema for the data you actually received. In a lot of cases, the optimal schema for the shared data is dramatically different from the schema of the full population data set; organizations typically don't share entire data stores, just the results of queries that return subsets of the data.
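That pre-ingest study is straightforward to automate. A minimal sketch, assuming a CSV delivery (the field names and sample records are made up):

```python
import csv
import io

def profile_field_lengths(rows):
    """Scan every record and track the longest value seen per column,
    so table columns can be sized to the data actually being shared."""
    maxima = {}
    for row in rows:
        for field, value in row.items():
            maxima[field] = max(maxima.get(field, 0), len(value or ""))
    return maxima

# Toy shared extract -- in practice this is the delivered file.
data = io.StringIO(
    "id,comment\n"
    "1,short note\n"
    "2,a considerably longer free-text comment field\n"
)
widths = profile_field_lengths(csv.DictReader(data))
# `widths` now shows how wide each column actually needs to be,
# rather than inheriting a blanket varchar(4000) from a foreign schema.
```

The same pass can also collect type information (is every value in a column an integer? a date?) to drive the rest of the table design.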

Other side effects avoided by sharing in extensible formats include the following. Ingest routines for database export tables are often brittle processes; it doesn't take much of an error to cause a catastrophic failure that leaves you with access to none of the data being shared. With extensible formats, if somebody cuts off the transfer halfway through or random storage errors sneak in, it's not a problem: you lose a few records but still have something to work with. I can say from first-hand experience, and after reaching out to the 30 or so DBAs that I know, that there are no good forensics tools for some proprietary database export files. If the export is corrupt, you're done unless you want to spend the money to reverse engineer the binary file structure. In some cases going back to the originating organization is prohibitively difficult or impossible; in the best case scenario your project schedule slips by a few days, still a heavy cost to pay. Another avoided problem is the transfer of application logic. With extensible formats it has to be done explicitly outside of the data store, whether in a human-readable export or by an actual human typing it into a document, so there's no assumption that relationships in the data are represented in the data store. Even though carrying application logic along is rare with database technologies, people always seem to assume that when you export a database, the application logic just goes with it. Again, human error.
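As an illustration of that fault tolerance, a newline-delimited JSON ingest can simply skip damaged lines rather than abort the whole load. A sketch, with invented record contents standing in for a real transfer:

```python
import json

def load_records(lines):
    """Parse newline-delimited JSON, keeping good records and
    counting the bad ones instead of failing the whole ingest."""
    good, bad = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            good.append(json.loads(line))
        except json.JSONDecodeError:
            bad += 1
    return good, bad

# A transfer that was cut off mid-record and picked up a garbled line:
raw = [
    '{"id": 1, "name": "alpha"}',
    '{"id": 2, "na',             # truncated mid-transfer
    'x8\x00junk',                # storage corruption
    '{"id": 3, "name": "gamma"}',
]
records, skipped = load_records(raw)
# Two usable records survive; the two damaged lines are counted, not fatal.
```

A proprietary binary export hit by the same two errors would likely be unreadable end to end, which is the forensics dead end described above.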

Extensible formats are always the best means of sharing data across organizations with heterogeneous information systems. But if you do receive data in a non-extensible format, don't assume the schema describes the data before you start adjusting it to your preferred database technology. Take the time to analyze and ingest in a sensible manner. If you skip this step you're pushing the problem to the next person and dooming yourself to sub-optimal performance on your own systems.


Monday, July 1, 2013

Hate PowerPoint

PowerPoint is almost universally abhorred as a presentation medium. However, most of the critiques leveled against it apply equally to other slide-based presentation systems (Keynote, Google Docs, Impress, etc.).

Here is my list of "big thoughts" on the topic (aka required reading):

Edward Tufte

Peter Norvig

Gen McChrystal, Gen Mattis, and others

Finding an alternative is tricky, especially when most presentation forums require you to provide or use "slides". More on that in a later blog post. I tried Prezi for a while, but using cloud services for proprietary or sensitive information sometimes isn't an option. Plus, with non-linear presentation mediums like Prezi there's always the chance of part of your audience getting motion sickness during the presentation (this happened once while I was using Prezi in a small group meeting). Do you really want your audience to remember their lurching stomachs when they think back to your presentation? Me neither. So... for now... PowerPoint it is in most cases.

Whenever possible, I at least try to push out a read-ahead document in advance of any important presentation. This trick helps those who care most about the topic get as much as possible out of the presentation. Having a document that stands alone as a capture of the information seems more beneficial to me than disjointed text in the notes section of each slide. It also makes re-use of the information at a later time easier.