Sunday, August 24, 2014

Fish inlay trivets or small cutting boards

This last week I had a chance to visit my wife's grandfather's wood shop and work for a morning. I decided to try an experiment with wood from his scrap bin and this is the result.

I started with a piece of 1" maple board 20" long, which I cut roughly in half to make two pieces. Then I hand-drew a fish made of two smooth arcs on each piece. I cut one of the arcs as smoothly as I could on the band saw. Smooth and straight was more important than accurate, so I drifted a bit from my original lines rather than making sharp corrections. A smooth curve right off the saw is critical so everything fits together well.

On the table saw I ripped some thin strips of purple heart, roughly 1/8" thick and a little wider than my maple boards were thick (so just over 1"). I briefly soaked the purple heart in water and then did a test fit by clamping between the two pieces of maple I cut on the band saw. The purple heart didn't crack, so I repeated the clamping process with lots of glue and more clamps as tight as I could get them.

While one piece was drying, I cut the arc on the other piece and did the test fit and glue-up. By this time the first piece was dry enough. I took a hand plane and cut the purple heart inlay flush with the face of the maple, then trimmed the excess ends off on the band saw.

It wasn't very pretty at this stage: the maple was covered in wood glue and the pieces weren't clamped together perfectly straight, but it was flat enough to be stable when I cut the second arc on the band saw to complete the fish design. I then glued it back together with the same process and another purple heart inlay. When I glued the second inlay I referenced off of where the pieces cross near the tail of the fish. I didn't care if the edges of the board lined up (they didn't), but I wanted that little section to line up. My inlay was wider than the kerf of the band saw, so I couldn't have more intersections of the inlays and still have them line up. That's why the mouth of each fish is open: it wouldn't line up properly without a much thinner inlay to match the thin kerf of the band saw.

With the final glue-up dry, I squared it off using the mitre saw and table saw, then ran it through the planer to clean up the faces. I could have sanded them flat, but why bother with a 220V 15" planer handy? I rounded the corners off with the router, then finally sanded a bit to clean everything up. I coated them with some food-safe mineral oil since I plan to use them in the kitchen.

This was beach week vacation and I had been drinking Corona Light on the beach and while fishing, so I intended these as small cutting boards for limes or cheese. I gave one to my wife's grandfather and took the other home to use in my kitchen. They will work just as well as trivets. Since the wood goes all the way through, it would be trivial to re-sand and refinish them if they ever get burnt or scuffed.

When I get home, I plan to use some dark exotic hardwood and maple inlays to make an inverse board to complement this one. I have the wood to do it now; I took a pile of scraps home with me to work with. I plan to do a future post showing the cuts and glue-ups I described above.

Pictures:




Monday, July 28, 2014

Creating users in a deployment script

The simple and lazy thing to do when creating a user in a deployment script is to throw a plain text password in the script. Avoid this temptation.

Here's a better way, which generates a random password and stores it in /root/ of the provisioned machine in case you need it. The major benefit is that the script can now be safely made public or stored on GitHub without exposing credentials.

# create user
# makepasswd hashes the generated password so useradd gets a hash, not plain text
sudo apt-get -y install makepasswd
# generate a random 10-character password and keep a copy in /root/ in case it's needed later
PASSWORD=$(head -c 32 /dev/urandom | base64 | fold -w 10 | head -n 1)
echo "$PASSWORD" | sudo tee /root/tangelo_password.txt
# makepasswd prints "cleartext hash"; awk grabs the hash field
passhash=$(sudo makepasswd --clearfrom=/root/tangelo_password.txt --crypt-md5 | awk '{print $2}')
sudo useradd theusername -m -p "$passhash"
sudo usermod -s /bin/bash theusername
# end create user

Saturday, July 19, 2014

Floating shelves

This weekend's project was floating shelves. My wife found inspiration for these on Pinterest. I made two sets of three, one set for the living room and one set for our bedroom. Here are pictures of the end results.




The construction was fairly straightforward table saw work. The edges were cut from a normal pine 2x4 you can get at the big box stores for $2; I used two of them for this project. The tops and bottoms were made from 1/4 inch sanded plywood. I used a 4x4 ft section of plywood for all of these shelves and had some left over. The shelves vary in length but are all 6 inches deep from wall to edge.

I cut the 2x4s roughly to length on the table saw, then used my jointer on two sides of each piece to make them straight, then ripped them to size on the table saw using the jointed edges as reference. Here's a video from somebody else (The Wood Whisperer) on how and why to do that.

Before cutting the corners to fit together, I cut the rabbets for the plywood with two cuts on the table saw. The rabbets are the grooves that allow the plywood to sit flush with the top of the edge board.

I then mitered the corners to fit together at 45 degrees, cut the plywood to fit for each shelf, and then glued them together. The plywood was glued everywhere it touches the edges. The miter corners are also glued together. Instead of clamps I used a pneumatic brad nailer and 5/8 inch 18 gauge brads on the top and bottom. I did not shoot any nails into the corners because I wanted to do the final bevel and routing after assembly.

After the glue set (an hour or so) I cut the edge bevel and ran the router along the bottom edge. The brad holes and any other gaps were filled with wood putty and then sanded. If I were going to stain these instead of painting them, I probably would have used clamps instead of brad nails to avoid the holes and putty. Here are pictures of how the plywood sits in the rabbets.





I used a micrometer to measure for brackets to hold the shelves to the wall. The brackets are just a chunk of wood that fits exactly inside the shelves top to bottom. I cut them to have some slack left to right to make assembly easier. When the shelves are installed these are completely hidden. Here is a picture of a bracket fitting inside of the shelf.


I took the brackets, a pencil, and a stud finder into the house where the shelves would hang. I marked the stud locations on the brackets then pre-drilled and counter-sunk some holes in them. Since the brackets are hidden I was free to use any size screw or bolt to secure them to the wall. Three inch deck screws into the wall studs worked fine for me. Drywall expansion fasteners would work great too.




When I screwed the brackets to the wall, I made sure they were level.



Then it was just a matter of sliding the painted shelf onto the bracket, pre-drilling a hole, and sinking a screw to hold them on.






With the brackets solidly mounted to the wall, I think these shelves will break apart before they ever droop or fall down. I hate droopy floating shelves and like this system more than the floating shelves you can buy. If I really want it to look clean, I might paint the screws, but for now they'll be hidden behind picture frames. The overall cost of construction was less than $5 per shelf, and all six took only three or so hours of total construction time.




Monday, July 7, 2014

curation is not preparation

I was on an email thread where somebody mentioned spending 80% of their time on data curation. I wrote this in response.

I think there's a semantic error we're drifting into: the original thread below mixes up the activity of curation with data preparation. Data prep does take a lot of time, but curation is a different thing. I'll try to describe what I'm thinking here. Curation is more an act of protecting integrity for later use, analysis, or reproducibility, like a museum curates. The tools for data preparation are awesome and very useful, but they go beyond what is just a curation activity, which I'd describe as something more fundamental.


data curation != data preparation

Provenance is always paramount when making decisions based on data. That's why trustworthy, persistent storage of the raw data is most important. Any transform should not only be documented but should be reproducible. Canonical elements of a curation system would be the raw data storage (original formats, as attained), transform code, and documentation.

Enabling discovery (locating, standardized access protocols, etc.) starts getting into something beyond curation. The connotation of curation implies preserving the scientific integrity of the data, like dinosaur bones or artifacts being curated in a museum. Some of them are on display (available via "standardized protocols"), but the rest are tucked away safely in a manner that doesn't taint later analysis of them. More often than not, the bones on display are actually semi-faithful reproductions of the artifacts rather than the originals. Same thing with data. The graph visualization (or whatever) of the data might not technically be the same data (different format, projection, normalized, transformed, indexed, geocoded, etc.), but it's a faithful reproduction of it that we put in the display case to educate others about the data. A fiberglass T-rex skull tells us a lot about the real thing, but it's not meant for nuanced scientific analysis.

All transforms of data, especially big data, contain an element of risk and loss of fidelity which would taint later analysis. We're all so bad at transforms that in the cases where a life is at risk (court proceedings or military intelligence analysis), the processes require citation of the raw data sets. A geocoding API rarely assigns locations with 100% accuracy (it's usually an exception when it does); sometimes when we normalize phone numbers there's an edge case the regular expressions don't account for; things can go wrong in an unlimited number of ways (and have... I've caught citogenesis happening within military intelligence analysis several times). The only way to spot these problems later and preserve the integrity of the data is to store it in its most raw form.

If we wish to provide access for others to a projection they want to build off of, the best way to do it is to share the raw data, the transform code, and a document showing the steps to get to the projection. In the email below this behavior of later analysts/scientists is noted (with disdain?). It shouldn't take long to look at previous transforms and reproduce the results; if it does, then those transforms weren't that reliable anyway. If those receiving the data just want to look at a plastic dinosaur skull to get an idea of its size and gape in wonder, then sharing data projections (raw data that has undergone a transform) is fine.

When providing curated data for research or analysis, I even make it a point to keep a "control copy" of the data in an inaccessible location. That way, if there is a significant finding, there is a reliable way to verify that it isn't the result of the data having been tainted by an inadvertent write operation.

On the other end of the spectrum (which I see all the time) is the unmarked hard drive with random duplicative junk dumped on it as an afterthought to whatever project the data originated from. Although "no" is never an answer when you're handed something like this, and useful analysis can certainly still be achieved, it's below the minimum. I imagine it's like being handed evidence for a murder case in a plastic shopping bag with the previous investigator's half-eaten burrito smeared all over it. You can sometimes make the case for a decision with it, but it's not easy, and it's a dirty job trying to make sense of the mess. This is probably more the norm when it comes to "sharing" curated data in government and industry. It's ugly.

The minimum set of things needed for useful data curation:
1. raw data store (just the facts, stored in a high-integrity manner)
2. revision control for transform code (scripts, applications, etc.)
3. documentation (how to use the transforms, why, and provenance information)

Everything beyond this could certainly be useful (enabling easier transforms, discovery APIs, web services, file transfer protocol interfaces), but is beyond the minimum for curation. Without these three basic things, useful curation breaks.
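To make the minimum concrete, a curated data set could be laid out as simply as a directory tree plus a version-controlled transforms folder. The sketch below is purely illustrative; the names are hypothetical.

curated-dataset/
    raw/             # original files, exactly as attained; treated as read-only
    transforms/      # version-controlled transform code (e.g. a git repository)
    docs/            # provenance notes and step-by-step reproduction instructions
    projections/     # derived data, always regenerable from raw/ + transforms/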

Sunday, June 22, 2014

Entomological enclosure (aka "bug house")

I took my son fishing Saturday morning. We didn't catch anything, but Colton enjoyed looking for bugs around the pond. When we got back to the house it was time for his afternoon nap. After he was settled down I wandered out to the garage and made this:


It's just a piece of screen and some scrap lumber. I cut everything to size on the table saw then put the frame together using brad nails and glue. The screen is folded over at the edges and stapled. The door hinge is a piece of leather and the door is held shut by a neodymium magnet that attracts the screw holding on the door knob. A light coat of danish oil finished it off. The entire project only took about 30 minutes.

In the image below, you can see the magnet countersunk into the frame. I drilled the hole to be snug and coated the inside of it with super glue before pressing in the magnet. I've done this before on several projects. The screw holding the knob on ended up working great and gave the magnet plenty to attract.

There is a little green praying mantis in there. As I was putting in the last staples and testing the door in my garage this bug jumped in. I saw it all happen and just stood there in stunned silence. There aren't words I know of for coincidences this crazy. I could have searched my yard for hours and not found a bug this cool to show my son and here he is jumping over my shoulder and into our little bug cage just as I finish it. Perfect.


Here's a video of the enclosure in action. When my son woke up we studied and talked about the bug for a while and then went outside to try and let him go.



Saturday, June 21, 2014

Converting videos for Roku with Linux

I bought a Roku 3 and I wanted to watch some video files on it. Unfortunately it only supports a few video formats. Maybe this is a good thing as long as it does them well. Conversions are pretty easy using ffmpeg or avconv. An Internet search didn't turn up many results on how to do this, so here's a script I made to make my life easier after I figured out which codec works best.

#!/bin/bash
# Convert any video avconv can read into an H.264 .mp4 with the same base name.
# "${1%.*}" strips the original extension; quoting keeps filenames with spaces intact.
avconv -i "$1" -strict experimental -c:v libx264 "${1%.*}.mp4"

To use it, put this in a file called roku-convert.sh, make it executable with chmod +x, and run it like this:
./roku-convert.sh video_to_convert.avi
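
If you have a whole directory of videos to convert, a simple loop over the script works too (this assumes .avi files; adjust the glob for your formats):

for f in *.avi; do
    ./roku-convert.sh "$f"
done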

It will create an .mp4 from any format that your avconv install supports. The .mp4 can be placed on a USB thumb drive and plugged into the side of your Roku 3. Find and play it using the Roku Media Player (a free download from the Roku channel store). Here's what it looks like:




Sunday, June 8, 2014

Forge update - gas adaption

When I woke up this morning I didn't know I was going to be converting my coal forge to gas. While on a family trip to Home Depot, I snatched my son up and we split away from my wife and daughter. I did my typical run around to see if anything good was on sale, look at esoteric stuff, and maybe find inspiration for a new weekend garage project.

I was never enthusiastic about playing with flammable gases, but another project (you'll see in a future post) had me learning how to use acetylene and oxygen cutting tools. Acetylene and oxygen is a potent mix fraught with all kinds of danger. After learning about and circumventing all of the ways to kill myself with it, I'm simply amazed more people don't get hurt using it. By comparison, propane seems tame, so when I ran across some refractory cement at Home Depot it didn't seem like such a bad idea to build the gas-powered forge I've been planning for a few years. Earlier in the week Harbor Freight had a sale on propane torches and I picked one up for next to nothing.

This propane forge is nothing more than a fire-proof container with a hole in the side where you can drive in fire. The pressure of the propane leaving the nozzle sucks air in via the Venturi effect. Here's a cool video by Matthias Wandel about the Venturi effect. Some forges have active air (like a supercharger) driving air in with a fan, but it's not really necessary, especially for a backyard forge. The container keeps the heat in, allowing you to approach the roughly 3000-degree burning temp of propane in air over a wider area.

I created the fire-proof container by cutting/bending some steel mesh to fit inside of a paint can and packing the mesh with the refractory cement. I just used a pair of rubber gloves to smash it all in there and smooth it. Here's a picture of the process.
I cut a hole in the side and then very carefully welded in a piece of square pipe large enough to hold the torch tip. Another set of pictures showing how that works:

The front view of the forge after the first burn:

I scavenged a piece of 1/2" angle iron from a bed frame we were throwing out. I cut that up and welded it together to create the stand which the paint can sits on. I sized it so that it fits on top of my coal forge, effectively using my brake disc coal forge as a heavy, stable stand. Here's a picture of the whole setup sans propane. I took these pictures after I had already put away the propane for the day.


Before this forge will be truly usable, I'll have to figure out how to narrow the front opening. I'll probably use fire bricks, but not sure yet. I'm out of time this weekend.







Monday, May 5, 2014

12 days, Strata, and a fallen Marine

This is a post I wrote while on a plane in February. I'm just now getting around to posting it.

--

I spent last week at the Strata conference. It was a good time; I saw some friends and soaked in some good presentations and discussions. James Burke gave a clever keynote on the last day. It was my favorite; here's a quote: "I'm an optimist, because to be a pessimist means to withdraw and not be involved. I like to be involved." I can relate to that; pessimism reeks of hopelessness and inaction. One of my previous Marine Corps commanders called me "disgustingly optimistic".

Meanwhile, back in Virginia, my very pregnant wife and her mother were digging out from under a foot of fresh snow. Despite that, I was optimistic about making it home on time and headed off to the airport early the next morning. At the airport the ticket attendant greeted me with a smile (she must be an optimist too... or a diligent service professional). The smile faded from her face when she looked down at her console. I knew before the words left her mouth that I wouldn't be getting home that day. Although every runway in the country was functional, the cancelled flights had a cascading effect and had knocked one of my connecting flights from the schedule. I cursed bad computer algorithms. For an hour I stared at the attendant while she made phone calls and typed on her computer, attempting to meddle with other people's schedules. She was able to determine that my choices were to spend another night in Silicon Valley or get to Denver and try my luck there. My disappointment in algorithms dissipated as I realized even the best flight scheduling system would fail with this amount of random intervention from empowered ticket attendants modifying its solutions.

Denver is the hometown of a fallen Marine I served with in Iraq. Walking through the airport on the way to my flight, I was undecided about whether I should try to check in on his parents during my time in town. I wasn't sure how long I'd be there, and it had been 5 years since I had last visited them. As though speaking directly to me, the paging speakers in the airport kicked up and announced the name "Vandegrift". Apparently someone with that last name had missed a flight. They probably didn't hear the message. But I heard it; Matt Vandegrift was the fallen Marine I was thinking of, and I was knocked out of my fog of indecision. I was going to see the Vandegrifts. I sent his father an email from my smartphone in the jetway as I boarded my flight.

On the flight I popped open my trusty Dell XPS12. I had an interesting parsing problem on my work to-do list and some sample data to work with. After getting stuck with no way to access the Internet, I ended up going through some of my old code looking for solutions I'd written previously. Unsuccessful, I started playing with one of my favorite pieces of code. twelve.c is 15 lines of C code that outputs all 64 lines of the 12 days of Christmas. I didn't write it, but I enjoy it; it's an old IOCCC entry. I love pointing to this competition when people talk about static code analysis in the context of malicious developers. Good luck with that.

 #include <stdio.h>
 main(t,_,a) char *a; { return!0<t?t<3?main(-79,-13,a+main(-87,1-_,main(-86,0,a+1)+a)):
 1,t<_?main(t+1,_,a):3,main(-94,-27+t,a)&&t==2?_<13?
 main(2,_+1,"%s %d %d\n"):9:16:t<0?t<-72?main(_,t,
"@n'+,#'/*{}w+/w#cdnr/+,{}r/*de}+,/*{*+,/w{%+,/w#q#n+,/#{l+,/n{n+,/+#n+,/#\
;#q#n+,/+k#;*+,/'r :'d*'3,}{w+K w'K:'+}e#';dq#'l \
q#'+d'K#!/+k#;q#'r}eKK#}w'r}eKK{nl]'/#;#q#n'){)#}w'){){nl]'/+#n';d}rw' i;#\
){nl]!/n{n#'; r{#w'r nc{nl]'/#{l,+'K {rw' iK{;[{nl]'/w#q#n'wk nw' \
iwk{KK{nl]!/w{%'l##w#' i; :{nl]'/*{q#'ld;r'}{nlwb!/*de}'c \
;;{nl'-{}rw]'/+,}##'*}#nc,',#nw]'/+kd'+e}+;#'rdq#w! nr'/ ') }+}{rl#'{n' ')# \
}'+}##(!!/")
:t<-50?_==*a?putchar(31[a]):main(-65,_,a+1):main((*a=='/')+t,_,a+1)
:0<t?main(2,2,"%s"):*a=='/'||main(0,main(-61,*a,
"!ek;dc i@bK'(q)-[w]*%n+r3#l,{}:\nuwloca-O;m .vpbks,fxntdCeghiry"),a+1);
}



The 12 days of Christmas is pretty repetitive prose (see what I did there?). Repetitive text lends itself well to compression tricks. The 64 lines of the song are generated by the above 15 lines of code, about 75% compression. I should easily be able to write something that beats that if I were just going for compression and not obfuscation. Doing it without the aid of the Internet made it enough of a game to burn some time. Here's what I came up with.

#!/bin/bash
b64="H4sICPFD/lIAA291dHB1dC50eHQA5ZXdcYQwDITfqWILuKSBPKaAq0HBAm\
viH8YyR+g+MtfAOZNkLpcXQGbX+lbAcE6onjFJ0QpHO/KEV19EayRF3FHLygj5w\
pjJDjUj8kBYqNQibmZIgpVMxaTMz8Nwvm6pPObkbt+zbhl1LTUwnN3Sgcx9S6Pq\
pfT08ebGVDiNHp6TnvDl1lM2l7+9d9NjpBAkzXgz7Nb8O3lk6sKRVufgUAxIX36\
cT+Wjh8/kmJmVQU+BdoM64feZ+cKpi7oZoBslNW7dJMZGfh9pWOaeLCb3FZHEtS\
xRwvuBfdcRk3Q9LpPbIjlhNVMaD+i/mLv2vaYmt7VyBAy8HDkeZhYcer/aqwOLL\
Fy0nQ7whx5S3Th0/S+awQpX1hjblI6La/x/NL1PBJqPqzMJAAA="
echo -n $b64 | base64 -d | gunzip | cat
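
For reference, the embedded blob can be produced with something like the following, where twelve_days.txt holding the full lyrics is just an assumed filename; paste the output into the b64 variable above.

gzip -9 < twelve_days.txt | base64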


The original code has 840 characters in it; my in-flight solution has 495. The version of the 12 days of Christmas that they both print out has 2355 characters. It would have been nice to fit the whole thing into some code I could tweet (140 characters). That's not possible with direct methods, but it would have been interesting. There are a few minor optimizations I could have done, but nothing dramatic; this is probably close to as tight as it gets. If this is interesting to you, check out this great Dr. Dobb's article about a long-standing challenge to compress the random numbers generated by RAND in the 1940s.

To find out just how much of the text I could theoretically tweet, I wrote a small program to chart how many characters my approach uses for the first N lines of the text. The program just grabbed the first N lines, applied the method, and output how many characters the result ended up being. The red line shows how the 64 lines of the song eventually add up to the whole 2355 characters. The blue line is my algorithm.
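
A rough sketch of that measurement, assuming the full lyrics live in a file called twelve_days.txt (a hypothetical name), could look like this in bash:

#!/bin/bash
# For each prefix of N lines, compare the plain-text size against the size of
# a self-decoding script built with the same gzip + base64 trick used above.
for n in $(seq 1 64); do
    plain=$(head -n "$n" twelve_days.txt | wc -c)
    blob=$(head -n "$n" twelve_days.txt | gzip -9 | base64 -w 0)
    script="#!/bin/bash
b64=\"$blob\"
echo -n \$b64 | base64 -d | gunzip"
    echo "$n,$plain,${#script}"
done

The second and third columns are the character counts plotted as the red and blue lines, respectively.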



The lines cross down in the left corner of that chart. That's where the benefit from compression overcomes the overhead of the surrounding code. Here's a close up of that crossover.

Unfortunately, the crossover takes place when we're already past 140 characters, meaning any code you could tweet with this algorithm would be longer than just tweeting the plain text. Not cool. I was bummed when I saw this; it had seemed like such a clever idea a few moments earlier. Before I could come up with a different approach, the attendant told me I had to power down my laptop.

When I powered on my phone a message was waiting from the Vandegrifts. They'd love to see me, and I should let them know when to pick me up from the airport. The ticket counter informed me that I wouldn't be flying out until the next day, so I ended up spending the night with the Vandegrifts, eating dinner and talking about their son. We also spoke about survivor guilt. Matt's father gave me a copy of Viktor Frankl's book "Man's Search For Meaning". I read it on my flight from Denver to DC. It chronicles Frankl's own survival through a series of Nazi death camps.

Viktor writes that the "best of us" died in the concentration camps. Those not immediately killed by chance sacrificed themselves by standing up to the guards or "Capos" (privileged inmates). The worst kind of prisoners prospered and survived. That stuck out in my mind because it shows some of the weight that those who survived carried. They were often plagued by the gnawing suspicion that they may have survived because they were willing to compromise their ethics, even if it wasn't true. Life in a Nazi death camp was different from the imprisonment by the Japanese that Zamperini endured, as chronicled in the recent book Unbroken. I finished Unbroken a few weeks after it was released and highly recommend it. Both stories, Unbroken and Man's Search for Meaning, are very moving.

In Unbroken, Zamperini discusses his life after freedom. It's best summarized in his letter to his cruelest tormentor from the prison camp. Here's a video of Zamperini reading it, and the text:

"To Mutsuhiro Watanabe,

As a result of my prisoner of war experience under your unwarranted and unreasonable punishment, my post-war life became a nightmare. It was not so much due to the pain and suffering as it was the tension of stress and humiliation that caused me to hate with a vengeance.

Under your discipline, my rights, not only as a prisoner of war but also as a human being, were stripped from me. It was a struggle to maintain enough dignity and hope to live until the war's end.

The post-war nightmares caused my life to crumble, but thanks to a confrontation with God through the evangelist Billy Graham, I committed my life to Christ. Love has replaced the hate I had for you. Christ said, "Forgive your enemies and pray for them."

As you probably know, I returned to Japan in 1952 and was graciously allowed to address all the Japanese war criminals at Sugamo Prison… I asked then about you, and was told that you probably had committed Hara Kiri, which I was sad to hear. At that moment, like the others, I also forgave you and now would hope that you would also become a Christian.

Louis Zamperini"

Viktor expresses a similar sentiment. He wrote in "Man's Search for Meaning" that "for the first time in my life I saw the truth as it is set into song by so many poets, proclaimed as the final wisdom by so many thinkers. The truth – that love is the ultimate and the highest goal to which man can aspire. Then I grasped the meaning of the greatest secret that human poetry and human thought and belief have to impart: The salvation of man is through love and in love. I understood how a man who has nothing left in this world still may know bliss."

Those that have ever been present for the death of another, especially from trauma, are destined to relive the events. To wonder "what if" a million times over. What if we wrapped the bandages a different way. What if we missed something that could have cued us to what was about to happen. What if we took a different road. What if.

I don't have a good answer for any of the questions or the weight that comes after. But I do believe both Zamperini's and Viktor's messages. Regardless of what you do, where you've been, what somebody else has done to you, or what you've done, your true worth and what you accomplish are ultimately measured by how you treat others and by your attitude toward them. That's the most important thing you can work on, and it's something you can do right now. Don't live in the past, and don't live in the future. You're called to action right now.

February passed, as did April 21st (the anniversary of Matt's death), before I got around to posting this. There are probably only a handful of people who can appreciate a data compression discussion and non-technical book reviews in the same article. But, hey, this is my blog and I can post things in whatever combination I feel like; this isn't a popularity contest. Stranger things have happened on the Internet. If 5 other people can appreciate this, then that's great. Pretty soon I'll get around to posting something else about welding and really narrow down to my key demographic, which is apparently... well, I have no idea.

Here are some articles about Matt.
http://www.vandegriftvoice.com/news/2012/04/30/valor-day-honors-school-namesake/
 http://projects.militarytimes.com/valor/marine-1st-lt-matthew-r-vandegrift/3495867
http://www.hillcountrynews.com/news/four_points/article_bea6ffac-cbe1-11df-91bc-001cc4c002e0.html




Friday, April 25, 2014

Counting unique file extensions recursively

Here's a quick example of the Linux command line tools working together to do something more complex.

Question:
I wonder what types of files are in this directory structure and how many of each type there are?

Answer:
A simple way to approach that would be to count the unique file extensions.

Let's start with getting a list of files.
eric@eric-Precision-M6500:~/workspace/temp$ find . -type f  
./page/content/ji.png
./page/content/index.html
./page/content/conf.png
./page/deploy
...


Ok... that worked. We can count them.
eric@eric-Precision-M6500:~/workspace/temp$ find . -type f | wc -l
67


But we're interested in just the file extensions, so let's get those. This regex matches a period followed by any number of non-period non-slash characters until the end of line and prints whatever is in the parentheses.
eric@eric-Precision-M6500:~/workspace/temp$ find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/'
png
html
png
sh
...


We can do the same thing with sed (thanks Chris!).
eric@glamdring:~/Pictures$ find . -type f | sed -e 's/^.*\.//'
jpg
jpg
jpg
eric@glamdring:~/Pictures$


We want to count the unique extensions, but the uniq command only collapses duplicate lines that are adjacent in the input stream, so we have to sort the extensions first.
eric@eric-Precision-M6500:~/workspace/temp$ find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort | uniq -c 
      1 html
      1 md
      2 png
      9 sample
      1 sh
      2 txt



Let's put some commas in the output so we can import it into Calc as CSV and make a pie chart.
eric@eric-Precision-M6500:~/workspace/temp$ find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort | uniq -c | awk '{print $2 "," $1}'
html,1
md,1
png,2
sample,9
sh,1
txt,2


That looks good. We'll redirect the output to a file and count unique extensions in a more interesting directory: all of the DARPA XDATA open source code on this computer.
eric@eric-Precision-M6500:~/workspace/xdata/code$ find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort | uniq -c | awk '{print $2 "," $1}' > extensions.csv





Extensions are a naive way to determine file type. The 'file' command is a little smarter: it combines several tests (filesystem, magic, and language) to determine file type. Here's an example of how to use it for our experiment.

eric@glamdring:~/workspace/randomproject$ find . -type f | awk 'system("file " $1)' | sed -e 's/^.*\://' | sort | uniq -c
    282  ASCII text
     18  ASCII text, with very long lines
      6  Bourne-Again shell script, ASCII text executable
     14  data
      2  empty
     45  GIF image data, version 89a, 16 x 10
      1  Git index, version 2, 606 entries
      1  Git index, version 2, 8 entries
      1  HTML document, ASCII text
     15  HTML document, Non-ISO extended-ASCII text
...


Here are some iterative improvements.

An awk-free version using the -exec option of find. 
find . -type f -exec file {} \; | sed 's/.*: //' | sort | uniq -c

A sed-free version of the awk-free version, using the non-verbose output from file:
find . -type f -exec file -b {} \; | sort | uniq -c
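
If any of your file names contain spaces, a null-delimited variant sidesteps the word-splitting problem (this assumes GNU find and xargs, which support -print0 and -0):
find . -type f -print0 | xargs -0 file -b | sort | uniq -c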

If you have an idea for how to make this shorter or more effective, send me a note on social media and I'll include it.










Vagrant boilerplate

Not so long ago, integrating software meant doing configuration management at the virtual machine level as integration took place: taking snapshots and reverting if there was trouble. What you're left with is a clunky ~4 GB file that you have to push around and convert to your eventual deployment environment. This is inefficient and risky. There's all the cruft that tags along as the builds take place. There are the complexities of image conversion. Painful workarounds are sometimes needed to move an image between environments (VMware, deleting network interfaces). Hostname problems (Oracle, thanks for that). Security implementation guidelines also change at a moment's notice or differ between environments. And what if you want to deploy to bare metal after all?
Thankfully, some better tools are available. If you haven't figured out how to use Vagrant yet, go learn it right now. If you know what a VM is and can use the Linux command line, it will probably take you all of 15 minutes to master it.

Here's the website: http://www.vagrantup.com/

Basically, Vagrant has one configuration file where you specify a baseline OS image (a .box file) and some other parameters about how you want the machine to behave. I say "basically" because this file also kicks off and points to other config files that deploy your software. To start all of this you type 'vagrant up'.
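
As a minimal sketch, the whole workflow can be bootstrapped from the shell; the box name and deploy.sh here are just examples, not part of my boilerplate:

mkdir -p my-vm && cd my-vm
echo 'echo provisioning ran' > deploy.sh
cat > Vagrantfile <<'EOF'
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"                # baseline OS image (.box file)
  config.vm.provision "shell", path: "deploy.sh"   # hands off to your deploy script
end
EOF
vagrant up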

Log in to the resultant system ('vagrant ssh'), then test, develop, and capture the changes you care about in the deployment files. When you mess it up, instead of reverting, wipe the whole thing out of existence with another command: 'vagrant destroy'. Lather, rinse, repeat. Within a short amount of time, what you're left with are some finely tuned deployment scripts that can take any baseline OS image and get it to where you want with one command. Vagrant also allows you to spin up multiple VMs at the same time, so you can emulate and test more complex environments in this manner.


Although Vagrant supports Chef and Puppet, my preference has been to use bash scripts to deploy software. Bash scripting is accessible to most people; when collaborating with groups from different organizations, it serves as a common language. Recently, I've taken my boilerplate Vagrant configurations and put them online.

It's best to separate parts of the deployment process. Don't write the commands that secure the system in the same file as the commands that deploy software components or data. Abstract it all. Then, when your deployment environment changes, you only have to modify or switch out that one file. You can capture the security requirements for Site A and keep them separate from Site B. Want to deploy to a Site C? Build it out and you're only one command away from testing whether everything works. If a security auditor asks how you configured your system, send them the deployment file. If they have a critique (and they're good), they can make the recommended changes and send them back to you, where you can test them by running 'vagrant up' again.
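
For instance, a top-level provision script might do nothing but call the separate pieces in order (these file names are hypothetical, not from my boilerplate):

#!/bin/bash
# provision.sh - called from the Vagrantfile; each concern lives in its own file
set -e
bash /vagrant/deploy/os-baseline.sh
bash /vagrant/deploy/security-site-a.sh   # swap this one file out for Site B or C
bash /vagrant/deploy/software.sh
bash /vagrant/deploy/data.sh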

My boilerplate includes simple snippets for how to create users, push around files, append to files, wget things to where you want them, and other useful things that people forget when doing things from scratch (like writing the .gitignore). It should be easy to go through and modify it to do what you want.

Here's a link:
https://github.com/darpa-xdata/vagrant-vm-boilerplate

It has Ubuntu as the base OS; here's a list of a bunch of other base OSes you can use without having to roll your own. Just modify the Vagrantfile.
http://www.vagrantbox.es/