IoT and Stuff – Cautionary Tales

Overview

IoT (Internet of Things) is an interesting phenomena where “things” become connected and provide either some control and / or sensor capability through this connection. Examples include connected thermostats, weather stations, garage door openers, smart door locks, etc.

It is an area of explosive growth, and like any other system it will have its security failures.

Tale 1 – Hacking Internet Connected Light Bulbs

LIFX lightbulbs are smart LED lights with two wireless interfaces; a Wi-Fi interface to connect to the local network and provide a control path for computers / smartphones, and an IEEE 802.15.4 mesh network to communicate between multiple LIFX smartlights. This dual wireless interface meant that any number of LIFX smartlights could be controlled and managed through a single Wi-Fi connection. Since any of the LIFX smartlights could operate as the “master” device that connected to both networks, it was necessary for each smartlight to have the Wi-Fi network access credentials.

Vulnerability

The vulnerability involves a couple of aspects in the design. These include:

  1. When an additional LIFX smartlight was added to the network, it exchanges data over the IEEE 802.15.4 network in the clear (unencrypted); except for the Wi-Fi credentials and some configuration details. All of this data was sent as encrypted blob.
  2. The encryption key for this blob was a pre-shared key hard coded into the firmware for every LIFX smartlight (of that firmware revision). This key was accessible via JTAG (which was pinned out on the PCB) or through the firmware image (which was not available at the time of the compromise).
  3. The system allowed a client on the IEEE 802.15.4 network to request (and receive) this encrypted configuration / credentials blob at any time in the background.

Compromise

The compromise allows an attacker physically close to the system to:

  1. Acquire the LIFX pre-shared encryption key from the firmware or JTAG interface.
  2. On the IEEE 802.15.4 network, request the encrypted configuration / credentials blob (masquerading as a LIFX smartlight).
  3. Crack open the blob using the encryption key from step 1.
  4. Connect to the Wi-Fi network using the credentials from the blob.
  5. Access the network and / or control the LIFX light bulbs.

Assessment

From this there are at least a few poor design choices that enabled this compromise. The first of these is to use a static pre-shared key to encrypt sensitive wireless data. The ability to establish a secure channel based on PKI has been standard practice for decades, allowing the use of dynamically generated keys at a session level. The use of a static pre-shared key is just lazy design.

The second of these is the ability to request the encrypted credential blob silently. For an initial configuration of an additional smartlight to the network, it is reasonable to require user confirmation to share the data with the additional smartlight. An attacker requesting this data in the background should not be allowed to get this data without user confirmation, or simply rejected when not part of a new bulb configuration.

Although having the JTAG port pinned out may seem to be a poor design choice, it is not really add significant risk. JTAG availability on the device pins would have been more than sufficient for a physical hacker, and that is assuming that the same data would not have been available in a firmware download. A JTAG port does not present a significant risk if keys are managed securely, and the security architecture takes this exposure into account.

Tale 2 – Smart Home Denial of Service

Vulnerability

The vulnerability in this story is that the smart home in this story had connected all of the smart devices in the house through a common Ethernet infrastructure – effectively rendering every device as a node on a flat network. This flat network meant that any one device can saturate the network with packets, effectively breaking the network. It also means that any one device can also monitor every packet on the network, or selectively disrupt packets. Essentially the security of this flat network can be compromised in multiple ways by any device on the network, and the overall security of the network is only as good as the weakest device.

Compromise

This particular compromise was based on a smartlight beaconing on the network as a denial of service attack. This event was not malicious, but if we consider the triad of confidentially, integrity and availability it is still a security failure. A self induced denial of service is still a denial of service.

Assessment

As a systems engineer the smart home described in this article makes me uncomfortable. The designer indicated that he had not installed his smart door locks since he did not want to be locked out / in by the locks. The designer also indicated that the light bulb denial of service rendered all the smart devices in his house broken / unavailable.

As a systems engineer this bothers me for a couple of reasons. The first of these is that it is possible to segment the network so that a failure does propagate through the entire network – effectively setting up security domains on functional boundaries. Even a trivial level of peering management would provide some level of isolation without giving up the necessary control protocols.

The second part that I find bothersome is that it appears that the entire system was designed with a single centralized control mechanism / scheme. Given the relatively poor reliability of network systems as compared with traditional home lighting / appliance controls, it makes sense to to install a parallel control scheme that is based on a local (more reliable) control path that operates much closer to the device being controlled.

In summary – the architecture of this particular smart home implementation is brittle in that a single device failure can precipitate an entire system failure. In addition it is fragile in that the control scheme is dependent on a number of disparate sequential operations that provide a multitude of single point failures for every device. Lastly, the system is not robust in that there is not an alternate control scheme. In my opinion this smart home may be an interesting experiment, but is a weak systems design with lots of architectural / system flaws.

Tale 3 – ThingBots

This is not a cautionary tale about a specific device or attack, but a cautionary tale about embedded devices in general, and by inclusion – IoT devices. Back in last week of 2013/first week of 2014, Proofpoint gathered some data from a number of botnets sending out spam. Specifically, they identified the unique IP addresses in the botnets, and characterized them forensically, and found that roughly a quarter of the zombie machines were not traditional PCs, but things like DVRs, security cameras, home routers, and at least one refrigerator. From this they coined the term ‘ThingBot’, which is a botnet zombie based on some ‘thing’.

The message is that when it comes to compromise and attack, there are no devices that will not be attacked, there is no point where your devices is not a target for a botnet. Harden all embedded devices and design defensively.

Bottom Line

The messages in these three tales are diverse, but can be summarized by:

  1. Every connected device is a target. Simply being a connected device is sufficient.
  2. Key management may be mundane but is even more critical on devices since often the only interface is networked.
  3. Most importantly – System design matters. Most security issues occur at the integration  interfaces between components of one type or another – and good system design reduce that exposure.

References

IOT and Stuff – The Evolution

Overview

This is the first of several posts I expect to do on IoT, including systems design, authentication, standards, and security domains. This particular post is an IoT backgrounder from my subjective viewpoint.

Introduction

The Internet of Things (IoT) is a phenomena that is difficult to define, and difficult to scope. The reason it is difficult to define is that it is rapidly evolving, and is currently based on the foundational capabilities IoT implementations provide.

Leaving the marketing hyperbole behind, IoT is the integration of ‘things’ into what we commonly refer to as the Internet. Things are anything that can support sensors and/or controls, an RF network interface, and most importantly – a CPU. This enables ubiquitous control / visibility into something physical on the network (that wasn’t on the network before).

IoT is currently undergoing a massive level of expansion. It is a chaotic expansion without any real top down or structured planning. This expansion is (for the most part) not driven by need, but by opportunity and the convergence of many different technologies.

Software Development Background

In this section, I am going to attempt to draw a parallel to IoT from the recent history of software development. Back at the start of the PC era (the 80s), software development carried with it high cost for compilers, linkers, test tools, packagers, etc. This marketing approach was inherited from the mainframe / centralized computer system era, where these tools were purchased and licensed by “the company”.  The cost of an IBM Fortran compiler and linker for the PC in the mid 80s was over $700, and libraries were $200 each (if memory serves me). In addition, the coding options were very static and very limited. Fortran, Cobol, C, Pascal, Basic and Assembly represented the vast majority of programming options. In addition (and this really surprised me at the time), if you sold a commercial software package that was compiled with the IBM compiler, it required that you purchase a distribution license from IBM that was priced based on number of units sold.  Collectively, these were significant barriers to any individual who wanted to even learn how to code.

This can be contrasted with the current software development environment where there is a massive proliferation of languages and most of them available as open source. The only real limitations or barriers to coding are personal ability, and time. There have been many events that have led to this current state, but (IMO) there were two key events that played a significant part in this. The first of these was the development of Borland Turbo Pascal in 1983, which retailed for $49.99, with unlimited distribution rights for an additional $99.99 for any software produced by the compiler. Yes I bought a copy (v2), and later I bought Turbo Assembler, Delphi 1.0, and 3.0. This was the first real opportunity for an individual to learn a new computer language (or to program at all) at an approachable cost without pirating it.

To re-iterate, incumbent software development products were all based on a mainframe market, and mainframe enterprise prices and licensing, with clumsy workflows and interfaces, copy protection or security dongles. Borland’s Turbo Pascal integrated editor, compiler and linker into an IDE – which was an innovative concept at the time. It also had no copy protection and a very liberal license agreement referred to as the Book License. It was the first software development product targeted at end users in a PC type market rather than the enterprise that employed the end user.

The second major event that brought about the end of expensive software development tools was GNU Compiler Collection (GCC) in 1987, with stable release by 1991. Since then, GCC has become the default compiler engine nearly all code development, enabling an explosion of languages, developers and open source software. It is the build engine that drives open source development.

In summary, by eliminating the barriers to software development (over the last 3 decades),  software development has exploded and proliferated to a degree not even imagined when the PC was introduced.

IoT Convergence

In a manner very analogous to software development over the last 3 decades, IoT is being driven by a similar revolution in hardware development, hardware production, and  software tools. One of the most significant elements of this explosion is the proliferation of Systems On a Chip (SoC) microprocessors. As recently as a decade ago (maybe a bit longer), the simplest practical microprocessor required a significant number of external support functions, which have now been integrated to a single piece of silicon. Today, there are microprocessors with various combinations of integrated UARTs, USB OTG ports, SDIO, I2C, persistent flash RAM, RAM, power management, GPIO, ADC and DAC converters, LCD drivers, self-clocking oscillator, and a real time clock  – all for a dollar or two.

A secondary aspect of the hardware development costs are a result of the open source hardware movement (OSH), that has produced very low cost development kits. In the not so distant past, the going cost for microprocessor development kit was about $500, and that market has been decimated by Arduino, Raspberry PI, and dozens of other similar products.

Another convergent element of the IoT convergence comes from open source software / hardware movement. All of the new low cost hardware development kits are based on some form of open source software packages. PCB CAD design tools like KiCAD enable low cost PCB development. Projects like OSHPark enable low cost PCB prototypes and builds without lot charges or minimum panel charges.

A third facet of the hardware costs is based on the availability and lower costs of data link radios for use with microprocessors. Cellular, Wi-Fi, 802.15.4, Zigbee, Bluetooth and Bluetooth LE all provide various tradeoffs of cost, performance, and ease of use – but all of them have devices and development kits that are an order of magnitude of lower cost than a decade ago.

The bottom line, is that IoT is not being driven by end use cases, or one group, special interest or industry consortium. It is being driven by the convergent capabilities of lower cost hardware, lower cost development tools, more capable hardware / software, and the opportunity to apply to whatever “thing” anybody is so inclined. This makes it really impossible to determine what it will look like as it evolves, and it also makes efforts by various companies get in front of or “own” IoT seem unlikely to succeed. The best these efforts are likely to achieve is that they will dominate or drive some segment of IoT by the virtue of what value they contribute to IoT. Overall these broad driving forces and the organic nature of the IoT growth means it is also very unlikely that it can be dominated or controlled, so my advice is to try and keep up and don’t get overwhelmed.

Personally, I am pretty excited about it.

PS – Interesting Note: Richard Stallman may be better known for his open source advocacy and failed Mach OS, but he was the driving developer behind GCC and EMACs, and GCC is probably as important as the Linux kernel in the foundation and success of the Linux OS and the open source software movement.

References

Hiatus / Status Report

Apologies to all for being remiss in updates to my blog. Over the last 3 months we sold our house of 17 years, and I took a job in downtown Minneapolis (formerly Spectrum Design Services). We should be closing on a house soon, but for the time being we are homeless (but not indigent) and imposing on relatives.

This presents a number of challenges to the available time to do projects / investigations and document said projects / investigations. First the demands of starting a new job, cleaning / fixing a house to sell and tortured commutes (with each of us spending the weeks living in different states) leaves me little spare time. Second, the limitations imposed by being homeless provide limited maintainable workspace for projects.  Thirdly, the cumulative stress of changing so much of your life is exhausting.  Lastly, when our house closes and we move in there will be a whole new set of things that will necessarily consume said spare time.

But it is all good. The job provides complexity and challenges and opportunities to really build the mental muscles. Boredom can kill you just as effectively as a heart attack – it just takes a lot longer (and it is a lot more tortured). The house is new, with a backyard over a wooded ravine with a creek running through it, and it is in River Falls – a really nice country/college town just beyond the sprawling metropolis of the Minneapolis-St Paul metro area.

So to re-iterate, it has been a while since I posted something interesting to the blog. I also anticipate that it will be a while (going forward) before I have additional posts. However I have been considering some different topics when I get back. Perhaps something on some hardware hacking – Arduino, Raspberry Pi or BeagleBoard Black – the challenge there is to find an interesting topic that hasn’t been done a multitude of times. If I can find a topic that I find interesting and has not been beaten to death, that will be on the list. Another topic is to look at some other programming development options on the Chromebook. Part of me is really curious if/how well PyCharm CE runs under Crouton. I also never got around to doing a Dart test project. Then I think i need to get back and refresh my Android Systems Engineering work – which are getting very stale.

In any case – this blog has not be abandoned, just neglected. I will be back.

A Brief Introduction to Security Engineering

Background

One of the great myths is that security is complicated, hard to understand, and must be opaque to be effective. This is mostly fiction perpetrated by people who would rather you did not question the security theater they are creating in lieu of real security, by security practitioners who don’t really understand what they are doing, or lastly those who are trying to accomplish something in their interests under the false flag of security. This last one is why so much of the government “security” activities are not really about security, but about control – which is not the same. Designing and doing security can be complex, but understanding security is much easier than it is generally portrayed.

Disclaimer – This is not a comprehensive or exhaustive list / analysis. It is a brief introduction that touches on a few of the most practical elements of security engineering.

Security Axioms

Anytime I look at systems security, there are a few axioms I use to set the context, limit the scope and measure the effectiveness. These are:

  1. Perfect security is unachievable, and any practical security is the result of some cost driven tradeoff.
  2. Defining and understanding your threat model is step zero of any security solution. If you don’t know who are are defending against, the solution will not fit.
  3. Defining and understanding success. This means understanding what you trying to protect and what exactly protecting those elements means.
  4. Defending a system is more costly / difficult than attacking that same system. Attacker only need to be successful once, but defenders need to be successful everytime.
  5. Security based on secrecy is weaker than security based on strength. Closed security solutions are more likely to contain flaws that weaken the security versus open security solutions. Yes – this has been validated.

The first of these is a recognition that a security is about a conflict between a system / information defender and an attacker on that system. Somebody is trying to take something of yours and you want to stop them. Each of these two parties can use different approaches and tools to do this, with increasing costs – where costs are monetary, time, resources, or risks of being caught / punished. This first axiom simply states that if an attacker has infinite time, money, resources, and zero risk, your system will be compromised because you are outgunned. For less enabled attackers,  the most cost effective security is that which is just enough to discourage them so they move on to an easier target. This of course leads understanding your attacker, and the next axiom – know your threat.

The second axiom states that any security solution is designed to protect from a certain certain type of threat. Defining and understanding the threats you are defending against is foundational to security design since it will drive every aspect of the system. A security system to keep your siblings, parents, children out of your personal data is completely different than one designed to keep out cyber extortionists out of your Internet accounts.

The third axiom is based on the premise that most of what your system / systems are doing requires minimal protected (depending on the threat model), but some parts of it require significant protection. For example – my Internet browsing history is not that important as compared with my password and account access file. I have strong controls on my passwords and account access (eg KeePass), and my browsing history is behind a system password. Another way to look at this to imagine what the impact could be if a given element were compromised – that should guide the level of protection for that item.

The fourth axiom is based on the premise that the defender must successfully defend every vulnerability in order to be successful, but the attacker only has to be successful on one vulnerability – one time to be successful. This is also why complex systems are more prone to compromise – greater complexity leads to more vulnerabilities (since there are more places for gremlins to hide).

The fifth one is the perhaps the least obvious axiom of this list. Simply put the strength of some security control should not be based on the design being secret. Encryption protocols are probably the best example of how this works. Most encryption protocols over the last few decades are developed, and publicized within the peer community. Invariably, weaknesses are found and corrected, improving the quality of the protocol, and reducing the risk of an inherent vulnerability. These algorithms and protocols are published and well known, enabling interoperability and third party validation reducing the risk of vulnerabilities due to implementation flaws. In application, the security of the encryption is based solely on the key – the keys used by the users. The favorite counter example is from the world of traditional pin tumbler locks , in which locksmith guilds attempted to keep their design / architecture secret for centuries, passed laws making it a crime to possess lock picks or knowing how to pick a lock unless you were a locksmith. Unfortunately, these laws did little to impede criminals and it became an arms race between lock makers, locksmiths and criminals, with the users of locks being kept fairly clueless. Clearly of the lock choices available to a user, some locks were better, some were worse, and some were nearly useless – and this secrecy model of security meant that users did not have the information to make that judgement call (and in general they still don’t). The takeaway – if security requires that the design / architecture of the system be kept secret, it is probably not very good security.

Threat Models

In the world of Internet security and information privacy, there are only a few types of threat models that matter. This is not because there are only a few threats, but because the methods of attack and the methods to defend are common. Generally it is safe to ignore threat distinctions that don’t effect how the system is secured. This list includes:

  1. Immediate family / Friends / Acquaintances – Essentially people who know you well and have some degree of physical access to you or the system your are protecting.
  2. Proximal Threats : Threats you do not know, but are who are physically / geographically close to you and the system you are protecting.
  3. Cyber Extortionists : A broad category of cyber attackers whose intent is to profit by attacking and compromising your information. This group generally targets individuals, but not a specific individual – they look for easy targets.
  4. Service Compromise : Threats who attack large holders of user information – ideally credit card information. This group is looking for bulk information is not targeting individuals directly.
  5. Advanced Persistent Threats (APTs) : Well equipped, well resourced, highly capable and persistent. These attackers are generally supported by governments or large businesses and their targets are usually equally large. This group plans and coordinates their attacks with a specific purpose.
  6. Government (NSA / CIA / FBI / DOJ / DHS / etc): Currently the biggest, baddest threat. They have the most advanced technical resources, the most money, and they use National Security Letters when those are not enough. The collect data in bulk, and they target individuals.

From a personal security perspective we are looking at threats most likely to concern any random user of internet services – you. In that context, we can dismiss a couple of these quickly. Lets do this in reverse order:

Government (NSA et al) – If they are targeting you specifically, and you use Internet services – you are need of more help than I can provide in this article. If your data is part of some massive bulk data collection – there is very little you can do about that either. So in either case,  in the context of personal data security for Joe Internet User, don’t worry about it.

Advanced Persistent Threats (APTs) – Once again, much like the NSA, it is unlikely you would be targeted specifically, and if you are your needs are beyond the scope of this article. So – although you may be concerned about this threat, there is very little you can do to stop this threat.

Service Compromise – I personally pay all of my bills online, and every one of these services wants to store my credit card in their database. Now the question you have to ask is if (for example), the Verizon customer database is compromised and somebody steals all of that credit card information (with 10s of millions of card numbers) and uses them to spend 100s of millions of charges – is Verizon (or any company in that position) going to take full responsibility? Highly unlikely – and that is why I do not store my credit information on their systems. If they are not likely to accept responsibility for any outcome, should you trust them with your credit?

Cyber Extortionists – The most interesting and creative of all these threat classes. I continue to be amazed at every new exploit I hear about. Examples include mobile apps that covertly call money transfer numbers (eg 1-900 numbers in US), or apps that buy other apps covertly. Much like the Salami Slicing attacks (made famous in the movie Office Space), individual attacks represent some very small financial gain, but the hope is that collectively they can represent significant money.

Proximal Threats – If somebody can physically take your laptop, tablet, phone, they have a really good shot at all of the information on that device. Many years ago, I had an iPhone stolen from me on the Washington DC metro, I had not enabled the screen lock, and I had the social security numbers / birthdays of my entire family in my contacts. And yes, there were false attempts to get credit based on this information within hours – unsuccessfully. I now use / recommend everybody use some device access lock, and encrypt very sensitive information in some form of locker. Passwords / accounts and social security numbers in KeePass and sensitive file storage in TruCrypt. These apps are free and provide significant protection for Just In Case. Remember physical control / access to a device is its own special type of attack.

Friends / Family / Acquaintances – In most cases, the level of security to protect from this class of threat is small. More importantly, it is crucial to understand what it is you are trying to protect, why are you protecting it, and what are your recovery options. To repeat – what are your recover options? It is very easy to secure your information, and then forget the password /  passphrase  or corrupt your keyfile. Compromise of private data in this context is orders of magnitude less likely than you locking yourself out of your data – permanently. Yes, I have done this and family photos on a locked TrueCrypt partition cannot be recovered in your lifetime. So when you look at security controls to protect from this threat model, look for built in recovery capabilities and only protect what is necessary to protect.

Conclusions

Fundamentally security engineering is about understanding what you are trying to protect, who / what your threat is, and determining what controls to use to impede the threat while not impeding proper function. Understanding your threat is the first and most important part of that process.

Lastly – I would encourage everybody who finds this the least bit interesting to either read Bruce Schneier’s blog and his books. He provides a very approachable and coherent perspective on IT security / Security Engineering.

Links

Software: Thoughts on Reliability and Randomness

Overview

Software Reliability and Randomness are slippery concepts that may be conceptually easy to understand, but hard to pin down. As programmers, we can write the equivalent of ‘hello world’ in dozens of languages on hundreds of platforms and once the program is functioning – it is reliable. It will produce the same results every time it is executed. Yet systems built from thousands of modules and millions of lines of code function less consistently than our hello world programs – and are functionally less reliable.

As programmers we often look for a source of randomness in our programs, and it is hard to find. Fundamentally we see computers as deterministic systems without any inherent entropy (for our purposes – randomness). For lack of true random numbers we generate Pseudo Random Numbers (PRNs), which are not really random. They are used in generating simulations, and in generating session keys for secure connections, and this lack of true randomness in computer generated PRNs has been the source of numerous security vulnerabilities.

In this post I am going to discuss how software can be “unreliable”, deterministic behavior, parallel systems / programming, how modern computer programs / systems can be non-deterministic (random), and how that is connected to software reliability.

Disclaimer

The topics of software reliability, deterministic behavior, and randomness in computers is a field that is massively deep and complex. The discussions in this blog are high level, lightweight, and I make some broad generalizations and assertions that are mostly correct (if you don’t look to closely) – but hopefully still serve to illustrate the discussion.

I also apologize in advance for this incredibly dry and abstract post.

Software Reliability

Hardware reliability, more precisely “failure” is most often occurs when some device in a system breaks (the smoke comes out), and the system no longer functions as expected. Software failures do not involve broken hardware or devices. Software failures are based on the concept that there are a semi-infinite number of paths (or states) through a complex software package, and the vast majority will result in the software acting and functioning as expected. However there are some paths through the code that will result in the software not functioning as expected. When this happnes, the software and system are doing exactly what the code is telling it to do – so from that perspective, there is no failure. However from the concept of a software failure, the software is not doing what is expected – which we interpret as a software failure, which provides a path to understand the concept of software reliability.

Deterministic Operation

Deterministic operation in software means that a given program with a given set if inputs will function in exactly the same manner every time it is executed – without any unexpected behaviors. For the most part this characteristic is what allows us to effectively write software. If we carry this further, and look at software on simple (8 / 16 bit) microprocessors / microcontrollers, where the software we write runs exclusively on the device, operation is very deterministic.

In contrast – on a modern system, our software exists in a relatively high level on top of APIs (application programming interfaces), libraries, services, and a core operating system – and in most cases this is a multitasking/multi-threaded/multi-cored environment. In the world of old school 8 / 16 bit microprocessors / microcontrollers, none of these layers exist. When we program for that environment, our program is compiled down to machine code that runs exclusively on that device.

In this context, our program not only operates deterministically in how the software functions, but the timing and interactions external to the microprocessor is deterministic. In the context of modern complex computing systems, this is generally not the case. In any case, the very deterministic operation of software on dedicated microprocessor makes it ideal for real world interactions and embedded controllers. This is why this model is used for toasters, coffee pots, microwave ovens and other appliances. The system is closed – meaning its inputs are limited to known and well defined sources, and its functions are fixed and static, and generally these systems are incredibly reliable. After all how often it is necessary to update the firmware on an appliance?

If this war our model the world of software and software reliability, we would be ignoring much of what has happened in the world of computing over the last decade or two. More importantly – we need to understand that this model is an endpoint, not the whole story, and to understand where we are today we need to look further.

Parallel Execution

One of the most pervasive trends in computing over the last decade (or so) is the transition from increasingly faster single threaded systems to increasingly parallel systems. This parallelism is accomplished through multiple computing cores on a single device and through multiple processing threads on a single core, which are both mechanisms to increase the ability of the processor to produce more work by being able to support concurrently running programs. A typical laptop today can have two to four cores and support two hardware threads per core, resulting in 8 relatively independent processes running at the same time. Servers with 16 to 64 cores would have qualified as supercomputers (small ones) a decade ago are now available off the shelf.

Parallel Programming: the Masochistic Way

Now – back in the early 80s as an intern at Cray, my supervisor spent one afternoon trying to teach me about how Cray computers (at that time) were parallel coded. As one of the first parallel processing systems, and as systems where every cycle was expensive – much of the software was parallel programmed in assembly code. The process is exactly how would imagine. There was a hardware scheduler that would transfer data to/from each processor to main memory every so many cycles. In between these transfers the processors would execute code. So if the system had four processors, you would write assembly code for each processor to execute some set of functions that were time synchronized ever so many machine cycles, with NOPs (no operation) occasionally used to pad the time. NOPs were considered bad practice since cycles were precious and not to be wasted on a NOP.  At the time, it was more than I wanted to take on, and I was shuffled back to hardware troubleshooting.

Over time I internalized this event, and learned something about scalability. It was easy to imagine somebody getting very good at doing two (maybe even 3 or 4) dissimilar time synchronous parallel programs. Additionally, since many programs also rely on very similar parallel functions, it was also easy to imagine somebody getting good at writing programs that did the same thing across a large number of parallel processors. However, it is much harder to imagine somebody getting very good at writing dissimilar time synchronous parallel programs effectively over a large number of parallel processors. This is in addition to the lack of scalability inherent in assembly language.

Parallel Programming – High Level Languages

Of course in the 80s or even the 90s, most computer programmers did not need to be concerned with parallel programming, and every Operating System was single threaded, and the argument of the day was Cooperative multitasking versus Preemptive multitasking. Much like the RISC vs CISC argument from the prior decade, these issues were rendered irrelevant by the pace of processor hardware improvements. Now many of us walk around with the equivalent that Cray supercomputer in our pockets.

In any case the issue of parallel programming was resolved in two parts. The first being the idea of a multi-tasking operating systems with a scheduler – the core function that controls what programs are running (and how long they run) in parallel at any one time. The second being the development of multi-threaded programming in higher level languages (without the time synchronization of early Crays).

Breaking Random

Finally getting back to my original point… The result today is that all modern operating systems have some privileged block of code – the kernel running continuously, but have a number of other services that run the OS, including the memory manager and the task scheduler.

The key to this whole story is that these privileged processes manage access to shared resources on the computer. Of these two, the task manager is the most interesting – mostly due the arcane system attributes it uses to determine which processes have access to which core / thread on the processor. This is one of the most complex aspects of a multitasking / multi-core / multithreaded (hardware) system. The attributes the scheduler looks at include affinity flags that processes use to indicate core preference, priority flags, resource conflicts and hardware interrupts.

The net result is that if we take any set of processes on a highly parallel system there are some characteristics of this set that are sufficiently complex and impacted by unknown external elements that they are random – truly random. For example if we create three separate processes that generate a pseudo random number set based on some seed (using unique values in each), and point all of them to some shared memory resource- where the value is read as input and the output is written back. Since the operation of the task scheduler means that the order of execution of these three threads is completely arbitrary, it is not possible to determine what the sequence is deterministically – the result would be something more random than a PRNG. A not so subtle (and critical) assumption is that the system has other tasks and processes it is managing, which directly impact the scheduler, introducing entropy to the system.

Before we go on, lets take a closer look at this. Note that if some piece of software functions the same (internally and externally) every time it executes, it is deterministic. If this same piece of software functions differently based on external factors that are unrelated to this software, that is non-deterministic. Since kernel level resource managers (memory, scheduler, etc) function in response to system factors and factors from each and every running process – that means that from the perspective of any one software package, certain environmental factors are non-deterministic (i.e. random). In addition to the scheduling and sequencing aspects identified above, memory allocations will also be granted or moved in a similar way.

Of course this system level random behavior is only half the story. As software packages are built to take advantage of gigabytes of RAM, and lots of parallel execution power, they are becoming a functional aggregation of dozens (to hundreds) of independently functioning threads or processes, which introduce a new level of sequencing and interdependancies which are dependent on the task manager.

Bottom Line – Any sufficiently complex asynchronous and parallel system will have certain non-deterministic characteristics based on the number of independent sources that will influence access / use of system shared resources. Layer the complexity of parallel high level programming, and certain aspects of program operation are very non-deterministic

Back to Software Reliability

 Yes we have shown that both multitasked parallel hardware and parallel programmed software contribute to some non-deterministic behavior in operation, but we also know that for the most part software is relatively reliable. Some software is better and some is worse, but there clearly is some other set of factors in play. 

The simple and not very useful answer is “better coding” or “code quality”. A slightly more insightful answer would tell you that code that depends on or uses some non-deterministic feature of the system is probably going to be less reliable. An obvious example is timing loops. Back in the days of single threaded programs and single threaded platforms, programmers would introduce relatively stable timing delays with empty timing loops. This practice was easy, popular and produced fairly consistent timing – showing deterministic behavior. As systems hardware and software have evolved, the assumptions these coding practices rely on become less and less valid. Try writing a timing loop program on a modern platform and the results can be workable much of the time, but it  can also vary by orders of magnitude – in a very non-deterministic manner. There are dozens of programming practices like this that use to work just fine, but no longer do – but they don’t completely break, just operate a little bit randomly. In many cases, the behavior is close enough to “correct” that the program appears to function, but not very reliably.

Another coding practice that used to work on single threaded systems was to call some function and expect the result would be available on the next line of code. It worked on single threaded systems because execution was handed off to that function, and did not return until it was complete. Fast forward to today, and if this is written as a parallel program – the expected data may not be there when your code thinks is should be. There is a lesson here – high level parallel programming languages make writing parallel code fairly easy, but that does not mean that writing robust parallel programs is easy. Parallel inter-dependencies issues can be just as ugly as parallel assembly code on a Cray system.

Summary

A single piece of code running exclusively on a dedicated processor is very deterministically, but parallel programmed software on a multitasking parallel hardware system can be very non-deterministic, and difficult to test. Much of software reliability is based on how little a given software package depends on these non-deterministic features. Managing software reliability and failure mechanisms requires that programmers understand the system beyond the confines of the program.

References

Android Systems Engineering: A Quick Look Under the Hood

Overview

Most of us we have become comfortable with the understanding that Android is a very specialized and customized version of Linux. The resulting level of customization does however render the OS sufficiently different that most general purpose Linux Systems Engineering expertise is of limited value. We also need to acknowledge that this simply means we need to get back to the foundations and explore / relearn from the bottom up (or top down – depending on your perspective).

If we take a quick survey of the multitudes of Android devices available, it is clear that there are both similarities and significant differences. Some of these are cosmetic and some are not, and one way to understand which is which,  is by dissecting the construction of the OS – and identifying where these modifications were interjected. From the bottom up we have:

  • The Android Kernel: The Android kernel is the engine at the bottom of the Android software stack that is responsible for everything that happens after the bootloader hands off control.
  • Core Android Services: When you hear about functions named Binder or AshMem, these are two of the core services that are responsible for managing memory, communication and process launching. Like the kernel these services generally are a key part of everything that happens in Android.
  • System Services/APIs: These are functions that provide more specialized functions and programmatic interfaces to the Android platform. This is the layer that applications start to touch.
  • Proprietary Device Drivers: A set of binary loaded drivers that enable the very general interfaces in the OS to talk to specialized hardware devices. These are not open source.
  • Dalvik Virtual Machine (VM): A specialized Java like virtual machine that isolates (or virtualizes) the APIs so that applications can be device independent.
  • AOSP User Interface/Apps: The standard user interface and core applications provided as part of the Android Open Source Project. This is a very small set of applications.
  • Google Applications: A larger set of Google specific applications, which include Gmail, Google Maps, Google Play Market, Google Talk, Google Voice, and Chrome.
  • Device Vendor Theming / Applications: A look and feel overlay on the user interface (also known as ‘skinning’) combined with a set of vendor specific applications. Nexus devices are not “skinned”.
  • Telecom Vendor Applications: A set of applications specific to a telecom vendor that provide some level of function that often duplicate functionality. Nexus devices do have telecom vendor apps.

From an Android Systems Engineering viewpoint, a few of these merit further discussion. Specifically we will expand on the Android Kernel, AOSP, and the proprietary drivers.

Android Kernel

The Android kernel is based on the Linux kernel, but has a number of Android specific patches that provide improved performance and function on a mobile platform. These patches are generally very specific to how Android manages memory, tasks, interrupts and timers to provide better performance and battery life than a more traditional Linux kernel. There has also been an evolving effort to reduce the amount of OS code running as a privileged user (i.e. root), which has been relatively successful, and has driven some kernel architectural changes. This is why Android privilege escalation attacks are so very uncommon.

AOSP: Android Open Source Project

The Android Open Source Project, or AOSP is the open source part of the Android system. The homepage for AOSP is at http://source.android.com/. AOSP includes the Android Kernel, a number of key OS services, the Android API functions and the Dalvik Virtual Machine.

Another way to look at this is that AOSP is the open source foundation from kernel to User Interface – excepting the proprietary apps, drivers, and ‘skinning’. To clarify – skinning is a process that layers a user interface theme on top of the standard Android look and feel. This is generally done by product vendors to create a brand style or appearance. Examples include Motorola MotoBlur, HTC Sense and Samsung TouchWiz.

The most important thing to know about AOSP is that it is open source (distribution, but not development) and it is capable of generating a fully functional Android Operating System without any special or purchased tools.

Android Proprietary Drivers

Although the Android OS is fairly generic for platforms, it is still necessary to have device drivers and a hardware abstraction layer (HAL) that maps a generic API interface to the device. Unlike the PC Linux movement, there is not a significant effort to develop open source drivers for Android hardware devices and for the most part, these are proprietary pieces of code that get installed on the OS image.

If your device is a Nexus device, drivers can be found at https://developers.google.com/android/nexus/drivers. For other Android devices the process is more complicated, and drivers are generally extracted from the factory device firmware.

Beyond Skinning

As the Android marketplace evolves (I hesitate to call it maturing), device vendors are increasingly working to develop product differentiation that would enable them to carve off their own walled gardens in the Android world. Initially this was achieved by changing the look and feel of the User Interface (UI) through skinning – which is a fairly cosmetic process. The upside to skinning is that it is also a relatively safe process with minimal risk of introducing vulnerabilities.

Over the last few years this process has progressed, and Samsung in particular has introduced and applied broad sets of  patches to the AOSP source code that add significant capabilities – such as Multi-Window on the Galaxy devices. These changes are fairly significant and end up touching a many parts from the kernel to the UI, with increasing risk of bugs and vulnerabilities. Of course the real question is whether this has actually happened.

Implications on Stability

My favorite device differentiator is stability, and the following statements make some relatively safe, but somewhat anecdotal assertions about stability / vendor support on smart phones.

  • Apple iPhones are fairly stable but introducing a major new OS version each year that is compatible with 3 generations of hardware may be having some impacts. Just for fun Google “iphone camera crashes” and see what you discover.
  • Google Nexus / Developer devices are more stable because of their minimalism. I have had a Nexus tablet for over a year and a Nexus phone for about 6 months. I have never had to power cycle either of them except for firmware patching / updates. The only app crash I have ever had is Facebook – and it happened once.  Based on discussions with other Nexus owners, this is fairly typical.
  • Google Nexus / Developer devices generally receive OS updates within days of announcement. This is because Google does this directly – and does not go through the device / telecom vendor channels (who have no business interest in maintaining a device already sold and obsoleted out of the sales chain).
  • Non-Nexus Android devices range from slightly flakey to very crashy (and barely functional), require regular reboots, and if they are updated – they can be up to a year behind the current version. Vendors typically lock the bootloader and discourage any third party OS development (AOKP and/or CyanogenMod for example).

I also understand that the plural of anecdote is not fact – but several of these assertions are supported by a recent academic paper that has some interesting findings on code provenance and vulnerabilities. More simply – the paper confirms that some Android phones are significantly more vulnerable / buggy than others, and who is to blame.

The vendors surveyed included Google Nexus, LG, HTC, Samsung and Sony. I will let you read the paper, but there are few tidbits worth sharing.

  • Google Nexus phones were used as the baseline since they have no third party software – and did provide the lowest number of vulnerabilities.
  • Sony customization produced significantly few vulnerabilities than the other three vendors.
  • Of LG, HTC and Samsung devices – between 65% and 85% of identified vulnerabilities came from their 3rd party code and customizations.
  • Samsung Galaxy S2 and S3 take the dubious honor of having the greatest number of vulnerabilities (see table 5 – S3 had 40 vulnerabilities versus the Nexus 4 with 3).

Conclusions I personally drew from this paper include: a) Whatever Sony is doing in their code development QA process – they should keep doing it, b) Whatever Samsung is doing in their code development QA process – they should stop doing it and fix it, and c) This data supports some of my anecdotal assertions above and actually adds detail.

Anecdotes are interesting, and as story tellers we can more easily relate to them versus raw data. However – occasionally data can support the anecdotes and then the combination can become useful.

Bottom Line

For the aspiring Android Systems Engineer, this quick look under the hood is intended to provide you with a few reference points you can related back to your Linux Systems Engineering understanding.

For the truly ambitious, the next step is to download the AOSP source and dig in…

Reference

Howto: Setup Arduino on Chromebook

Background

Arduino is an interesting microcontroller platform / board that arguably launched the era of low cost, standalone microcontroller systems. At this time, there are a multitude of these devices in the $50 less price range – but the Arduino was one of the first.

ArduinoUno_R3_Front_450px

The Arduino is an 8 bit microcontroller with a USB interface, a GPIO interface, A/D and D/A converters, I2C interfaces, and UART(s). More importantly, it has a free and easy to use IDE that supports C coding for the device. The Arduino Uno runs about $25 (at this time) from a number of sources – AdaFruit or Sparkfun (for example).

In any case, this post will (hopefully) be relatively short and provide a proof of concept that the Arduino system can be installed and function on the Chromebook 14.

Dependencies / Assumptions / Caveats

This install requires that:

  1. The target Chromebook 14 is in developer mode.
  2. It has an SD card of at least 8GB to support the installation of a crouton chroot Ubuntu install.
  3. A fairly recent version of Ubuntu installed to a crouton chroot jail – for details refer to my post on installing an Android Development Environment.
  4. Java JDK installed and functioning. Once again – refer back to the Android Development Environment post.
  5. An Arduino device to test with.

Note: All of the instructions below are based on name of my user (joeuser), the name of my SD-Card (chrome-32), and particular versions of the install packages. You will need to modify for your respective names / versions.

Hardware – Arduino / USB Interface

My biggest concern with Arduino on the Chromebook is whether the Arduino Uno (my test board) will be recognized / configured correctly by ChromeOS – since there is a real risk that the appropriate kernel drivers may not be included on ChromeOS. Our chroot Ubuntu jail still uses / depends completely on ChromeOS for the kernel, kernel drivers and /dev.

So the first thing we are going to do is see what the ChromeOS kernel messages are when we hotplug the Arduino Uno into the Chromebook. Taking a look at the before by opening a crosh window <ctrl-alt-t>, followed by:

chrosh>shell
dmesg

Produces a screen full of device messages. Interestingly the last message indicates that  a GSM modem is mapped to ttyUSB0 – information that may be useful in the future. In any case, if we plug in the Arduino Uno to a USB interface on the Chromebook and run ‘dmesg’ again (looking specifically for new messages), we get the following information.

[12028.022309] usb 1-1: new full-speed USB device number 39 using xhci_hcd
[12028.035738] usb 1-1: New USB device found, idVendor=2341, idProduct=0001
[12028.035752] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=220
[12028.035763] usb 1-1: Product: Arduino Uno
[12028.035771] usb 1-1: Manufacturer: Arduino (www.arduino.cc)
[12028.035779] usb 1-1: SerialNumber: 649323436383514051E1
[12028.035970] usb 1-1: ep 0x82 - rounding interval to 1024 microframes, ep desc says 2040 microframes
[12028.036424] cdc_acm 1-1:1.0: ttyACM0: USB ACM device

Which provides us with a couple of useful datapoints. Specifically, that the device is recognized as an Arduino Uno, and that it is mapped to ‘ttyACM0’ – implying that it is recognized and likely supported by kernel driver.

The next thing we want to look at is is what it looks like in /dev – which is where the tty devices are mapped. In order for this interface to function correctly, the devices needs to readable / writeable from the Arduino IDE, and that will be installed on an crouton chroot Ubuntu install. So – to be more specific, we need to see what the ‘/dev/ttyACM0’ device looks like from inside of Ubuntu on the Chromebook – ownership and permissions. Start the Ubuntu install, switch to that interface (VT3) and open a terminal window. Inside that window, enter:

cd /dev
ls -al tty*

And this produces a listing in which the line of interest looks something like:

crw-rw---- 1 root serial 166,  0 Nov 26 05:37 ttyACM0

Note that the line containing ttyACM0 has permissions set to 660 and is owned by group ‘serial’. Most significantly, it is not world readable/writable. This will matter later when we need to access it from the Arduino IDE (Interactive Development Environment).

Software – Installing the Arduino IDE

There are multiple options for installing the Arduino IDE on Ubuntu. The easiest is launch the Ubuntu Software Center (or Synaptic)  from inside the Ubuntu system, search for Arduino and install. The only real issue with this is that the version in the Ubuntu respository is usually a few versions behind the most current version at the Arduino homepage. My suggestion is to try the version in the Ubuntu repository, see if it works (or doesn’t), and then evaluate the differences between the installed version and the most current version. If the updated features are critical to your needs, download and install the current version from the Arduino homepage – and follow the instructions for Ubuntu install.

After the install has completed, start start the Arduino IDE. I dialog box will popup indicating that the current user is not part of the ‘dialout’ group.   This can be remedied by closing the Arduino IDE, opening a terminal window and entering:

sudo usermod -a -G dialout joeuser 
sudo usermod -a -G serial joeuser

Which of course is based on my default username ‘joeuser’ – adapt to your match your configuration. Note that we added our user to two groups. The reason for this is a bit complicated, but it is important that the second group is the same as the groupname associated with /dev/ttyACM0 (from above).

After this is completed, you can restart the Arduino IDE and connect the Arduino to Chromebook. Under the settings menu, serial device you will find ‘/dev/ACM0’ is now enabled.

Arduino1Blink

If you pull up the demo sketch for blink, compile and install – and it should work. However we still have one more open issue that needs to be wrapped up.

One Dangling Detail – Fixing udevd

Our dangling detail is the fact that the Arduino IDE install created some association between the Arduino serial port (/dev/ttyACM0) and the dialout group as part of the install – but it is not working quite as expected. We can verify this by repeating the following:

cd /dev
ls -al tty*

Which produces the same information we have above with our /dev/ttyACM0 port in the serial group – not the dialout group. Now if we do this (from inside an Ubuntu Terminal):

sudo udevd --daemon
{disconnect / reconnect the Arduino Uno}
cd /dev
ls -al tty*

This produces a slightly different listing of which the line of interest will look something like:

crw-rw---- 1 root dialout 166,  0 Nov 26 05:37 ttyACM0

Which now shows that this device interface is associated with the dialout group. The reason for this is that the udevd daemon is a service that manages device configuration on most modern Linux systems. ChromeOS does not use udevd or even have it installed – for security reasons. The Arduino IDE creates some udevd rules (in the Ubuntu Chroot system) that map Arduino devices to the dialout group – but since the udevd daemon is not running in this crouton install – the rules are not applied until we manually started the daemon. We could manually start this each time we run our Ubuntu install, but the more correct and complete solution is add to udevd to the startup apps in ‘/etc/rc.local’. In Ubuntu, open a terminal and do the following:

sudo gedit /etc/rc.local

On the line before ‘exit 0’, add a new line with the following:

sudo udevd --daemon

Save and exit. What this does is, every time you boot your Ubuntu install, the udevd daemon will start – and all of the udev rules will be implemented. You can reboot, plug in the Arduino and confirm that it maps to the dialout group.

Wrapup

This is a  slightly messy install – since we had to get the udevd daemon started, and that would not be typical for an install. But overall this is nothing too far off the beaten path of Linux installations and maybe we learned something new in the process.

Update : 2013 Jan 27

The Cortado – https://launch.punchthrough.com/. Arduino compatible, Bluetooth programmable, onboard sensors, and long battery life. The Chromebook also has Bluetooth and it could likely function as a dev platform, and I would really like to try this on for size. The up to 100ft range and meshed networking – makes this a potential in the IoT space. [twolf]

Howto: Android Development on Chromebook

Background

One the primary reasons I got the Chromebook was to support a broad range of development options on a Linux platform. The risk is that since it is a Chromebook, it is most definitely not a general purpose Linux environment – and some things may not be practical. Even inside a chroot jail, there are weirdnesses since it inherits the kernel and devices from the platform OS – ChromeOS.

In any case the following is a process I developed to implement an Android application development environment. There are many equally valid solutions, and perhaps better solutions, but this is a working / tested example of how to get from point A to point B.

Approach

Since the base platform I am using is a Chromebook 14 – with a x86 Haswell (64bit), Intel binaries will work fine – making life a quite a bit simpler. Unfortunately, it is not possible to install anything like Eclipse or Debian packages directly to the ChromeOS since the developers have (purposefully) not included most of the traditional shared libraries used in Linux. This minimalist approach to the ChromeOS means that it has a minimal attack surface for malware, but minimal opportunity for us to hack the OS.  As a point of trivia, ChromeOS appears to be based on a Gentoo build model – but it has been scrubbed clean of anything extraneous to the ChromeOS function.

However, since we are lucky enough to have a relatively polished / low pain solution to installing a chroot jail version of Ubuntu – Crouton, our system level approach will be:

  1. Switch the Chromebook to developer mode
  2. Install a Crouton based chroot jail version of Ubuntu on the SD-Card
  3. Install Oracle Java in the chroot jail (along with all of the rest of the pieces)
  4. Install BitTorrent sync to create a shared workspace for Android Studio
  5. Install adb and fastboot and verify operation with an Android device.
  6. Install Android Studio / test with Android device.

Note: All of the instructions below are based on name of my user (joeuser), the name of my SD-Card (chrome-32), and particular versions of the install packages. You will need to modify for your respective names / versions.

Step 1 – Developer Mode

Developer mode is a way to unlock your Chromebook a bit. Of course it is much less secure than the default ChromeOS mode, and setting developer mode means everything on the platform is erased – except the SD-Card . Additionally, switching it back to default mode will also clear everything (again). So any files you want to be persistent should be stored up in the Google drive or on the SD card. After you have mentally prepared yourself for that, take a shot at developer mode – it really is much more interesting than lockdown mode. The details are on my Chromebook Cookbook page. While you are there, spend some time and figure out how to switch between virtual terminals and set the password for ‘chronos’.

Step 2 – Ubuntu on Crouton Chroot

The default Crouton install installs the Crouton tools in /usr/local/bin and the chroot jail in /usr/local/chroots. With a 16GB internal hard drive, it is not practical to risk using most/all of the available local space for a chroot jail – so we should plan to store it off to the SD-Card. There are two basic approaches we can use to accomplish this. The first is to use command line arguments with the Crouton tools to point it at /media/removable/chrome-32 (which happens to be the name of my card). Another option is to symbolically map these directories to corresponding directories on the SD-Card. The first approach means that every time the crouton scripts are used, the command line arguments are needed, and the second approach means it is done one time up front (I recommend the second approach as the less stupid approach).

Open a chrosh (chrome shell – ctl-alt-t), and enter (with corrections for your sd-card name):

shell
cd /media/removable/chrome-32
sudo mkdir bin chroots
cd /usr/opt
sudo ln -s /media/removable/chrome-32/bin/ bin
sudo ln -s /media/removable/chrome-32/chroots/ chroots
ls -al

The last instruction should show the local directories with the symbolic mapping to the sd-card locations. This ensures that both the crouton tools and the chroot jail is installed to the SD-Card enabling a much easier restore if you somehow clear your system (it happens to me at least once a week).

FYI – If that happens, the Crouton install can be restored by recreating the symbolic links above. That’s it.

From here install a crouton chroot jail ubuntu with the following:

sudo sh -e ~/Downloads/crouton -r raring -t unity
sudo startunity

Note that in most cases I recommend Precise due to its stability, but in this case I went with Raring since it has support for ADB and Fastboot in the Ubuntu repository (and Precise does not). From inside the Ubuntu install, open a terminal and enter:

sudo apt-get install ubuntu-standard
sudo apt-get install ubuntu-desktop
sudo apt-get install ia32-libs
sudo apt-get install synaptic

Shutdown the Ubuntu chroot jail. You now have a fairly complete and clean Ubuntu Raring install, and this would be a good point to make a backup. Instructions are on the Crouton Cookbook page. Once again – when the backup is done, move it to Google Drive or the SD-Card for safekeeping.

Step 3 – Install Oracle Java

From the ChromeOS interface (VT1), you can download the  Java 7 JDK for 64 bit Linux – grab the tar.gz package from Oracle (not the RPM). It will download into the Downloads directory, which incidentally is mapped to the Downloads directory inside the Crouton chroot jail.

After the download is complete, switch over the Ubuntu interface on VT3, open a file manager and copy the Java 7 tar.gz package from Downloads to the home directory. Right click and extract. It should create a directory name something like ‘jdk1.7.0_45’ in the home directory. Open an editor and open ‘.bashrc’ and append the following:

PATH=${PATH}:/home/joeuser/jdk1.7.0_45/bin
JAVA_HOME=/home/joeuser/jdk1.7.0_45

This will make it easier for *some* apps to find the JDK. The JDK tar.gz file in the homedir can safely be deleted after this is done.

Step 4 – BitTorrent Sync

On the ChromeOS interface (VT1) download the Linux/64bit install package for BitTorrent Sync from http://www.bittorrent.com/sync/downloads. Open a terminal with <ctrl-alt-T> and enter:

cd /media/removable/chrome-32
sudu mkdir btsync
sudo chmod 777 btsync
cd btsync
mkdir android-studio

This creates a target sync directory that is readable / writable / executable to everybody for syncing. We will use it later.

When the download is complete, copy the tgz file to the user home directory(from the Ubuntu interface) . Extract the files in the home directory. This will create a directory that looks like ‘~/btsync_glibc23_x65’. On the Unity Desktop, click on the gear in the upper right corner and select ‘Startup Applications’. Under Command, browse to the btsync directory and select the ‘btsync’ app. The will configure the app to startup when the chroot is started – similar enough to a service for our purposes. After this is done, the tgz file in the homedir can also be deleted.

Start the app by double clicking from the file manager or reboot the chroot jail to force the app startup (and validate that it is configured correctly). On either the ChromeOS or Ubuntu interface, open a browser with URL ‘localhost:8888/gui’ to confirm that Bit Torrent sync is running. Configure according to directions – using the ‘bysync’ directory (on the SD-Card) we created above as the target. You will also want to create another endpoint to this share on a desktop, server, or other laptop to ensure your data is offloaded from your Chromebook.

Step 5 – ADB and Fastboot

ADB and Fastboot are really a make or break part of effectively using the Chromebook for Android development. Note that in Precise, the adb and fastboot packages need to be retrieved manually from the Debian repository. For details refer to the Ubuntu Cookbook page. In Raring, we can use the easier method shown below.

From the chroot Ubuntu interface on VT3, and open a terminal. Enter the following:

sudo apt-get install android-tools-adb
sudo apt-get install android-tools-fastboot
adb version
fastboot help

The last two lines confirm that both adb and fastboot are operational. The true test is to now plug in an Android test device and enter (this may take a couple of tries):

adb devices

If everything if functional the adb server should start and the attached device will be identified. Note that if the Android device is reasonably current, it will require onscreen approval before it connects to adb.

Step 6 – Android Studio

The last piece in this puzzle is Android Studio. From the ChromeOS interface, download the Linux/64bit install package from http://developer.android.com/sdk/installing/studio.html.

From the chroot jail (VT3) open a file manager and copy the Android Studio tgz file to the user home directory. Right click on the file and extract the files in the home directory. This will create a directory that looks like ‘~/android-studio’ with an ‘bin’ subdirectory. Once again, after this is complete the tgz file in the homedir can be deleted.

In order to make the launch script easier to find, we will put it on the PATH system variable. In the user home directory, edit ‘.bashrc’ and append the following:

PATH=${PATH}:/home/joeuser/android-studio/bin

In some cases, I discovered that the launch script did not seem to be picking up on the .bashrc updates, so it was necessary to define the JDK_HOME more explicitly. Use your favorite editor to open ‘~/andoid-studio/bin/studio.sh’. Right below the ‘#!/bin/sh’ line add the following:

JAVA_HOME=/home/joeuser/jdk1.7.0_45″

Open a terminal and enter ‘studio.sh’ to confirm that Android Studio was found and executed. If / when prompted, select the directory for the Oracle Java JDK. Android Studio should launch.

As a test we are going to create a new project (mostly with the defaults), except for project location. For the location, navigate to ‘media/removable/chrome-32/btsync/android-studio’. This will put the project files in the sync directory, which will then synchronize the project to your others systems on share. This is all part of that concept that everything should be stored in some “cloud” or at least off device.

At the project screen, create a new project and when it comes up on the screen, click the green arrow button at the top of the screen. The connected Android device should show up as an option, select it and go. Alternatively, you can create an emulator and use it. The app should install, run and show ‘Hello World’ on the interface.

Version Note: As part of my testing, I noticed that my initial version of Android Studio had functioning menu drop downs, but after I updated to the current version (0.32) the menus would no longer drop down. I confirmed this with both Precise and Raring on Unity. It may be worth testing Gnome as some point, or it may be fixed in some upcoming update to Android Studio. Overall – it did not prevent me from doing most activities since the control bar was fully functional.

Lastly – go to another node on your BitTorrent Sync share and confirm that the project was created and migrated to that system.

Wrapup / Notes

Overall the Android Studio IDE is fairly functional and well structured. I was able to test on both an external device and the emulator without any issues.  From a practical perspective, I am actually surprised at how easily this came together on a Chromebook. As far as the menu issues, this appears to be an Android Studio / Unity issue which could be resolved by using Eclipse/Android ADT tools or switching to a different window manager.

In Summary – This Chromebook Android Environment provides me with a very slick and portable Android Development environment without a lot of compromises.

Howto: Browse (more) Securely / Privately / Anonymously

Background

For a number of reasons, many people are increasingly concerned with their privacy and security on the Internet. Since the primary reason most people use the Internet is for browsing, this would be a opportunistic use model to look for improvement. Of course the tradeoff is that as we make browsing more secure, we also may make the browsing experience more difficult. So in the list below, it progresses from low return / low impact to high impact / high return, and you can pick you pain threshold.

Note that in the context of a browser (and browsing), I define security as the ability to browse without being infected or compromised by malware. I define privacy as the ability to browse without sites (or other parties) tracking, harvesting information from my browser. Anonymity is when there is a sufficiently high degree of privacy that the browsing activity is anonymous – and true anonymity is not easy to achieve.

Off the Shelf / Good Browser Hygiene

Browser: There are lots of browser options and I cannot offer an opinion on most of them. On a regular basis browsers are reviewed for security – and Chrome, and Firefox are usually in the top three. Privacy is distinct from security, and generally Firefox rates higher than Chrome in that respect. However everything is a tradeoff, and I personally think that Chrome has better performance (which I may be imagining), and my Android devices and Chromebook are Chrome by design – so that is my browser choice by default. Secondary to that, I appreciate the rolling updates and aggressive stance Google takes on security, and I think that outweighs the weaker stance they take on privacy – since I believe I can manage my privacy / personal data easier than I manage security threats. Consider browser selection as the first thing to do in cleaning up your browser security / privacy concerns.

Browser Settings: The obvious things to check in your browser include:

  • Turn on “Do Not Track” / Open settings and search for this flag – if it is not set, set it. This provides some minimal and non necessarily mandatory level of tracking reduction.
  • Content Setttings (Cookies): I up the default level to “Keep local data only until I quit my browser” and “Block third-party cookies and site data”.
  • [Chrome Specific]Under Signin and Sync Settings, I encrypt my sync data with a passphrase. This is all about key management and reducing personal data on Google Servers.

Browser Plugins: The following list includes a few plugins that provide improved privacy.

  • HTTPS Everywhere: This is a plugin that will force a HTTPS connection as the default, with HTTP (non-secure) as the fallback.
  • DuckDuckGo Search: Duck Duck Go is a search service that provides much stronger statements about not tracking your browsing / searching activity (as compared with Google). They feel fairly strongly that this is a big deal. Take a look at their positions on results bubbling
  • DoNotTrackMe: A plugin that gives you explicit tracking information as you browse. This actually provides some visibility into what sites are tracking you in realtime.

Sites: What to do to reduce your browsing footprint.

  • Google Search History: By default Google saves your search history and used it to target ads and search results. My recommendation – turn it off.
  • Google Dashboard: A nice portal that provides a one view view into your data footprint on Google Servers. Review and clean it up. While you are there, setup an Alert on your name. It will give you any visibility into possible misuse of your name.
  • Twitter Privacy: Twitter by definition is fairly public so there is not much to tweak. However it makes sense to verify that “Do Not Track” is enabled and consider turning off / deleting location data.
  • Facebook: Expect this to change over time. Privacy settings seem to be a fast moving target at Facebook. So much of the business value proposition of Facebook is about eliminating privacy, so this will always be about providing some minimal level of privacy control that that is just enough to keep most users from leaving.

Overall these tweaks to your browsing experience will provide some improved level of security and privacy, but fundamentally much of the browsing process from your client system will still be relatively visible – the contents may protected with SSL/TLS, but where you are going, what you are downloading and how long you are there is not. Specifically, where you are going (page by page by page), how long you are there and how my kilobytes you have downloaded is all visible.  If your ISP / employer / campus / hotel / building has a proxy server between you and the Internet, they have access to this level of information.

Overall I consider these steps to just be good browser hygiene.

Some Better

If this level of exposure bothers you (it may), and you feel a need to mitigate this issue, read on – a VPN / proxy service may be the solution you are craving.

Technically a VPN and a proxy server are two very distinct functions. A VPN (Virtual Private Network) is a secure (i.e encrypted channel) and authenticated (i.e. username/password and server certificate) channel from your client system to some server on the Internet. In the enterprise / business world, VPNs are used to enable authorized users on the Internet access to corporate servers on the private networks. In the world of proxy servers, VPNs are used to provide a secure channel to some proxy server on the Internet.

A Proxy server is simply a relay for your Internet / Browsing traffic. You send some Internet request to the proxy server, and it redirects it to the Internet, with the source mapped back to the proxy server. When the response is received by the proxy server, it is then relayed back to your client system. Proxy servers are not explicitly secure, so they are generally coupled with some form of VPN to provide a secure channel.

There are large number of VPN/Proxy service providers around the world. For the most part, the free ones (reportedly) have a fairly high rate of malware infection and the for pay ones are from $40 to $100 a year. This is not an endorsement – but PureVPN and HideMyAss are both typical for-pay VPN/Proxy Services, with very typical pricing and functionality providing a wide range of target servers around the world.

When using a VPN/Proxy service, the net effect is that any geolocation will place you at (or near) the location of the proxy server. This means that if you are accessing some Internet service with geolocation service qualifiers (e.g. bbc.com, nfl.com) , you can appear to be somewhere that you are not. It also means that if your employer, hotel, campus, school has blocked sites/services, you can circumvent these restrictions with a VPN/proxy. In both of these cases you are not likely violating any laws, but you are likely violating some Terms of Service – implied or otherwise.

More legitimately, if you often use public or untrusted WiFi networks, a VPN / Proxy ensures that your traffic will not be sniffed on the local network. If you use WiFi in a high density environment, and are concerned about your network being compromised, or you don’t trust the other users on a shared network – a VPN/Proxy can ensure your traffic is secure / private even if your network may not be.

Ultimately, a VPN / Proxy service can provide a step up in privacy / security for a specific set of threats. However, by using a VPN / Proxy service you are literally handing this same information over the VPN/Proxy service provider – so if your concern is browsing/security in general, you have just shifted the risk.

More Better

From this point, there is one very obvious and better way to achieve better security/privacy – the TOR Browser. The TOR (The Onion Router) Browser is a custom version of Firefox packaged/integrated with a few tools related to The Onion Router, including an Onion Router proxy for your client system. The download package installs easily, and the TOR proxy starts automatically just be launching the TOR browser. If you are serious about using it for the privacy it can provide, read the Warnings FAQ.

The general principle behind TOR is that an outgoing datapacket is encrypted with some relay address on the TOR network, with multiple successive similar layers applied, and ultimately the packet is sent out to the network in which each one of the relays peels off the successive layers – and it is finally sent to the Internet destination. The goal / purpose of this effort is that through this obfuscated path, the user is much more anonymous and their privacy is protected.

In an ideal world, where TOR relays were spread around the world from different organizations it is possible to achieve some level of anonymity. In the real world, some of these relays are operated by agencies with the intent to compromise the TOR network, reducing the effectiveness. In addition some academic research has shown a few other weaknesses related to coordination between TOR relays. The net result is that the TOR network and the TOR browser provide a much high degree of anonymity than any other readily available solution – but it can be broken. For a recent example, refer to the story behind Silk Road shutdown. Details are lacking, but this does show it is susceptible if the incentive is high enough.

Bottom Line

There are a wide range of things you (as a user) can do to reduce your browsing footprint, reduce your ability to be tracked, increase your security and privacy (and anonymity). However, the first step to any of this is to assess what your threats are, and take reasonable steps to mitigate those threats. If you threats are non-specific and general, than it is likely that the non-specific and general browser hygiene solutions are sufficient. If you have specific threats that fit the more elaborate solutions, use appropriately.