Monday, July 26, 2010

Cloud Computing Models -- What is the future

I was reading through the first chapter of “The Cloud at Your Service” and came to the section describing the different types of clouds available.

The three types that they discussed were:

  1. IaaS -- Infrastructure as a Service. An example is Amazon EC2. In this model you are given something close to the bare metal of the machines: you provision the operating systems and install the applications you need on each type of machine. That way you define the virtual image you want to run for your app servers, your database servers, etc.
  2. PaaS -- Platform as a Service. This is the model of Google App Engine. You don’t provision operating systems, but you can only use the languages that the platform supports.
  3. SaaS -- Software as a Service. In this model you subscribe to software that is run for you. It may or may not give you the ability to extend that software.

I’m trying to decide which model will make the most sense in the future. If we want to use these systems most efficiently, I think we need to look more closely at the PaaS model, because it forces developers to understand that the systems they build will run on other machines. But we would also need a way to run the platform locally, so developers can test applications at a small scale before pushing them up to the cloud.

I think that if a platform were created that encouraged developers to think about problems in a more cloud-friendly way, their systems would scale more easily. However, the platform would need to handle the most common causes of performance problems. The biggest issue that comes up when scaling a system is access to the database. This could be minimized if the platform let the code that needs the data move to where the data lives, increasing the locality between the data and the time and place it is needed. A rough sketch of that idea follows.
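
To make that concrete, here is a toy Java sketch of “moving the code to the data” rather than pulling the data to the code. The DataNode class, the partition, and the shipped sum() task are all made-up names for illustration; a real platform would have to serialize the task and send it over the network.

    import java.util.Arrays;
    import java.util.List;
    import java.util.function.Function;

    public class MoveCodeToData {

        // Stand-in for a storage node that holds one partition of the data.
        static class DataNode {
            private final List<Integer> partition;
            DataNode(List<Integer> partition) { this.partition = partition; }

            // The platform would run the shipped task here, next to the data,
            // so only the small result crosses the network instead of the rows.
            <R> R execute(Function<List<Integer>, R> task) {
                return task.apply(partition);
            }
        }

        public static void main(String[] args) {
            DataNode node = new DataNode(Arrays.asList(3, 1, 4, 1, 5, 9));
            Integer total = node.execute(rows -> rows.stream().mapToInt(Integer::intValue).sum());
            System.out.println("shipped a sum() to the data, got back: " + total);
        }
    }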

These are just some of my ideas that I needed to get out.

Saturday, July 17, 2010

Distributed Computing

I am thinking about how to do fault-tolerant distributed computing.

I think the concept that would make this feasible is verified functional coroutines: coroutines with no side effects that are globally witnessed.

I think this would need command nodes that schedule which coroutine is executed on which machine in the cluster. I would want the command nodes to be redundant, so that if one of them failed the computation would still complete. A command node would need to know what each node can process and what kind of processing it specializes in. For example, GPUs specialize in SIMD classes of problems, so if a piece of the program called its coroutines in that way, we could run them on a GPU-focused machine rather than a normal general-purpose machine.

The command node would need real-time statistics on the cost of communicating with each node, so it could intelligently schedule what gets queued on each machine, and decide when to shift a coroutine to a machine that is less suited to the task but would finish sooner than one with a longer wait time.

Each coroutine would need its estimated run time and estimated communication size. Then we could make intelligent decisions about when to move the computation to a different machine, when to just run the computation on the machine that needs the result, and when to run the computation on a faster machine and transmit the result back. A toy version of that decision is sketched below.
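
Here is a minimal Java sketch of that placement decision, assuming we already have per-node estimates of speed, bandwidth, and queued work. The Node class, its fields, and the numbers in main are all invented for illustration.

    public class PlacementDecision {

        // Hypothetical view of a remote machine, as the command node might see it.
        static class Node {
            final String name;
            final double relativeSpeed;     // > 1.0 means faster than the requesting machine
            final double bytesPerSecond;    // estimated link bandwidth to the requester
            final double queueWaitSeconds;  // current backlog on this node

            Node(String name, double relativeSpeed, double bytesPerSecond, double queueWaitSeconds) {
                this.name = name;
                this.relativeSpeed = relativeSpeed;
                this.bytesPerSecond = bytesPerSecond;
                this.queueWaitSeconds = queueWaitSeconds;
            }
        }

        // Estimated seconds until the requester holds the result if it runs the coroutine itself.
        static double localCost(double estimatedRunSeconds) {
            return estimatedRunSeconds;
        }

        // Estimated seconds if the coroutine's input and result travel over the network instead.
        static double remoteCost(Node node, double estimatedRunSeconds, double inputBytes, double resultBytes) {
            double transfer = (inputBytes + resultBytes) / node.bytesPerSecond;
            double compute = estimatedRunSeconds / node.relativeSpeed;
            return node.queueWaitSeconds + transfer + compute;
        }

        public static void main(String[] args) {
            Node gpuBox = new Node("gpu-box", 8.0, 50_000_000, 2.0);
            double run = 30.0, input = 10_000_000, result = 1_000_000;

            double local = localCost(run);
            double remote = remoteCost(gpuBox, run, input, result);
            System.out.printf("local=%.1fs remote=%.1fs -> run %s%n",
                    local, remote, remote < local ? "on " + gpuBox.name : "locally");
        }
    }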

Every routine has the time that it should take and a timeout limit. If the result hasn’t come back before the timeout is reached, the command node sends the routine to a different node and marks the original node as unresponsive. Once it receives a response from that node, it asks for its current load and network traffic so it can properly re-rank the nodes. The retry piece could look something like the sketch below.
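
A hedged sketch of the timeout-and-resend behaviour, using plain java.util.concurrent as a stand-in for the real command node. The coroutine is just a Callable here, and the choice of backup node is hard-coded; both are assumptions for illustration.

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    public class TimeoutReassign {

        static <T> T runWithFailover(Callable<T> coroutine, long timeoutSeconds,
                                     ExecutorService primaryNode, ExecutorService backupNode) throws Exception {
            Future<T> attempt = primaryNode.submit(coroutine);
            try {
                return attempt.get(timeoutSeconds, TimeUnit.SECONDS);
            } catch (TimeoutException slow) {
                attempt.cancel(true);                        // treat the slow node as unresponsive
                return backupNode.submit(coroutine).get();   // resend the routine to another node
            }
        }

        public static void main(String[] args) throws Exception {
            ExecutorService primary = Executors.newSingleThreadExecutor();
            ExecutorService backup = Executors.newSingleThreadExecutor();
            // Because the coroutine has no globally visible side effects,
            // it is safe to run it a second time on the backup node.
            System.out.println(runWithFailover(() -> "42", 5, primary, backup));
            primary.shutdownNow();
            backup.shutdownNow();
        }
    }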

What we would need in this system is a language composed of verified coroutines, so that we know exactly what each one does between the time it starts and the time it yields. Each coroutine would also have to declare its communication size as a function of the input size, and its estimated complexity, so we know approximately how long each stage will take to finish. The declarations might look something like the interface sketched below.
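
This is only a guess at the shape of such a declaration; the interface name, methods, and units are all assumptions.

    // What a "verified coroutine" might have to promise to the command node.
    public interface VerifiedCoroutine<I, O> {

        // Pure step: no globally visible side effects between start and yield.
        O run(I input);

        // Estimated bytes that must cross the network for an input of this size.
        long communicationBytes(long inputSize);

        // Rough complexity-based estimate of how long this stage will take.
        double estimatedSeconds(long inputSize);
    }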

They would have to be functionally programmed, because then we wouldn’t have to worry about side effects. We would still have to think through deadlock scenarios, but I think it would be possible to analyze the code for those as a compilation step.

Sunday, May 30, 2010

Introduction to Program Tracing

I’m trying to understand more about program tracing.

The first thing we have to agree on is what I mean by program tracing. Program tracing is recording the dynamic execution of a program; in other words, understanding which methods were executed, and when.

The most common way that I have come across people doing this is with Log4J’s trace logging level. The first line of the method is usually log.trace("Method ENTER");, the body of the method follows, and the last line is log.trace("Method EXIT");. The pattern looks roughly like the sketch below.
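
For illustration, here is roughly what that manual pattern looks like with Log4J 1.x. The class and method names are made up.

    import org.apache.log4j.Logger;

    public class OrderService {

        private static final Logger log = Logger.getLogger(OrderService.class);

        public void placeOrder(String orderId) {
            log.trace("placeOrder ENTER");
            // ... the real work of the method goes here ...
            log.trace("placeOrder EXIT");   // never reached if the work above throws
        }
    }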

This works fine sometimes, but when an exception occurs we never reach the “Method EXIT” line. It is also a very manual process: we have to add these statements to the code by hand, and they just clutter it up.

There is a simple solution to these problems: the Java Platform Debugger Architecture (JPDA), specifically its Java Debug Interface (JDI). This API lets us write a separate program that receives event notifications when threads are started and when methods are entered and exited. That way we don’t have to clutter our source code with the tracing information.
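
A minimal sketch of what that tracer could look like with JDI, assuming the target JVM was started with the standard jdwp agent listening on a socket. The host, port, and the com.example.* class filter are placeholders.

    import com.sun.jdi.Bootstrap;
    import com.sun.jdi.VirtualMachine;
    import com.sun.jdi.connect.AttachingConnector;
    import com.sun.jdi.connect.Connector;
    import com.sun.jdi.event.Event;
    import com.sun.jdi.event.EventQueue;
    import com.sun.jdi.event.EventSet;
    import com.sun.jdi.event.MethodEntryEvent;
    import com.sun.jdi.event.MethodExitEvent;
    import com.sun.jdi.event.VMDisconnectEvent;
    import com.sun.jdi.request.EventRequestManager;
    import com.sun.jdi.request.MethodEntryRequest;
    import com.sun.jdi.request.MethodExitRequest;
    import java.util.Map;

    public class JdiTracer {
        public static void main(String[] args) throws Exception {
            // Attach to a target JVM started with:
            //   -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000
            AttachingConnector connector = null;
            for (AttachingConnector c : Bootstrap.virtualMachineManager().attachingConnectors()) {
                if ("dt_socket".equals(c.transport().name())) {
                    connector = c;
                }
            }
            Map<String, Connector.Argument> arguments = connector.defaultArguments();
            arguments.get("hostname").setValue("localhost");
            arguments.get("port").setValue("8000");
            VirtualMachine vm = connector.attach(arguments);

            // Ask to be told about every method entry and exit in our own packages.
            EventRequestManager requests = vm.eventRequestManager();
            MethodEntryRequest entry = requests.createMethodEntryRequest();
            entry.addClassFilter("com.example.*");
            entry.enable();
            MethodExitRequest exit = requests.createMethodExitRequest();
            exit.addClassFilter("com.example.*");
            exit.enable();

            // Pull events off the queue; each one carries the thread it happened on,
            // so one interleaved trace falls out naturally.
            EventQueue queue = vm.eventQueue();
            while (true) {
                EventSet events = queue.remove();
                for (Event event : events) {
                    if (event instanceof MethodEntryEvent) {
                        MethodEntryEvent e = (MethodEntryEvent) event;
                        System.out.println(e.thread().name() + " ENTER " + e.method().name());
                    } else if (event instanceof MethodExitEvent) {
                        MethodExitEvent e = (MethodExitEvent) event;
                        System.out.println(e.thread().name() + " EXIT  " + e.method().name());
                    } else if (event instanceof VMDisconnectEvent) {
                        return;
                    }
                }
                events.resume();
            }
        }
    }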

So far I have found one application, MuTT, that handles multi-threaded tracing, but it does so by outputting each thread’s trace into a separate file. This shows what each thread did, but not how they interacted.

I’m currently looking for a way to keep track of how the threads were interleaved within a specific trace. That way I can find out what really influenced the values of different fields in the code, and the specific path that was followed to expose an issue.

Monday, April 19, 2010

Analysis of "An In-Vivo Study of the Cognitive Levels Employed by Programmers During Software Maintenance"

I thought I would just start jotting down some ideas around this paper that I read today. The paper is "An In-Vivo Study of the Cognitive Levels Employed by Programmers During Software Maintenance" by Tara Kelly and Jim Buckley. It was in the 2009 ICPC conference proceedings.

In this paper they start by assuming that Bloom’s Taxonomy of mental processing is used by programmers during maintenance tasks.

I was interested in how they did the research. They had six programmers speak all of their thoughts into a voice recorder during a programming assignment. The researchers then analyzed the developers’ utterances and categorized them into the different levels of the taxonomy. However, they admit that the way they made the categorizations was very subjective.

They did have a few statistics whose original sources I would be interested in looking at. One of these stats, which I’m interested in reading about in the first source, is that 90% of a system’s total cost is in maintenance. They quote this from the article “Leveraging legacy system dollars for E-Business” by L. Erlikh, in the IT Pro publication of May/June 2000.

Wednesday, March 10, 2010

Program Understanding

I’m starting research into Program Understanding and Program Comprehension. I am doing this research because software maintenance is the portion of the software development lifecycle that takes the most time and money. Some studies claim that software maintenance can consume 60 - 80% of the total cost of a software project.

I would like to create tools and techniques to enable people to get up to speed faster on legacy codebases. To me every code base is legacy, or will be legacy shortly.

The first idea that I want to look at to understand an arbitrary code base is finding how the classes are related, to get a feel for the overall structure of the code base. Is it a standard “n-tier” architecture, where all dependencies point downward? Is it a client-server code base, where there are two distinct pieces that communicate with each other over a well-defined boundary? Or is the code the classic “Big Ball of Mud” architecture that is common in tools that have grown organically since they were started? A rough first pass at extracting those relationships is sketched below.
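
As a starting point, here is a small Java sketch that walks a source tree and maps each class to the classes and packages it imports. Parsing raw import lines is crude, and the default src/main/java path is just an assumption, but the resulting map gives a first hint of whether the dependencies flow in layers or in a tangle.

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeMap;
    import java.util.TreeSet;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class ImportGraph {
        public static void main(String[] args) throws IOException {
            // Root of the source tree to scan; the default path is a placeholder.
            Path root = Paths.get(args.length > 0 ? args[0] : "src/main/java");

            List<Path> sources;
            try (Stream<Path> files = Files.walk(root)) {
                sources = files.filter(p -> p.toString().endsWith(".java"))
                               .collect(Collectors.toList());
            }

            // Map each class to the classes/packages it imports.
            Map<String, Set<String>> dependsOn = new TreeMap<>();
            for (Path file : sources) {
                String className = root.relativize(file).toString()
                        .replace(File.separatorChar, '.')
                        .replaceAll("\\.java$", "");
                Set<String> imports = new TreeSet<>();
                for (String line : Files.readAllLines(file)) {
                    line = line.trim();
                    if (line.startsWith("import ")) {
                        imports.add(line.substring("import ".length()).replace(";", "").trim());
                    }
                }
                dependsOn.put(className, imports);
            }

            // Eyeballing this map (or feeding it to a graph tool) hints at the architecture.
            for (Map.Entry<String, Set<String>> e : dependsOn.entrySet()) {
                System.out.println(e.getKey() + " -> " + e.getValue());
            }
        }
    }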