Nov 07

CSV Parsing with Erlang

Posted By:  Praveen Ray

Since OTP doesn't provide any CSV parsing capabilities, I decided to write my own, based on gen_fsm. Following the short explanation given in Perl's Text::CSV_XS module, I implemented a simple state machine. Erlang's excellent binary processing capabilities and built-in gen_fsm behavior make the code compact and surprisingly easy to implement. You can download the code from here. A short explanation follows.

The state machine has only a handful of states:

start_field

Start reading a CSV field. It might be double quoted and might contain special characters such as \r, \n, comma and double quote. Go to read_field or read_quoted_field, depending on whether a double quote started this field.

read_field

Once a field has started, we switch to this state and read bytes until an end-of-field condition is detected. End of field is marked by either a comma or a newline.

read_quoted_field

We're inside a double quoted field; read everything until another double quote is encountered. A double quote might be the end of this field, or an embedded double quote written as two consecutive double quotes. Switch to escaped_double_quote if a double quote is encountered.

escaped_double_quote

We enter this state upon encountering a double quote in the read_quoted_field state. If another double quote is seen, it's an escaped double quote; otherwise, it's nothing special. Both cases go back to read_quoted_field.
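The four states can be sketched outside Erlang as well. The block below is a JavaScript illustration (the language used elsewhere on this page), not the gen_fsm code itself. The name parseCsv is hypothetical. It also assumes that a lone double quote followed by a comma or newline closes the field, which is one reading of the description above rather than something confirmed by the original code.

```javascript
// Sketch only: a JavaScript rendering of the four states, not the Erlang code.
// parseCsv is a hypothetical name.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = '';
  let state = 'start_field';

  const endField = () => { row.push(field); field = ''; state = 'start_field'; };
  const endRow = () => { endField(); rows.push(row); row = []; };

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    switch (state) {
      case 'start_field':
        if (ch === '"') state = 'read_quoted_field';
        else if (ch === ',') endField();            // empty field
        else if (ch === '\n') endRow();
        else if (ch !== '\r') { field += ch; state = 'read_field'; }
        break;
      case 'read_field':
        if (ch === ',') endField();
        else if (ch === '\n') endRow();
        else if (ch !== '\r') field += ch;
        break;
      case 'read_quoted_field':
        if (ch === '"') state = 'escaped_double_quote';
        else field += ch;                           // commas and newlines are literal here
        break;
      case 'escaped_double_quote':
        if (ch === '"') { field += '"'; state = 'read_quoted_field'; }
        else if (ch === ',') endField();            // assumption: the quote closed the field
        else if (ch === '\n') endRow();
        // assumption: any other character after a closing quote is ignored
        break;
    }
  }
  // Flush a final line that has no trailing newline.
  if (state !== 'start_field' || field !== '' || row.length > 0) endRow();
  return rows;
}
```

Calling parseCsv on a string such as 'a,"b ""c"""' yields one row with the fields a and b "c". A blank line in the input produces a row holding a single empty field in this sketch.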

Usage

The following public functions are exported:

parse_csv(File_path)
parse_csv(Binary_blob)
parse_csv(File_path, Options)
parse_csv(Binary_blob, Options)

where Options is:

[{callback_fn, Fun}, {callback_state, term()}]

Fun is a fun of arity 2 and gets called with a list of fields and the callback state.

With no Options passed, the return value is a list of lists of fields. With callback_fn passed, the callback is called at the end of each line with a list of fields.

Examples:


parse_csv:parse_csv("/tmp/data.csv").

Returns:

 

[[<<"Date">>,<<"Source">>,<<"Destination">>,
  <<"Seconds">>,<<"CallerID">>,<<"Disposition">>,<<"Cost">>],
 [<<"2009-09-18 09:44:54">>,<<"5097213333">>,
  <<"18667778888">>,<<"66">>,<<"5098761323">>,<<"ANSWERED">>,
  <<"0">>]]
With a callback:

F = fun(Fields, State) -> io:format("~p~n",[Fields]), State + 1 end.
parse_csv:parse_csv("/tmp/data.csv", [{callback_fn, F},{callback_state, 0}]).

It calls F repeatedly: first with the second parameter set to 0, then 1, then 2, and so on. Note that your fun must return the new state, which becomes the second parameter to callback_fn for the next line.
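The way the state is threaded from one line to the next is essentially a fold over the rows. Here is a minimal JavaScript sketch of that pattern. The names eachRow and sampleRows are hypothetical, and returning the final state is an assumption, since the return value of parse_csv in callback mode isn't described here.

```javascript
// Hypothetical helper: calls callbackFn once per row, passing along
// whatever state the previous call returned.
function eachRow(rows, callbackFn, callbackState) {
  let state = callbackState;
  for (const fields of rows) {
    state = callbackFn(fields, state);
  }
  return state; // assumption: hand back the final state
}

const sampleRows = [
  ['Date', 'Source'],
  ['2009-09-18 09:44:54', '5097213333']
];
const statesSeen = [];
const finalState = eachRow(sampleRows, (fields, n) => {
  statesSeen.push(n); // records 0 for the first row, 1 for the second
  return n + 1;
}, 0);
```

If the callback forgets to return a value, the next call receives undefined as its state, which is the JavaScript counterpart of the note above.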
 
Sep 30

Duplication is the mother of all evil in programming. Staying DRY should be your goal at all times: let a pattern repeat itself even once and the code gets harder to change, because every fix now has to be made in more than one place.
One great way to DRY up your code is via Traits. Traits are the functional equivalent of base classes: you collect your most often used functions in a Trait and inject it into your objects. Here’s one simple way to do it in Javascript (using extjs here):

var MyTrait = {
   xhr: function(config) { return Ext.Ajax.request(config); },
   submit: function(basic_form, config) {
      // perform some checks on config
      // and call basic_form.submit
   }
   //...add more utility functions
};
function MyGrid(config) {MyGrid.superclass.constructor.call(this,config)}
Ext.extend(MyGrid, Ext.grid.GridPanel, MyTrait);
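The same idea works without extjs. The sketch below copies a trait's functions onto the prototypes of two unrelated constructors using Object.assign. The names LoggingTrait, mixin, Grid and Form are made up for illustration.

```javascript
// Plain-JavaScript sketch of a trait; all names here are hypothetical.
const LoggingTrait = {
  log(msg) {
    // Stored on the instance, so objects don't share each other's messages.
    this.messages = (this.messages || []).concat(msg);
    return this; // allow chaining
  },
  lastMessage() {
    return this.messages ? this.messages[this.messages.length - 1] : undefined;
  }
};

// Copy every function of the trait onto the constructor's prototype.
function mixin(Ctor, trait) {
  Object.assign(Ctor.prototype, trait);
  return Ctor;
}

function Grid() {}
function Form() {}
mixin(Grid, LoggingTrait);
mixin(Form, LoggingTrait);

const grid = new Grid();
grid.log('loaded').log('rendered');
```

Grid and Form share the logging code without sharing a base class, which is the point of the pattern.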
Sep 28

Although the extjs grid is an awesome component, IMO it suffers from one major drawback: its inability to place active components within data cells. At least it’s not straightforward. Here’s a simple solution to place clickable links within cells. The trick is to place <span>s which look like links and then intercept rowclick and determine which of the links was clicked on.

Here’s an example of a custom Grid Component:

function MyComponent() {
   MyComponent.superclass.constructor.call(this);
}
Ext.extend(MyComponent, Ext.grid.GridPanel);
MyComponent.prototype.initComponent = function() {
  var action_tmpl = new Ext.XTemplate(
     "<span class='link' id='edit-{id}'>Edit</span>",
     "<span class='link' id='delete-{id}'>Delete</span>"
   );
   action_tmpl.compile();
   var content = {
     // grid related config
     columns: [
        {header: 'First Name', dataIndex:'fname'},
        {header: 'Action', dataIndex:'id', 
          renderer: function(id) { // the cell value is the record's 'id' field
              return action_tmpl.applyTemplate({id: id});
          }}
     ]
   }
   MyComponent.superclass.initComponent.call(Ext.apply(this,content));
}
MyComponent.prototype.afterRender = function() {
  MyComponent.superclass.afterRender.call(this);
  this.on('rowclick', this.row_click, this);
}
 
MyComponent.prototype.row_click = function(grid, ri, evt) {
   var rec = grid.getStore().getAt(ri);
   // The renderer built the span ids from the 'id' field, so read the same field here
   var id = rec.get('id');
   // See below for the within_el method
   if(evt.within_el('edit-' + id)) {
       alert('Edit clicked');
   } else if(evt.within_el('delete-' + id)) {
        alert('Delete clicked');
   }
};

The within_el method, added to Ext.EventObject, looks like this:

Ext.apply(Ext.EventObject, {
    within_el:function(el) {
        el = Ext.get(el);
        if(!el)
            return false;
        var evt_xy = this.getXY();
        var evt_x = evt_xy[0];
        var evt_y = evt_xy[1];
        return (evt_x > el.getLeft() && evt_x < el.getRight() && evt_y > el.getTop() && evt_y < el.getBottom());
    }
});
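Stripped of the Ext calls, within_el is a strict point-in-rectangle test, so a click landing exactly on an element's border does not count. The sketch below isolates that check as a pure function so it can be tried without a browser. The names withinBox and whichLink and the box coordinates are hypothetical.

```javascript
// Hypothetical pure version of the bounds check used by within_el.
// point is [x, y]; box has left, right, top and bottom edges in page coordinates.
function withinBox(point, box) {
  const x = point[0];
  const y = point[1];
  return x > box.left && x < box.right && y > box.top && y < box.bottom;
}

// Returns the id of the first box containing the point, or null if none does.
function whichLink(point, boxesById) {
  for (const id of Object.keys(boxesById)) {
    if (withinBox(point, boxesById[id])) return id;
  }
  return null;
}

// Made-up positions for two link-like spans sitting side by side in a cell.
const linkBoxes = {
  'edit-7':   { left: 100, right: 130, top: 40, bottom: 55 },
  'delete-7': { left: 135, right: 175, top: 40, bottom: 55 }
};
const clicked = whichLink([140, 50], linkBoxes);
```

Here clicked is 'delete-7', and a click in the gap between the two spans resolves to null.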
Jun 12

There are a number of Ajax JavaScript libraries and frameworks out there, and it can be excruciating to try and find the best fit for your projects and organization. A good library should encapsulate the most often used patterns and provide clear and easy to use abstractions. No library, however mature, can be complete, since it’s always possible to find that one most important piece of missing functionality that you need. So, extensibility is a major requirement.

Javascript is an awesome language. It’s super flexible, and it hands the developer immense power. You can use that power to advance world peace or choose to shoot yourself. It’s totally up to you. As Spider-Man said, with power comes responsibility. Since there are no private namespaces in Javascript, it’s ultra important that the library you choose NEVER dirties your namespace and lives completely inside its own namespace. One of the most popular javascript libraries, prototype.js, violates this principle completely and IMHO should be used with care.
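As an illustration of what living inside its own namespace means, here is a minimal sketch of a library that exposes exactly one global and keeps its helpers private in a function scope. The library name MyLib and its functions are hypothetical.

```javascript
// Hypothetical library exposing a single global, MyLib.
const MyLib = (function () {
  // Private helper: visible only inside this function scope.
  function trim(s) {
    return s.replace(/^\s+|\s+$/g, '');
  }

  // Everything public hangs off one object, so nothing else leaks out.
  return {
    util: {
      capitalize: function (s) {
        const t = trim(s);
        return t.charAt(0).toUpperCase() + t.slice(1);
      }
    }
  };
})();

const greeting = MyLib.util.capitalize('  hello ');
```

The alternative of adding capitalize to String.prototype would change every string in the page, including those used by other libraries, which is the kind of namespace pollution described above.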

Javascript started in browsers, and even today that’s where it’s most often found. Browsers are the modern UI paradigm, so a javascript library must not be limited to cookie-cutter DOM manipulation APIs. That was cool in the 90s. Libraries now must provide a rich set of UI widgets. You don’t want to be using two Javascript libraries, one for Ajax and the other for UI widgets.

Documentation matters too. If the developer has to resort to grepping the source code to find essential pieces of functionality, the library becomes a time hog instead of a rapid development platform!

We looked at quite a few JS libraries:

* JQuery
* Qooxdoo
* Dojo
* Prototype.js
* mootools
* extjs

and settled on extjs as our framework of choice.

Here are our reasons for picking a commercial open source library like extjs.
The overall design of extjs is exemplary. One can learn a lot from its unified architecture, no matter which language one is programming in.

It lives within its own namespace. Prototype.js was out at this point.

The UI widget set is extremely rich. Dojo, qooxdoo and mootools, although promising, were nowhere close to extjs in their widget collections. Although jquery, with its collection of open source plugins, has a rich collection, it suffers from one major disadvantage. The plugins are from multiple vendors and there is no consistent object model to dictate their design. Extjs requires you to start with one of their base classes, ensuring a consistent model. Consistency is extremely important for the library to be reusable.

Not to mention, the extjs documentation seems to be very comprehensive and well maintained. In a library as comprehensive as extjs, one should always be prepared to look into the source code to find missing bits, but all essential pieces are very well documented.

Many people seem to object to commercial licencing of extjs – however, we believe, the licencing is quite fair and inexpensive. A single developer licence costs less than $300 and one can deploy on unlimited domains. You can develop your application for free and purchase a licence when you go live. For most businesses this shouldn’t be an issue at all.
On the other hand, if your business can’t come up with $300.00, you’ve got bigger issues and shouldn’t be worrying about javascript libraries!
