Nov 07

CSV Parsing with Erlang

Posted By:  Praveen Ray

Since OTP doesn't provide any CSV parsing capabilities, I decided to write my own based upon gen_fsm. Following the short explanation given in Perl's Text::CSV_XS module, I implemented a simple state machine. Erlang's excellent binary processing capabilities and built-in gen_fsm behavior make the code compact and surprisingly easy to implement. You can download the code from here. A short explanation follows.

The state machine has only a handful of states:

start_field

Start reading a CSV field. It might be double quoted and might contain special characters such as \r, \n, comma and double quote. Go to read_field or read_quoted_field, depending on whether a double quote started this field.

read_field

Once a field has started, we switch to this state and read bytes until an end-of-field condition is detected. End of field is marked by either a comma or a newline.

read_quoted_field

We're inside a double quoted field; read everything until another double quote is encountered. A double quote might be end of this field or an embedded double quote marked with two consecutive double quotes. Switch to escaped_double_quote if a double quote is encountered.

escaped_double_quote

We enter this state upon encountering a double quote inside the read_quoted_field state. If another double quote is seen, it's an escaped double quote; otherwise, it's nothing special. Both cases go back to read_quoted_field.
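The four states can be sketched outside Erlang as well. The following is a minimal JavaScript sketch, not the gen_fsm module from this post; the function name parseCsv is hypothetical. One assumption differs from the description above: here a quote followed by a comma or newline closes the field instead of returning to read_quoted_field.

```javascript
// Hypothetical sketch of the four-state CSV machine described above.
// parseCsv is an illustrative name, not part of the Erlang module.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let state = "start_field";

  const endField = () => { row.push(field); field = ""; };
  const endRow = () => { endField(); rows.push(row); row = []; };

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    switch (state) {
      case "start_field":
        if (ch === '"') state = "read_quoted_field";
        else if (ch === ",") endField();
        else if (ch === "\n") endRow();
        else if (ch !== "\r") { field += ch; state = "read_field"; }
        break;
      case "read_field":
        if (ch === ",") { endField(); state = "start_field"; }
        else if (ch === "\n") { endRow(); state = "start_field"; }
        else if (ch !== "\r") field += ch;
        break;
      case "read_quoted_field":
        // Commas and newlines are ordinary data inside quotes.
        if (ch === '"') state = "escaped_double_quote";
        else field += ch;
        break;
      case "escaped_double_quote":
        if (ch === '"') { field += '"'; state = "read_quoted_field"; }
        else if (ch === ",") { endField(); state = "start_field"; }
        else if (ch === "\n") { endRow(); state = "start_field"; }
        // Assumption: any other character after a closing quote is ignored.
        break;
    }
  }
  // Flush a final line that has no trailing newline.
  if (field !== "" || row.length > 0 || state !== "start_field") endRow();
  return rows;
}

const parsed = parseCsv('a,"b ""x"", c"\n1,2');
console.log(JSON.stringify(parsed));
```

Keeping the state in a plain string makes each transition easy to compare against the state descriptions.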

Usage

The following public functions are exported:

parse_csv(File_path)
parse_csv(Binary_blob)
parse_csv(File_path, Options)
parse_csv(Binary_blob, Options)

where Options is:

[{callback_fn, Fun}, {callback_state, term()}]

Fun is a fun of arity 2; it is called with a list of fields and the callback state.

With no Options passed, the return value is a list of lists of fields. With callback_fn passed, the callback is called at the end of each line with a list of fields.

Examples:


parse_csv:parse_csv("/tmp/data.csv").

Returns:

 

[[<<"Date">>,<<"Source">>,<<"Destination">>,
  <<"Seconds">>,<<"CallerID">>,<<"Disposition">>,<<"Cost">>],
 [<<"2009-09-18 09:44:54">>,<<"5097213333">>,
  <<"18667778888">>,<<"66">>,<<"5098761323">>,<<"ANSWERED">>,
  <<"0">>]]

F = fun(Fields, State) -> io:format("~p~n",[Fields]), State + 1 end.
parse_csv:parse_csv("/tmp/data.csv", [{callback_fn, F},{callback_state, 0}]).

This calls F once per line: first with the second parameter set to 0, then 1, then 2, and so on. Note that your fun must return the updated state, which becomes the second parameter to callback_fn for the next line.
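
The state threading is the usual fold pattern. Here is a small JavaScript sketch of that idea. The helper foldLines is hypothetical, and returning the final state is an assumption made for the illustration, not a documented property of parse_csv.

```javascript
// Hypothetical helper: calls callbackFn once per line and threads the
// returned state into the next call, as described for callback_fn above.
function foldLines(lines, callbackFn, callbackState) {
  let state = callbackState;
  for (const fields of lines) {
    state = callbackFn(fields, state);
  }
  return state; // assumption: the sketch hands back the final state
}

const lines = [["Date", "Source"], ["2009-09-18 09:44:54", "5097213333"]];
const seenStates = [];
const finalState = foldLines(lines, (fields, n) => {
  seenStates.push(n); // record the state each call received
  return n + 1;
}, 0);
console.log(seenStates, finalState);
```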
 
Oct 29

Sending email is easy: all you need is access to an SMTP server or a local Sendmail daemon. Installing and configuring either one, however, is a massive task. Even if you get it right, your ISP might not like you running an SMTP server. And if you get past the ISP hurdle, chances are your emails will get marked as spam for lack of reverse DNS.

There is an easier way and we all know it: just use Google’s SMTP server. All you need is a Gmail account or a Google Apps account, and who doesn’t have a Google account?

However, sometimes the third-party software you’re trying to configure to send emails doesn’t support anything other than localhost. Or it does, but wants plain SMTP whereas Google only does TLS. Or, simply, connecting to Google’s SMTP server for each email takes a few seconds and you want a faster response.

Here’s a simple solution:

  • Install the wonderful Email Relay program from here
  • Create a file:
    vi /etc/google.email.auth

    and add the following line:

    login client yourgoogleemailid@gmail.com yourgooglepassword
  • chown daemon:root /etc/google.email.auth
    chmod 400 /etc/google.email.auth
  • Run emailrelay as:
    emailrelay --as-proxy smtp.gmail.com:587 --client-tls --client-auth /etc/google.email.auth
  • Configure your software to use localhost as the SMTP server (the port defaults to 25).

Test your local emailer. If it doesn’t work, enable logging in /etc/emailrelay.conf by uncommenting the verbose line, then look in your syslog file (/var/log/syslog).

Oct 28

Recently I had to create a clone of this blog site so we could apply upgrades and test them without fear of breaking our actual blogs. Here’s the list of steps I followed. Hopefully, it’ll help others.

Assumptions:

  • Existing blog URL: http://a.com
  • New blog URL: http://b.com/wordpress
  • The existing blog is installed in /var/www
  • The new blog will be installed in /var/www/wordpress
  • The MySQL database for the wordpress content is called ‘wordpress’. The username and password are also ‘wordpress’.

ssh root@a.com
 cd /var/
 tar cf wordpress.tar www
 bzip2 wordpress.tar
 mysqldump --add-drop-table -uwordpress -pwordpress -hlocalhost wordpress > db.backup
 bzip2 db.backup
 scp wordpress.tar.bz2 db.backup.bz2 root@b.com:/var/www
ssh root@b.com
 mysql -uroot -p<mysql root pw>
 create database wordpress;
 GRANT ALL PRIVILEGES ON wordpress.* to 'wordpress'@'localhost' identified by 'wordpress';
 exit;
 cd /var/www
 bunzip2 db.backup.bz2
 mysql -uwordpress -p'wordpress' -Dwordpress -hlocalhost < db.backup
 mkdir wordpress
 cd wordpress
 mv ../wordpress.tar.bz2 .
 tar jxf wordpress.tar.bz2
 mv www/* .
 rm -rf wordpress.tar.bz2 www
 mysql -uwordpress -p'wordpress' -Dwordpress -hlocalhost
    update wp_posts set guid=REPLACE(guid,'http://a.com','http://b.com/wordpress');
    update wp_options set option_value='http://b.com/wordpress/' where option_name='siteurl';
    exit;

You should now be able to log in as admin at http://b.com/wordpress/wp-admin

Go to Settings/General and change all instances of http://a.com to http://b.com/wordpress

You might also have to go to your currently selected theme and make sure its URLs are changed. Some themes have URLs embedded in them.

 

Oct 27

Bing and Chinese food

Posted By:  Lori Barkyoumb

Bing. Since Bing arrived on the scene, the engine has prompted many opinions about its future.

Will it continue to give great search results? Is it because webmasters haven’t learned how to make websites ‘Bing friendly’ yet? True Microsoft haters (I’m occasionally one of them…) have been looking for reasons to hate Bing because it’s Microsoft. I guess only time will tell whether Bing is good because it’s authentic and websites haven’t been able to skew the results via knowledge of how the engine works, or whether it’s truly genius.

Oh yeah, you are probably wondering why I brought Chinese food into the mix…

Following my research on Bing, I was hungry and got some Chinese food. The back of the fortune in my cookie teaches you a word in Chinese. The word was ‘BING’. The definition was disease. Very coincidental…

Let’s see if Bing makes it or dies off.

Oct 26

I want to know who is clicking on the Google ads that are placed in the middle of paragraphs, before the answer you need to find, and anywhere else they can be jammed on the web page. Okay, maybe I’m a touch intolerant of AdSense. But I have a good reason. I too fell prey to the lure of Google’s billions, thought I could make money and added Google AdSense to the Yellowfish website.

Hard as I tried, I could not get the AdSense placements to stop competing with the content. Now here’s where my tolerance started to go south. We are Yellowfish Technologies, an Information Technology firm. AdSense decided to pick the word ‘yellowfish’ and proceeded to list out crap ads about water, feeding, fish types, yellow belly fish flounder and you name it. It was ridiculous to see this type of information being offered on our technology home front. The competitors it also advertised were another story. I removed AdSense from the Yellowfish site. I guess now I’m not such a big fan, but the bad news is, it continues to haunt me.

I am a victim. I really am. Bad Google advertising makes me lose track of what I was looking for, usually the rest of something interesting… When I see these odd ad blocks hogging valuable real estate on the web page where the content is supposed to be, I can’t believe they are in such wide use. The ads are just plain distracting and sidetrack you. Honestly, I do not know anyone who would stop to click on them. The yellow belly fish flounder incident comes back loud and clear.

I am now convinced most people who use, and those who over-use, AdSense do not pay attention to what actually shows up on their site under these ads. I was having a victim moment the other day and noticed some very strange link options. I think it was out of irritation, and to prove that everything displayed in those AdSense links is irrelevant, but I clicked on a couple as an experiment. All 5 of the links I chose brought me to a page without graphics that was just a series of more random links, too many choices about the same thing, with an occasional link that made a small amount of sense. It’s just as I thought… useless. Oh, and by the way, I never went back to that site; I found another that delivered information, not ads. I bookmarked it.

I know people are just trying to make an extra buck off Google, hoping someone clicks on one of the ads and earns them extra money, but geez, good sites are falling prey and putting all the Google ads above the fold and the important stuff below. I can barely read what I came there for. To all you over-users of Google AdSense: please evaluate whether it’s worth swapping out business content that should be easy to find and read for ads on things that are generally irrelevant and out of your control. Do you really want visitors leaving your site to go somewhere else anyway?

 

 

Oct 23

Ok, I am all for video on a website. It’s interesting, visually stimulating and it’s fun to watch. EXCEPT when you arrive at a website and the video blasts off and startles you half to death. How dare they scare me like that? A few choice words later, while I try to get my heart rate under control from the sudden fright… I can’t help but find myself irritated at the disturbance, as well as wondering why my speakers were set so loud. I think the use of video on a website and on landing pages is great when done appropriately. But you must allow the user to control when it starts, when it ends and how loud it will be. Set the default volume to a non-invasive level. Let the user adjust it to their liking. Wondering how to get the user to notice the video and play it if it doesn’t blast off? Make it interesting! Draw attention to the display, make the user want to push ‘play’. Use a giant ‘Play me’ button if you need to, but just don’t set the video on ‘auto-play’ with a volume level loud enough to wake moon men.

Oct 14

RESTful programming is all the rage these days, at least in the Rails world. It's a simple way to perform request dispatching which is easy to understand and maps database CRUD operations onto web requests elegantly. While Rails comes with REST routing built in, there is no such support in Spring MVC. However, Spring's decoupled design makes it rather easy to implement. Here's the code I wrote to implement RESTful dispatches using Spring MVC. (Also available as a download here.)

package com.yellowfish.servlets;

import javax.servlet.http.HttpServletRequest;

import org.apache.log4j.Logger;
import org.springframework.web.servlet.mvc.multiaction.MethodNameResolver;
import org.springframework.web.servlet.mvc.multiaction.NoSuchRequestHandlingMethodException;

public class RESTMethodResolver implements MethodNameResolver {

    private static final Logger log = Logger.getLogger("com.yellowfish.servlets");

    public String getHandlerMethodName(HttpServletRequest req) throws NoSuchRequestHandlingMethodException {
        log.debug("Resolving method name for the Request");

        String method = req.getParameter("_method");
        String id     = req.getParameter("id");
        String http_method = req.getMethod().toUpperCase();
        String controller_method = null;

        log.debug("method: " + method + " http_method: " + http_method + " id: " + id);

        if (http_method.equals("GET")) {
            // GET .../new maps to _new; otherwise the id decides between index and show
            String uri = req.getRequestURI();
            if (uri.endsWith("/new")) {
                controller_method = "_new";
            } else {
                controller_method = (id == null) ? "index" : "show";
            }
        } else if (http_method.equals("POST")) {
            if (method != null) {
                // PUT and DELETE are tunnelled through POST via the _method parameter
                method = method.toUpperCase();
                if (id == null) {
                    throw new com.yellowfish.servlets.ServletException("id cannot be NULL for PUT and DELETE");
                }
                if (method.equals("PUT")) {
                    controller_method = "update";
                } else if (method.equals("DELETE")) {
                    controller_method = "delete";
                }
            } else {
                controller_method = "create";
            }
        }

        if (controller_method == null) {
            log.debug("No Controller Method Found");
            throw new NoSuchRequestHandlingMethodException(null, this.getClass());
        }

        log.debug("Controller method resolved to :" + controller_method);
        req.setAttribute("controller_method_name", controller_method);

        if (id != null) {
            // Expose a numeric id to the controller; non-numeric ids are only logged
            try {
                req.setAttribute("model-id", Long.valueOf(id));
            } catch (NumberFormatException exp) {
                log.warn("id " + id + " is not a Number");
            }
        }

        return controller_method;
    }
}

Then simply declare this bean in the spring config:

<bean class="com.yellowfish.servlets.RESTMethodResolver" id="rest-method-resolver"></bean>

It resolves the incoming URL and HTTP method according to the following table:

REST Resolver Rules

URI             HTTP method  id parameter present?  _method parameter  Method name
/server         GET          No                     N/A                index
/server/new     GET          No                     N/A                _new
/server?id=100  GET          Yes                    N/A                show
/server         POST         No                     No                 create
/server         POST         Yes                    PUT                update
/server         POST         Yes                    DELETE             delete
To keep life simple, the code above assumes all IDs are numeric; if an ID is present in the incoming request, it creates a request attribute called 'model-id'.
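
The resolver rules are small enough to check as a pure function. Below is a hypothetical JavaScript port, offered only as a sketch. The name resolveMethodName and the params object (standing in for request parameters) are assumptions, and it follows the Java code in treating a GET on a URI ending in /new as the _new method.

```javascript
// Hypothetical port of the resolver rules; `params` stands in for the
// request parameters (id and _method) read by the Java resolver.
function resolveMethodName(httpMethod, uri, params) {
  const id = params.id;
  const override = params._method;
  const verb = httpMethod.toUpperCase();

  if (verb === "GET") {
    if (uri.endsWith("/new")) return "_new";
    return id === undefined ? "index" : "show";
  }
  if (verb === "POST") {
    // A plain POST creates; PUT and DELETE arrive via the _method parameter.
    if (override === undefined) return "create";
    if (id === undefined) throw new Error("id cannot be null for PUT and DELETE");
    const m = override.toUpperCase();
    if (m === "PUT") return "update";
    if (m === "DELETE") return "delete";
  }
  throw new Error("no handler method for " + verb + " " + uri);
}

console.log(resolveMethodName("GET", "/server", {}));
console.log(resolveMethodName("POST", "/server", { id: "100", _method: "put" }));
```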

Oct 12

Chicanery on Wall St

Posted By:  Praveen Ray

I’m reading the book ‘Traders, Guns & Money’ by Satyajit Das. It’s an awesome account of greed and the ‘anything goes’ attitude on Wall St (and in financial markets around the globe). From ultra-complex to borderline illegal instruments concocted by dealers and investment bankers around the world to make quick killings, everything that unraveled three years after the book was written is detailed by the author. Interestingly, he wrote about all this years before the meltdown. But so did Warren Buffett, who called these derivatives financial WMDs!

Some of my favorite quotes from the book:

The most intriguing thing about convertible bonds is that everybody seems to be getting a great deal. It’s a win-win. Markets don’t work that way. Question is, who is fooling whom?

Robert Citron, treasurer of Orange County before it went bust, confident that interest rates wouldn’t rise: ‘I am one of the largest investors in America, I know these things’

The same guy, testifying during investigations after Orange County lost 1.5B: ‘My brain was unable to audit’!

Traders work on the theory that what is mine is mine, and what is yours is also mine.

The irony of trying to model chaos, the finding of order in complete disorder, is lost on most quants.

Sep 30

Duplication is the mother of all evil in programming. Remaining DRY at all costs should be your goal at all times. Let a pattern repeat itself even once and your code complexity goes up exponentially.
One great way to DRY up your code is via Traits. Traits are the functional equivalent of base classes: you collect your most often used functions in a Trait and inject it into your objects. Here’s one simple way to do it in JavaScript (using extjs here):

var MyTrait = {
   xhr: function(config) { return Ext.Ajax.request(config); },
   submit: function(basic_form, config) {
       // perform some checks on config
       // and call basic_form.submit
   }
   //...add more utility functions
};
function MyGrid(config) { MyGrid.superclass.constructor.call(this, config); }
Ext.extend(MyGrid, Ext.grid.GridPanel, MyTrait);
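
The same idea works without extjs. Here is a minimal sketch in plain JavaScript using Object.assign; the trait Describable and the constructors Grid and Form are hypothetical names made up for the illustration.

```javascript
// Hypothetical trait: plain functions that rely only on `this`.
const Describable = {
  describe() { return this.kind + ":" + this.name; },
  rename(name) { this.name = name; return this; }
};

function Grid(name) { this.kind = "grid"; this.name = name; }
function Form(name) { this.kind = "form"; this.name = name; }

// Inject the trait into both prototypes; neither needs a shared base class.
Object.assign(Grid.prototype, Describable);
Object.assign(Form.prototype, Describable);

const usersGrid = new Grid("users");
const signupForm = new Form("login").rename("signup");
console.log(usersGrid.describe(), signupForm.describe());
```

Because the functions are copied onto each prototype, two unrelated components share the helpers without sharing an ancestor.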
Sep 28

Although the extjs grid is an awesome component, IMO it suffers from one major drawback: its inability to place active components within data cells. At least, it’s not straightforward. Here’s a simple solution for placing clickable links within cells. The trick is to place span elements which look like links, then intercept rowclick and determine which of the links was clicked on.

Here’s an example of a custom Grid Component:

function MyComponent() {
   MyComponent.superclass.constructor.call(this);
}
Ext.extend(MyComponent, Ext.grid.GridPanel);
MyComponent.prototype.initComponent = function() {
  var action_tmpl = new Ext.XTemplate(
     "<span class='link' id='edit-{id}'>Edit</span>",
     "<span class='link' id='delete-{id}'>Delete</span>"
   );
   action_tmpl.compile();
   var content = {
     // grid related config
     columns: [
        {header: 'First Name', dataIndex:'fname'},
        {header: 'Action', dataIndex:'id', 
          renderer: function(id) { 
              // the renderer receives the cell value, i.e. the record's id field
              return action_tmpl.applyTemplate({id: id});
          }}
     ]
   }
   MyComponent.superclass.initComponent.call(Ext.apply(this,content));
}
MyComponent.prototype.afterRender = function() {
  MyComponent.superclass.afterRender.call(this);
  this.on('rowclick', this.row_click, this);
}
 
MyComponent.prototype.row_click=function(grid, ri, evt) {
   var rec = grid.getStore().getAt(ri);
   // Use the same id field that the Action column renders into the span ids
   var id = rec.get('id');
   // See Below for within_el method
   if(evt.within_el('edit-'+id)) {
       alert('Edit clicked');
   } else if(evt.within_el('delete-'+id)) {
        alert('Delete clicked');
   }
}

The within_el method on Ext.EventObject object looks like this:

Ext.apply(Ext.EventObject, {
    within_el:function(el) {
        el = Ext.get(el);
        if(!el)
            return false;
        var evt_xy = this.getXY();
        var evt_x = evt_xy[0];
        var evt_y = evt_xy[1];
        return (evt_x > el.getLeft() && evt_x < el.getRight() && evt_y > el.getTop() && evt_y < el.getBottom());
    }
});
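
The coordinate check can be tried on its own, without a browser. This is a sketch under stated assumptions: withinBox is a hypothetical helper, and the box is a plain object whose left, right, top and bottom values stand in for what getLeft(), getRight(), getTop() and getBottom() return.

```javascript
// Hypothetical helper: `point` is [x, y] and `box` holds the element's
// edges in the same page coordinates.
function withinBox(point, box) {
  const [x, y] = point;
  // Strict comparisons, as in within_el: a click exactly on an edge misses.
  return x > box.left && x < box.right && y > box.top && y < box.bottom;
}

const editLink = { left: 100, right: 140, top: 20, bottom: 36 };
console.log(withinBox([120, 30], editLink));
console.log(withinBox([100, 30], editLink));
```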