JavaScript RegExp weirdness

June 15th, 2010 by Ville Orkas

While doing some localization optimization for JavaScript templates that load their localized strings with AJAX calls, I ran into some strange behaviour in the JavaScript RegExp object. I had to parse a string that contains calls to the dicole.msg function. The most straightforward way to do this was of course with a regexp match. This is where the strangeness occurred. There seems to be no obvious way to get all matches and their subgroups with JavaScript regular expressions. This is the test I used to confirm this:

http://rikshot.ath.cx/jsregexptest.html

Now if you analyze that for a second, it doesn’t really make sense.

  • string.match(pattern) returns the matched string and the subgroups as expected
  • string.match(global_pattern) returns all the matched strings but no subgroups, this is also okay (is it, really?)
  • pattern.exec(string) returns the matched string and the subgroups as expected
  • global_pattern.exec(string) returns the first matched string and the subgroups, this is where the problem lies
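
The four cases above can be reproduced with a small script. This is only a sketch: the input string and the pattern are made up for illustration and are not the ones from the original test page.

```javascript
// Hypothetical input and pattern; the real template format may differ.
const source = "dicole.msg('hello') and dicole.msg('world')";
const pattern = /dicole\.msg\('([^']*)'\)/;
const globalPattern = /dicole\.msg\('([^']*)'\)/g;

// Non-global match: full match at index 0, subgroup at index 1.
const single = source.match(pattern);

// Global match: every full match, but no subgroups.
const allMatches = source.match(globalPattern);

// Non-global exec: same shape as the non-global match.
const execSingle = pattern.exec(source);

// Global exec: only the first match and its subgroup are returned,
// and the pattern's lastIndex property moves past that match.
const execGlobal = globalPattern.exec(source);

console.log(single[0], single[1]);
console.log(allMatches);
console.log(execSingle[0], execSingle[1]);
console.log(execGlobal[0], execGlobal[1], globalPattern.lastIndex);
```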

For me at least, it would make more sense if global_pattern.exec(string) returned all the matched strings and their subgroups in one call. The workaround I ended up with is to first use string.match(global_pattern) to get all the matches and then run through that array using either string.match(pattern) or pattern.exec(string) to capture the subgroups. Why doesn’t global_pattern.exec(string) return an array of arrays that contain all matched strings and their subgroups, as would be expected?
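
One alternative to the two-step approach described above is to call exec repeatedly on the same global pattern: each call resumes from the pattern's lastIndex property and returns null once no matches remain. Here is a minimal sketch, assuming a hypothetical helper name collectMatches and a made-up dicole.msg call format:

```javascript
// Collect every match of a global pattern together with its subgroups.
// collectMatches is a hypothetical helper name, not a built-in.
function collectMatches(globalPattern, text) {
  if (!globalPattern.global) {
    throw new Error("collectMatches expects a pattern with the g flag");
  }
  const results = [];
  globalPattern.lastIndex = 0; // start from the beginning of the text
  let match;
  while ((match = globalPattern.exec(text)) !== null) {
    // Guard against an infinite loop on zero-length matches.
    if (match[0] === "") {
      globalPattern.lastIndex += 1;
    }
    results.push(Array.from(match)); // [fullMatch, group1, group2, ...]
  }
  return results;
}

const calls = collectMatches(
  /dicole\.msg\('([^']*)'\)/g,
  "dicole.msg('hello') and dicole.msg('world')"
);
console.log(calls);
```

Because the loop depends on the mutable lastIndex state of the pattern, the helper resets it before starting. Newer JavaScript engines also provide String.prototype.matchAll, which returns the same information as an iterator.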

Edit: Tested this on the newest Firefox, Opera, Chrome and Safari. All displayed the same results. Now perhaps the Microsoft boys have noticed this issue, because from IE6 to IE8, the global_pattern.exec(string) returns null. Funny.

Inets httpd:parse_query performance problems

October 20th, 2009 by Ville Orkas

Recently I’ve done some work with the Inets HTTP server and ran into problems with a rather useful function in the httpd module called parse_query. Its job is pretty simple: convert the %XX hexadecimal escapes in the HTTP header data into characters and also convert the plus characters into spaces. I had to run this function on a bunch of JSON data, and as the size of the data grew, so did the total execution time of this function. After some head-scratching and profiling, I confirmed that the culprit was the parse_query function, so I had a look at the source.

It seems that parse_query uses the internal regex module inets_regexp to do splitting and substitution, and that is what seemed to slow the function down considerably. When run with different input data sizes, the execution time grew exponentially! This is unacceptable when handling many megabytes of data at a time. The quick and simple solution was just to replace the parse_query function with one of my own, and as I really didn’t need the plus sign conversion, the following code did the trick.

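%% Decode %XX escapes in a (possibly nested) list of characters.
%% Plus signs are left as they are, not converted to spaces.
url_decode([$%, Hi, Lo | T]) ->
 %% The two hex digits after the percent sign become one character.
 [erlang:list_to_integer([Hi, Lo], 16) | url_decode(T)];
url_decode([H|T]) when is_integer(H) ->
 %% Any other character passes through unchanged.
 [H | url_decode(T)];
url_decode([H|T]) when is_list(H) ->
 %% Nested lists are decoded recursively.
 [url_decode(H) | url_decode(T)];
url_decode([]) ->
 [].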

So a word of warning to people using the parse_query function: the execution time grows exponentially with the input size.

io:format("~p~n", ["Hello, World!"]).

July 22nd, 2009 by Ville Orkas

Yes, excuse the not very clever title, it had to be done. Anyway, welcome to my blog. This is my first try at blogging and my first try at making an introduction to a blog, so it might not be so coherent here and there but I’ll try to manage. My name is Ville Orkas and I’m currently working as a lead developer at Dicole Oy in Helsinki.

The title is a clue to what this blog will mostly be about: Erlang. I’ve recently been introduced to this wonderful language from Ericsson and I’ve had the opportunity to use it and its rather unique and extremely useful features in a couple of projects. I will write about Erlang and my experiences with it in further posts, but I would like to summarize it briefly here: Think about distributed servers and services, think about redundancy, think about robustness and fault tolerance, think about load balancing, think about reliability. Now think about doing all these in a “conventional” programming language. After your spine has stopped tingling and the shivers have calmed down, you shouldn’t feel ashamed, as this is the usual reaction of programmers when presented with these problems. Erlang is different: it turns the situation upside down. It does all these things as standard, and it does them well. You do not have to worry about, for example, writing the boilerplate code to distribute a messaging system across nodes in a network, not to mention the network protocol for this. It’s all in Erlang. I should also mention that Erlang has built-in support for concurrency, and almost anything you do in Erlang revolves around lightweight processes. It’s just another thing you don’t have to worry about in Erlang, whereas you would have to worry about it a lot in “conventional” languages (threads, locks, mutexes, synchronization, headaches, etc.).

But as in all things, there is a flipside. Erlang is not the catch-all, holy grail of programming languages, nor should it be. The philosophy of ‘Right tool for the right job’ applies here very well. Erlang has its problems. There isn’t a massive community around it yet, as it was open-sourced to the public only in 1998, though the number of people interested in the language is growing all the time. The documentation, while generally exceptionally good, varies in quality across different applications. The language itself has some questionable design decisions that have aroused discussion, such as records (a primitive Erlang named data structure) and the implementation of strings (as lists of integers), which I don’t think is a problem at all, but some people are turned off by it.

Anyway, more on Erlang in later posts. This blog is also about programming in general, my thoughts about different controversial issues in programming languages, object-oriented programming, C++, web programming, web site design and implementation, Windows programming and just about anything I think of as interesting in the field of computing.

Thanks and I apologize in advance for all the boring stuff I’m going to write here. Over and out.