Monthly Archives: August 2011

Collection type result – speed comparison of IList.Add and IEnumerable.Concat

From time to time I work on a method that returns a collection, mainly by processing some input data. Often it’s really just a couple of conditions: get something from here and there and return it. Because I compose these methods too, if I return IEnumerable<T> and later, in another method, need to add something (if you’re lost, you’ll see what I mean in the example below), I need to use some variable, like an array or list, and append (or prepend) to it. Boring.

For a while I was wondering how slow it would be if I simply created new IEnumerable sequences and concatenated them. I expected it to be slower, but is it only a couple of percent or an order of magnitude? Today, when I came to my office, I simply decided to test it.

The first version looks like

static IEnumerable<int> Test1(int[] part1, int[] part2, int[] part3)
{
	IEnumerable<int> result = Enumerable.Empty<int>();
	result = result.Concat(part1);
	result = result.Concat(part2);
	result = result.Concat(part3);
	return result;
}

And the other one

static IEnumerable<int> Test2(int[] part1, int[] part2, int[] part3)
{
	IList<int> result = new List<int>();
	foreach (var item in part1)
		result.Add(item);
	foreach (var item in part2)
		result.Add(item);
	foreach (var item in part3)
		result.Add(item);
	return result;
}

Although it, especially the second one, can be written in different ways, I think it’s OK as a measure. And it’s close to how I often process the data.

I did a couple of runs to eliminate some errors, with a “Release” build and without an attached debugger. The part1 and part3 parameters were always the same size. I played with the size of part2, because the size affects the speed too.
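The measuring code itself isn’t shown here, so the following is only a minimal sketch of how such a comparison could be set up. The Measure helper, the fixed size of 100 for part1 and part3, and the choice to enumerate each result inside the timed region are all assumptions made for illustration. Enumerating matters because Concat is lazy and does no real work until the sequence is iterated.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class ConcatVsListBenchmark
{
    // Lazily chains the three arrays; nothing is copied until enumeration.
    public static IEnumerable<int> Test1(int[] part1, int[] part2, int[] part3)
    {
        return Enumerable.Empty<int>().Concat(part1).Concat(part2).Concat(part3);
    }

    // Eagerly copies every item into a list.
    public static IEnumerable<int> Test2(int[] part1, int[] part2, int[] part3)
    {
        IList<int> result = new List<int>();
        foreach (var item in part1) result.Add(item);
        foreach (var item in part2) result.Add(item);
        foreach (var item in part3) result.Add(item);
        return result;
    }

    // Hypothetical helper: average time in milliseconds over the given number of runs.
    public static double Measure(Func<IEnumerable<int>> test, int runs)
    {
        var total = 0.0;
        for (var i = 0; i < runs; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            long sum = 0;
            // Enumerate the result so the lazy Concat chain actually runs.
            foreach (var item in test()) sum += item;
            stopwatch.Stop();
            total += stopwatch.Elapsed.TotalMilliseconds;
        }
        return total / runs;
    }

    public static void Main()
    {
        // Assumed sizes; only part2 varies between measurements.
        var part1 = new int[100];
        var part3 = new int[100];
        foreach (var size in new[] { 1000, 100000, 1000000 })
        {
            var part2 = new int[size];
            var time1 = Measure(() => Test1(part1, part2, part3), 20);
            var time2 = Measure(() => Test2(part1, part2, part3), 20);
            Console.WriteLine("Size: {0}\tTime1: {1:F2}\tTime2: {2:F2}", size, time1, time2);
        }
    }
}
```

The absolute numbers from a sketch like this depend heavily on the machine and runtime, so only the relation between the two times is worth comparing.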

If the part2 size was roughly under 10k items, the speed difference was within the measurement error. From 10k+ to 1M items the Concat approach is clearly slower: in the numbers below, Time2 is only about 25% of Time1 (the % column). Some absolute numbers (averages from 20 runs, “Release” build, without attached debugger) from my laptop:

Size: 1         Time1: 0        Time2: 0        %: 0
Size: 2         Time1: 0        Time2: 0        %: 0
Size: 3         Time1: 0        Time2: 0        %: 0
Size: 4         Time1: 0        Time2: 0        %: 0
Size: 5         Time1: 0        Time2: 0        %: 0
Size: 6         Time1: 0        Time2: 0        %: 0
Size: 10        Time1: 0        Time2: 0        %: 0
Size: 100       Time1: 0        Time2: 0        %: 0
Size: 1000      Time1: 0        Time2: 0        %: 0
Size: 6000      Time1: 0,2      Time2: 0        %: 0
Size: 20000     Time1: 2,05     Time2: 0,5      %: 24,390243902439
Size: 60000     Time1: 6,55     Time2: 1,6      %: 23,3576642335766
Size: 100000    Time1: 11,8     Time2: 3,15     %: 24,7899159663866
Size: 1000000   Time1: 124,85   Time2: 33,2     %: 26,9609775325187

Conclusion? If the data is relatively small, the path you choose doesn’t really matter. For “bigger” collections the imperative approach provides better performance.

Stored procedures vs. indices and Entity Framework

Sometimes I get into a discussion about Entity Framework not being able to use (map) a particular stored procedure somebody wrote to do something very quickly and/or efficiently (kind of ;) ). You know, one that boils water for coffee, prints an invoice and sends flowers to the cafeteria girl down the hall.

This is not always a good optimization. Don’t get me wrong, I like stored procedures, if used properly. But sometimes the solution is easier. More and more, people are forgetting about indices, something databases are very good at using. And not only using, but also maintaining, defining and so on. A proper index for a heavily used query can make it an order of magnitude faster, especially for huge tables (when it’s on the proper fields).

The conclusion? Don’t immediately jump from sets and plain query definitions into imperative programming in stored procedures. Set operations are still very fast, and database optimizers can do magic when it’s just a query definition and indices are in place. And it’s way easier to live with an index than to maintain a stored procedure.

All and Any optimization in Entity Framework queries

When I’m teaching my Entity Framework trainings, I always beg people to look at the generated SQL statement, at least from time to time or when the query looks complex. And if you have (close to) real data, at the execution plan too. Although Entity Framework helps you with a standard data access layer, it’s not magic – query translation is a complex process and sometimes what you capture in a LINQ query isn’t exactly how you’d express it in SQL. You simply have different concepts in LINQ vs. in SQL.

Last week I was writing some decision algorithms based on data and I was accessing it, of course, using Entity Framework. Because the conditions were complex, I was writing them as they came from my head to my fingers. The day after I was writing a similar condition, with only one or two options negated, and I wrote it differently. Basically I was swapping the All and Any methods. These two are interchangeable if you change the conditions accordingly.

As an example, let’s take the condition “All apples are green.” aka “All(apple => apple.Color == Green)“. But you can also say “No apple is non-green.” aka “!Any(apple => apple.Color != Green)“.
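The equivalence of the two forms can be checked with a small in-memory sketch. It uses LINQ to Objects only, so it says nothing about the SQL that Entity Framework would generate for either form. The Apple class, the AppleColor enum and the AllAnyDemo helpers are hypothetical names made up for this illustration.

```csharp
using System;
using System.Linq;

public enum AppleColor { Green, Red }

public class Apple
{
    public AppleColor Color { get; set; }
}

public static class AllAnyDemo
{
    // "All apples are green."
    public static bool AllGreen(Apple[] apples)
    {
        return apples.All(apple => apple.Color == AppleColor.Green);
    }

    // "No apple is non-green."
    public static bool NoneNonGreen(Apple[] apples)
    {
        return !apples.Any(apple => apple.Color != AppleColor.Green);
    }

    public static void Main()
    {
        var green = new[] { new Apple { Color = AppleColor.Green }, new Apple { Color = AppleColor.Green } };
        var mixed = new[] { new Apple { Color = AppleColor.Green }, new Apple { Color = AppleColor.Red } };
        var empty = new Apple[0];

        // Both forms agree for every input, including the empty sequence.
        foreach (var apples in new[] { green, mixed, empty })
            Console.WriteLine("{0} {1}", AllGreen(apples), NoneNonGreen(apples));
    }
}
```

In memory both forms give the same answer, also for an empty sequence, where All returns true and Any returns false.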

Now the magic comes into play. You might think, well, if they’re interchangeable, then it’s fine, as Entity Framework can always utilize the EXISTS predicate from SQL. For simple queries maybe. But if you think about the various places where the condition can occur and how easy it is to negate the condition, you immediately have a lot of problems in front of you. Add to this the database engine’s optimizer, which may or may not be able to use indices properly, reorder conditions to create smaller intermediate result sets etc. There are a lot of places where the machine needs to (try to) figure out the best way of getting your data for you.

Sadly there’s no rule of thumb, like always use Any. The only good and 100% working advice is to always check the query and the execution plan. Even with e.g. All the result could be absolutely fine.

Kindle’s “furthest page read” solution when reading book multiple times (or by more people)

I wrote a follow-up post.

Kindle has a nice feature that keeps your furthest page read synchronized across all devices. Sadly it has one or two problems. First, it’s really the furthest page read, hence if you later start reading the book again from the start, it still keeps the furthest location, which is basically the end of the book. And similarly if you have more Kindles under one account and more people are reading the same book. But this one is kind of expected.

Though you can reset the furthest page read easily through Amazon support (haven’t tried) or by juggling with turning synchronization off and on and redownloading the book from the archive (haven’t tried either), I found an easy solution. Before e-books we were using bookmarks, I mean real bookmarks. A piece of paper (or some fancy material like leather) inserted between the pages where you stopped reading. Voilà. We have the same concept in Kindle. It’s a little bit more powerful, but the basics are the same.

So my solution works like this. When I stop reading, I put a bookmark there. When I later begin to read again, on a different device, I simply go to the last bookmark (if you have some other bookmarks further in the book, you’ll need to recall from the excerpt which one is correct, but I believe you, like me, often end/start on “milestones” like (sub)chapters or at least paragraphs where some idea ended). Bookmarks are synchronized across devices and you can have more than one in a book, which is good when you’re reading the book with somebody else. Of course, from time to time I remove previous bookmarks, to keep just the last one and have it clean.

Most of the time I read a book only once and then just look for specific passages, but sometimes it’s simply worth reading it again. Not knowing where I ended bothered me, and asking support to reset it isn’t, in my opinion, a good experience. But I think bookmarks solve it pretty well.

My first real BlackBerry application – SA SMS Booking

It has been a few days since my first BlackBerry application was approved in AppWorld. I’m trying to learn how things are done in this world, and a real-world application is in my opinion the best way. In fact I had the same application previously on my Windows Mobile devices, though only I and, I believe, one friend used it. Anyway, because I knew what I wanted, it was a good starting point and good motivation, because I am and will be using it too.

So what is this application about? It’s pretty simple. There’s a local company called Student Agency that runs nice bus lines between the biggest cities and also allows you to book a seat, change a reservation, check availability etc. via SMS. Great if you need to change your plans during the day without access to the internet. The only problem is that you need to send these text messages in a specific and exact format. Learning these is boring and typing them even more so. This is where the SA SMS Booking application comes in handy. You simply select from options on screen and the message is created for you (you see it, so you can learn all the commands if you want) and you can immediately send it from the application as well (no need to copy and paste). And that’s it. Nothing more, nothing less. Actually, one more thing. The application remembers your ticket card number (kind of your internal ID in the booking system), because that’s almost always the same.

The SA SMS Booking is free.

Here’s the screen (only one) of the SA SMS Booking 1.0:

It’s basic. It focuses only on doing the thing you want to do as quickly as possible (I’m sometimes trying to book a seat while trying to catch the bus itself :) ).

The design is, yes, none. I’m not a designer, hence for 1.0 I used the default look of elements. Anyway, if you’re interested in creating a design for it, feel free to drop me a line. The application is free, so the only paycheck will be your name on screen or something like that.

Size of application in old days and now…

Maybe you still remember the days when we were trying to make an application smaller to fit it on a floppy. UPX, ARJ and all this stuff. Then the internet came and the limit was no longer the application size itself, but the download time of the installer. Yes, in those days every application had an installer, even the simplest one. And do you remember the magic numbers around how quickly it would download on a 56 kbit/s modem? Nostalgia.

And I think history is repeating itself now. Obviously nobody cares how big the application is if it’s distributed on CD or DVD; it’s big enough. Same for download. In fact I think only a few of you are using CDs/DVDs now. It’s easier to download the image from the internet (or the application itself). But more and more applications run in the browser, using JavaScript (and HTML, CSS) for doing something useful. And though we have reasonably fast lines now, it still matters how quickly the code files are downloaded, because it might mean significant waiting when e.g. there are a lot of JavaScript files or the files are huge. Every good site minifies its JS code and also compresses it, like in the old days.

Wondering what the next step will be…