Posts Tagged ‘Databases in general’

24
Jan

InfoQ interview with me

I was recently interviewed for InfoQ article – Q&A with Jiri Cincura of the Firebird Database Project. We touched Firebird, ADO.NET, O/RMs, Entity Framework etc. If you have any questions, feel free to ask.

19
Dec

ADO.NET provider for Firebird 2.7 released

I’m happy to bring you early Christmas gift packed as ADO.NET provider for Firebird version 2.7. This version brings important bug fixes (tracker.firebirdsql.org) and logging improvements.

This release wouldn’t be possible without support of people/companies using provider actively. Big thanks to them.

You can download it at www.firebirdsql.org or use NuGet package.

Enjoy!

9
Dec

Sorting using blob column on Firebird

Imagine you have a blob column and you want to add sorting clause to your query based on that column. Crazy? Might be. On the other hand, why not?

Firebird allows you to use blob column for sorting. No problem. But the behavior might surprise you. I’m not going to deeply describe how the blobs are stored in Firebird database. Simply speaking, it’s stored in separate data pages and inside row only blob id is stored. If you use blob column for sorting, Firebird isn’t fetching the complete blob (though looks straightforward, it would be very slow), but rather uses blob id for sorting. You probably see the problem already – the blob id has nothing to do with content. Hence the sorting will be very likely broken.

But there’s a solution. I’m assuming that you want to mainly sort on text blobs (though you can use it on binary blobs too). Simply cast the blob to i.e. varchar(20) (choose length that fits your needs) and sort using this. Yes, it’s going to be slow, but if you need to do it often, you can precompute this column (using trigger etc.).

26
Aug

Stored procedures vs. indices and Entity Framework

Sometimes I came to discussion about Entity Framework not being able to use (map) particular stored procedure somebody wrote to do something very quickly and/or efficiently (kind of ;) ). You know, it’s boiling water for coffee, printing invoice and sending flowers to cafeteria girl down in a hall.

Not always this is a good optimization. Don’t get me wrong, I like stored procedures, if used properly. But sometimes the solution is easier. More and more are people forgetting about indices. Something databases are very good at using. And not only using, also maintaining and defining and so on. Proper index in heavily used query can make it order of magnitude faster. Especially for huge tables (when on proper fields).

The conclusion? Don’t immediately try to jump from sets and plain query definitions into imperative programming in stored procedures. Set operations are still very fast, database optimizers can do magic when it’s just query definition and indices are in place. And it’s way easier to live with index than to maintain stored procedure.

24
Aug

All and Any optimization in Entity Framework queries

When I’m teaching my Entity Framework trainings, I’m always begging to look, at least from time to time or when you see the query looks complex, to generated SQL statement. And if you have (near to) real data, also execution plan. Although Entity Framework helps you with standard data access layer, it’s not magic – the query translation is complex process and sometimes what you capture in LINQ query isn’t exactly how you’d express it in SQL. You simply have different concepts in LINQ vs. in SQL.

Last week I was writing some decision algorithms based on data and I was accessing it, of course, using Entity Framework. Because the conditions we’re complex I was writing these as it came from my head to my fingers. The day after I was writing similar condition, only one or two options negated and I wrote it differently. Basically I was swapping All and Any methods. These two are interchangeable, if you change conditions accordingly.

As an example let’s have and condition: “All apples are green.” aka “All(apple => apple.Color == Green)“. But you can also say “No (any) apple is non-green.” aka “!Any(apple => apple.Color != Green)“.

Now the magic comes to play. You might think, well, if it’s interchangeable, then it’s good, as Entity Framework can always utilize EXISTS predicate from SQL. For simple queries maybe. But if you think about various places where the condition can occur and how easy is to negate the condition you immediately have a lot of problems in front of you. Add to this database engine optimized, where it can or can’t use properly indices, reorder conditions to create smaller intermediate result sets etc. A lot of places where the machine needs to (try to) figure out what’s best way of getting your data for you.

Sadly there’s no rule of thumb, like always use Any. Only one good and 100% working advice is to always check the query and execution plan. But even with i.e. All the result could be absolutely fine.

3
Jun

ADO.NET provider for Firebird version 2.6.5 released

I’m proud to announce new version of ADO.NET provider for Firebird – 2.6.5. It’s half maintenance release, half new features.

You can find all bug fixes in tracker.

The new features include and improvements:
* Support for Trace API in Firebird 2.5.
* Improvements in SQL generation for Entity Framework.
* Support for commands logging .
* Slightly faster command execution of big queries.
* And a lot of small code improvement making it more stable…

You can download it at http://sourceforge.net/projects/firebird/files/firebird-net-provider/2.6.5/ or http://www.firebirdsql.org/en/net-provider/.

Hope you’ll enjoy the release.

25
Apr

Some thoughts on denormalization

From time to time I have an idea for helpful project I could do to help myself or people around me to finish some task faster/easier. And, of course, it often involves database. And though I’m a strong believer in normalization (3NF is a must), at least during initial design, sometimes I see this project being partly simple and partly I want it done in no time (because I’m doing it in my own spare time) I think about storing data denormalized and doing the work in application.

Classical variation is user and his/her permissions. It’s a standard 1:N case (also might be M:N). The proper way is to create two tables and use foreign key to ensure data integrity. But often you feel it’s not so crucial function and maybe you’ll have less than five permissions, you think, storing it in column comma separated is good idea and will cut the time significantly.

You might be right. But you feel it’s not correct. It’s bad. It’s not for future extending of application (and we all know it’ll happen 8-)). Today I realized I can have all from both worlds – good design and quick development.

Design it normalized, because it’s what you should do. But then you can create a simple view where you use some kind of “LIST” function (i.e. Firebird has exactly that named one). This view will create you denormalized form of data. With a small help from triggers you can also update that view hence underlying data (left as an exercise for reader).

select list(r, ',') from
(
  select 'r' as r from rdb$database
  union all
  select 'r' from rdb$database
  union all
  select 'r' from rdb$database
);

Because you write this only once (yes, probably you have to rename some table next time), for next “fun” project you’re designing tables as it should be and because you have it already ready in tool-belt you have the ease of development too.

My head-split is solved. :)

14
Mar

Type safety of SQL commands

I don’t know why, but often I hear that the SQL commands are not type safe. That’s kind of badly said. No it’s wrong, without the proper context.

SQL commands are actually type safe. Did you tried to compare blob and date? Yep, it doesn’t work. And you get the error immediately during the prepare phase (or during compilation of i.e. trigger). Yes, you can compare int and date, because there’s a lot of implicit conversions in every database engine. But basically, if you wrote something wrong, the engine will let you know very quickly.

The problem lies in other detail. Nobody creates and application based only on SQL commands. Your application is probably something like C# or Delphi application, created from C# or Delphi code and compiled. And inside this code you have SQL commands, very probably strings. The C# or Delphi compiler isn’t able to verify the commands, hence to be sure it’s OK, your application needs to execute these commads. And that’s runtime.

Summary? No, SQL commands are type safe but only on server. Written in your applications code it’s still type safe, but not from point of view of type safety of your code and compiler’s rules.

It’s similar to some kind of hypothetical “eval” function in i.e. C#. If you write directly the code, it’s OK. But once you put something into this function (corresponds to database engine), compiler can do nothing with it, until executed.

18
Nov

ADO.NET provider 2.6.0 for Firebird released

Firebird .NET provider team is proud to announce next version of ADO.NET provider for Firebird – 2.6.0.

This version contains various bug fixes as well as new features and improvements. And of course support for some of new Firebird server features. Highlights:

All changes can be found at tracker.firebirdsql.org.

You can download it at http://sourceforge.net/projects/firebird/files/firebird-net-provider/2.6.0/ or http://www.firebirdsql.org/index.php?op=files&id=netprovider.

Thanks to all who helped improving and hunting issues.

8
Nov

FBCon 2010 Bremen – topics on Firebird & .NET

I almost forgot to announce the topics touching ADO.NET provider for Firebird. This year I have three sessions. You can find more info, and register, of course, at firebird-conference.com. Here’s the list.

So if you’re interested in any of these topic, come, ask, participate. Especially hearing scenarios in what you’re using the .NET provider is very valuable for me. I also like talking between sessions, I always learn something new and I’m eager to to my best to answer all questions people are asking.

6
Oct

Using CTEs to generate ranges and/or fill holes

Common Table Expressions are a tool to express things in SQL without need to write long queries or use procedural SQL. The ability to write recursive queries is even nicer and yesterday I had an idea how to use it little bit differently.

Often I’m generating data for reports and I’m trying to do as much as I can on database server. One common problem I’m facing with these reports is the need to include some “zero” data even for, in this case, days, where no data were available.

Imagine, you’re doing something like this (Firebird syntax).

select data.d, sum(data.i) from
(
	select current_date as d, 1 as i from rdb$database
	union all
	select current_date, 1 from rdb$database
	union all
	select current_date+1, 2 from rdb$database
	union all
	select current_date+3, 5 from rdb$database
	union all
	select current_date+3, 2 from rdb$database
) data
group by data.d

It’s OK, but you’ll get data only for date that are present in database. But because you’re generating report for whole month, as I did, it would be nice to have there the data too even with “zeros”. Because then the logic on application side is much simpler, you just need to deal how to present the report’s data.

If you have CTEs, it’s pretty easy to use the recursiveness and generate the sequence, join it with original data and do what you have to do.

select range.d, coalesce(sum(data.i), 0) from
(
	select current_date as d, 1 as i from rdb$database
	union all
	select current_date, 1 from rdb$database
	union all
	select current_date+1, 2 from rdb$database
	union all
	select current_date+3, 5 from rdb$database
	union all
	select current_date+3, 2 from rdb$database
) data
right outer join
(
	with recursive padding as
	(
		select current_date as d from rdb$database
		union all
		select dateadd(day, 1, d) from padding where d < dateadd(month, 1, current_date)
	)
	select * from padding
) range
on (data.d = range.d)
group by range.d

I'm using here right outer join just to keep the original data above the sequence I'm generating. But you can swap the tables and use left outer join as it may be easier to read and maintain.

The code is pretty straightforward and if you used CTEs ever before, you'll be able to read. I'm simply generating the dates until I'm ready with the sequence (in this case one month) and then joining with original data. The outer join will ensure the holes are filled and nulls are there. These nulls are later processed with coalesce to "zero" values.

If you want to directly control the number of elements in sequence, simply use some counter.

with recursive padding as
(
	select current_date as d, 0 as cnt from rdb$database
	union all
	select dateadd(day, 1, d), cnt+1 from padding where cnt < 10
)
select * from padding
4
Oct

Firebird 2.5 (final) released

Today the Firebird 2.5 is being released. You can read the press release and most importantly release notes. And sure, you can download it and use/test/deploy.

I’m not going to repeat what’s written in documents above. Instead I’m going to provide some of my information about using it and some thoughts from .NET provider view.

To be honest I’m using Firebird 2.5 in production environment since betas (I made a small mistake and wasn’t able to easily go back and was lazy to do the long process :) ). And I’m more than pleased with the stability. Even the betas were more stable than 2.1, where I, probably thanks to environment, experienced I/O problems from time to time. The language features are nice too. I don’t have any personal favorite, all are great addition to whole ecosystem. Actually I do have. It’s not a language feature, but a fixed behavior. And it’s affecting .NET provider.

In some cases the left outer join produced wrong results. Unluckily this error was exposed when discovering database structure for Entity Framework, so even you could use 2.1 in production, for development (at least for model generation) you had to use prerelease versions of 2.5 and that’s, if nothing else, little bit inconvenient. Not taking into account some internal protocol and core improvements that are now exposed in provider as well. I.e. cancellation of running command to pin point one.

I could write more and more about the .NET provider and new version. True to be told, I like every improvement I do in provider that may help other to fully unleash the power of Firebird.

Congratulation to us, the Firebird Project, especially the core team. And also to you, users, I hope you’ll enjoy and like the new Firebird 2.5 version as we (I) do.

Note: The MindTheBird campaign team will run a webinar today at 13:00 GMT in anticipation of the launch of Firebird 2.5 Final Release. See the details.

10
Sep

Multigenerational architecture (in databases) and immutable structures (from functional programming)

I’ve just realized something interesting.

Right now we’re kind of experiencing the renaissance of functional programming and some (well many) functional concepts are heading into mainstream programming languages.

One of the important concepts in functional programming is, that everything is immutable, hence if you want to change it, you need to create new item with new value(s). Consequently this is very good for parallel programming, as when nothing can change implies there’s no shared state.

And one of the segments, where parallel programming is used heavily is database systems. This brings me to my point. You know MGA/MVCC is used in Firebird, but only in it, also with some modifications in Oracle Database, MS SQL Server, Postgres, …, in fact many todays modern RDBMSs. But the concept of MGA is actually idea of using immutable data structures. Yes, it uses some additional concepts to fully support ACID and scale well in some particular cases etc., but the core idea is same.

Isn’t it nice? Sometimes you know two things and then suddenly you realize both are based on same idea and are basically same. Connecting dots…

24
Aug

Vyjmenovávat sloupce v SQL příkazu nebo ne???

Nedávno se na Builderu lehce rozjela diskuze, ačkoli v threadu na odlišné téma, jestli sloupce v SQL příkazech vyjmenovávat nebo ne. Nedá mi to a přispěju taky svou troškou.

Já bych “problém” rozdělil na dvě části. První je SQL kód v aplikaci. Druhou je kód v rámci pohledů, spouští, uložených procedur atp.

V prvním případě jasně a striktně sloupce vyjmenovávám. Jednak mám pocit, že to přidá na přehlednosti. Nejenom v rámci zápisu, ale i když později človek kód prochází a třeba něco přidává – vidí co do aplikace přichází a jak se to jmenuje. Případně se tak triviálně (aka Find) dají hledat závislosti. A jednak si myslím, že to ukazuje, že člověk nad výsledkem přemýšlel. Samozřejmě platí i to klasické, že při změně struktury nedochází k neočekávanému chování, kdy aplikace tahá sloupce, o kterých nemá ponětí a nedokáže je využít.

V druhém případě to již tak striktně nemám, ačkoli stále jsem daleko od volnosti. Sloupce opět vyjmenovávám, ale mám pár – v pro mě velmi opodstatněných, triviálních případech – “povolení”, kdy nevyjmenovávám. V zásadě se jedná o případy, kdy daný objekt jen k datům něco přidává resp. je překlápí. Typicky pohled, který k sadě přidává třeba RANK. Nebo trigger kopírující data 1:1 navíc do jiné tabulky. Potom si dovolím použít * (a někdy i přesto sloupce vyjmenuji). Jinak standardně vyjmenovávám.

Kdejaký nástroj vám všechny aktuální sloupce dokáže vygenerovat: Flamerobin, SQL Server Management Studio, IBExpert, Database Workbench, … Slušně si je naformátovat do zápisu je pak triviální.

Proč psát něco, čehož chování se může změnit, aniž bych se o tom ïmplicitně dozvěděl. A samozřejmě stále platí to co jsem napsal výše o přemýšlení nad tím co píšu.

4
Aug

Microsoft.Data.dll after crop

Last 24 hours was interesting. Almost everybody doing something with databases and .NET was talking about new Microsoft.Data.dll. And there has been recently a lot of clarification (i.e. this or this) (check your favorite search engine for more) what it is and what it is not and who’s target audience.

From what I read emerges that it’s focused on starting developers who may not even be interested in being “real” developers. Somebody who wants to just put some site together and run. Be close to PHP world style of working. Together with WebMatrix.

I’ll not think about whether it’s good for MS’s ecosystem or not. Although I see the the points as valid, I still don’t get it. If you’re a beginner, really beginner and you wanna create some site, would you spend some time on creating it or rather try to install i.e. WordPress. It’s probably the same challenge, but the result will be different. Not taking into account, that you would still need to have at least basic knowledge of (X)HTML, programming and databases and SQL when doing it on your own, isn’t it? OK, so maybe the novice just wants to create some site on his/her own. Learn something. Improve own skills. But all this stuff isn’t going to give him/her touch of how it’s really done. You’ll be learning something that useless for bigger, real-world, applications. Umm, than he or she may not be interested in expanding skills, really just create something for fun, you might think. But do you really think this case exists? Doesn’t it sooner or later fall into one of two categories above?

And the dynamic point of view? I read somewhere that the target audience is expected to use not so sophisticated editors etc. (BTW isn’t Visual Studio Express free???). But that’s not what bothers me. What bothers me, it’s the lack of compile-time checking (which is what I damn like when I develop). You’ll upload the site to web a you’ll have to check a lot of code (pages) to see whether it works. The refactoring is harder; ahh damn, that’s not what the target audience is going to be doing or will they? Don’t forget the find&replace is refactoring as well, and here the compile time checking is really useful?

On the other hand, maybe it’s a good idea, and only because we’re not novices and the “product” isn’t enterprise ready, we should understand it and try not to judge it with enterprise-ready eyes. Because we are maybe spinning in our own world not seeing outside the box. But I personally still think the idea how it’s done it’s wrong.

3
Aug

Let’s try something new… What about Microsoft.Data.dll?

The new Microsoft.Data.dll seems to be very “interesting”. 8-)

I generally agree, that pure raw ADO.NET is more verbose than the shown example. And that’s all. On the other hand, everybody using pure raw ADO.NET have written couple of helper methods already. And directly for his/her desired approach, with non-dynamic typing. This isn’t going to make anyones life easier. As an experiment with dynamic keyword it’s nice. But a stuff used in real world deployed application? No so sure.

Wanna go easy path? Go LINQ to SQL of Entity Framework with just 1:1 model generated from database (if we’re on MS’s stack, NHibernate or LLBLGen Pro will do the work as well). Wanna go pure SQL, all under own control? ADO.NET is waiting for you. But this, this looks like kind of mixture of both, but nothing good resulted.

By the way, will there be any support for parametrized queries? I hope so. Because if not, then it’s even more funny than it already is.
UPDATE: There’s a support: http://blog.andrewnurse.net/2010/08/03/MicrosoftDataItrsquosNotAsEvilAsYouThink.aspx.

For me it’s a step back. Or am I missing something?

23
Jul

Is ORM just about bridging the gap or are we trying to wrongly match ER world into OO world?

Recently I’ve got to think about ER world and OO world. There’s a lot of people around there who understand ER or OO world very deeply. In fact ER world is well formalized, described and has a strong mathematical background. A lot of theoretical work is behind OO as well. But what about ORMs?

Is it about the fact we understand both worlds correctly and it’s good we’re trying to bridge them using the power of machines instead of our hands and brains all over again or the exact opposite? Trying to wrongly create something that automagically does this bridging even if the bridge itself will be bloated, slow and creating ugly results?

Have you ever seen query from Entity Framework, LLBLGen Pro, Hibernate, NHibernate, …, name yours? Don’t blame the tool. I think any of them is doing great job translating the query from some king of object oriented way of expressing what I want to i.e. SQL. The problem is, that we’re not querying database, but objects mapped to database objects. Sometimes with very clever mapping. The tool then has to reason about some general transitions from objects to tables and then creating the query. Sure a room for improvement is here. But is it worth?

What other options do we have? Object databases (or post-relational)? Relational programming (probably not)? Maybe just the ORMs idea is wrong. Maybe the bridging is good, and we just need to start from the other end and do it differently. Maybe allow lower the abstraction for those who understand both worlds. And remember these worlds proved self to be very useful doing what they were designed to do. So we just need to find a way to work in both with less friction; or did we find it?

4
Jul

Firebird Embedded in comparison to SQL Server Compact Edition 4

Scott Guthrie recently posted article about New Embedded Database Support with ASP.NET. This made me think about other options, Firebird in particular, and advantages and disadvantages. What I’m going to do is to very shortly introduce Firebird Embedded here and then compare it with features Scott wrote in his article.

Firebird Embedded, shortly, is Firebird database server in one DLL. No need to install etc., just load this DLL and use it. To be precise, there are some other DLLs, i.e. to support national charsets, but it’s still in under 10MB all. The database itself is built from same sources as “full” server and it’s not limited in any way.

Works with Existing Data APIs – as I said, Firebird Embedded is based on same codebase as “full” server, thus the SQL and API is same. And not only this, the ADO.NET provider for Firebird works with it and you’re programming using same thinking.

No Database Installation Required and Database Files are Stored on Disk – databases created by Firebird Embedded are stored wherever you want, with any extension. Firebird itself doesn’t have any master database, thus even the “full” server works with any (valid database) file.

Shared Web Hosting Scenarios Are Now Supported with SQL CE 4 – sure you don’t need to install anything with Firebird Embedded, that’s why it’s called Embedded. And not only this, from version 2.5, you can open database from different processes, i.e. IIS worker process and Apache workers or some console application doing something in background.

Visual Studio 2010 and Visual Web Developer 2010 Express Support – as the tool set for Firebird Embedded is exactly same as for “full” Firebird, you can use DDEX (aka Server Explorer support), Entity Framework (LINQ), …

Supports Both Development and Production – this is something I’m silently expecting. But yes, you can do the same with Firebird Embedded too.

Easy Migration to SQL Server – ahh, my favorite point. The Firebird Project has of course whopping number of tools to support migration from Embedded to “full”. The most used is … nothing. The databases are fully compatible and you can switch servers without any other tools, migration, conversion, … Just place it where you want it and connect to it, either with “full” or Embedded Firebird. And to switch your application? Again nothing. Same ADO.NET provider, just change connection string, if needed. Mostly you’ll add server IP address and maybe different path and manually switch server type, if you want. Really the migration is so simple. It is one minute task and thanks to same tool set, no matter what you version your targeting during development, your application will work with the other as well without any additional effort.

You like Firebird Embedded? I do, a lot. SQL Server Compact Edition 4 looks promising, but Firebird has something to offer as well. And recall, it’s based on same sources as “full” Firebird, very mature codebase, examined with tons of installations.

8
May

CUD stored procedures “improvements” in Entity Framework 4

The EDM designer in Entity Framework 4 has a “nice” new feature. You, know in EFv1 you was forced to map all three CUD procedures or nothing. If you didn’t do that, the validation feature told you. If you do the same in EFv4 the validation succeeds. Great, you may think. Finally I can use just procedures I really want (if you whatsoever want to use SPs and tight the logic into database when using ORM) to use and the rest will be generated for me. But nope. In fact the error will be thrown in runtime(!!!).

It’s described in documentation. But I couldn’t help but wonder, who did this decision??? I hate every error in runtime, I like my static compile time checking. It’s much safer, in my opinion.

25
Apr

New Translate<T> and ExecuteStoreQuery<T> (+ExecuteStoreCommand) on ObjectContext in Entity Framework v4

I don’t know whether it’s somewhere specifically pointed, but the ObjectContext in Entity Framework v4 has two (three) new handy methods. And I like these.

It’s kind of escape hatch similar to DefiningQuery. First method is Translate<T>. It takes DbDataReader and materializes the data back into entities. It’s similar to Materialize method from EFExtensions. If you some code in pure ADO.NET and you don’t have time or resources to redo it in EF (or it’s way easier old way) you can rewire the result into existing objects. I like it. Whenever I’ll feel I need to get dirty (and probably due to performance reasons) I can do it pretty easily.

using (testovaciEntities ent = new testovaciEntities())
{
	IDbConnection conn = (ent.Connection as EntityConnection).StoreConnection;
	conn.Open();
	using (IDbCommand cmd = conn.CreateCommand())
	{
		cmd.CommandText = "select * from master";
		using (DbDataReader reader = (DbDataReader)cmd.ExecuteReader())
		{
			MASTER[] result = ent.Translate(reader).ToArray();
			Console.WriteLine(result.Length);
		}
	}
}

The other method is similar and simplifies the process of getting dirty if you simply need to run you fine tuned query with neat and sexy constructs. :) It’s ExecuteStoreQuery<T>. This method simply allows you to run any sql command directly in store language (thus you can use all features your database offers) and fetch and materialize back resulting entities. Similar to this is ExecuteStoreCommand which is similar to ExecuteNonQuery from pure ADO.NET. But you can do this without the method easily too, the method is just more convenient.

BTW also note that the Translate method isn’t adding the entities into context, it’s just about fetching and materializing.

Next Page »