Tag Archives: Cloud

Timestamp property on Windows Azure Table entity

Every entity in Azure Table Storage has PartitionKey, RowKey and Timestamp property. PartitionKey gives the “bucket” where the entity will be stored (entities in one partition can be on more machines, though). And the RowKey is simply the key of that row. Queries where you’re matching both keys are fast. Matches inside partition (you know PartitionKey but only part (or none) of RowKey) are slower. Cross partition matches even slower. You get the idea. But what about the Timestamp?

The Timestamp property is maintained by server and it’s the time the entity was last modified. Simple as it is. Because the property is always there, I thought it would be good to actually use it. But it’s not. This property is not being “indexed” (not even slightly :) ) or anything like that. That means any query including condition on it will be as slow as table scan (or any other property condition). And that’s something very very slow, especially if you have dozens of partitions with millions of records and you’re doing range condition (not mentioning the limit on one batch for result).

And I learned that by my experience. :) Because the secondary indices are not here yet, the only option, if you’re doing really a lot of queries based on date&time/last modified to (ab)use RowKey (or build and maintain secondary index yourself).

You might wonder how to create such secondary index. Easy, like in 1980s. You already have table (BTW did you noticed, how easy is to store something into table, but it’s order of magnitude harder (read slower) to get something from it, except if you’re doing PartitionKey&RowKey exact match?), if the inserts/updates/deletes are not super fast and you’re not fighting for throughput you can compute the key for secondary index directly while doing the operation and store it in another table with both keys from original table. But if you need the operation to be super fast, you can just put the data into queue message (if it’s too big for message (64kB in Jan 2013) you can put the data into blog and store just reference to blob) and use worker role(s) to process the data, compute secondary keys and insert/update/delete into table(s). Boring? Yes. Old school? Yes. But Azure Table Storage isn’t exactly smart storage, but it scales. :)

Let’s hope secondary indices (even stale would be good) will be added in the future.

AWS Simple Storage Service (S3) pain

When we worked together with Aleš Roubíček using Windows Azure cloud, we faced quite a few surprises with applications in cloud. Especially when the application isn’t small and is using a lot of “cloud” services. Sometimes we used hashtag on Twitter for it: #cloudlife.

Last week I was creating fairly simple tool that allows (or should allow) me to copy some files to S3 bucket. I had some special needs because the folder where the files were was heavily changed and the tool needed to handle that. Because AWS has SDK for .NET I was not expecting that to be extremely difficult. How wrong I was.

Every file in S3 has ETag associated. And you will get it with almost every request, i.e. when you’re listing bucket. Good candidate for decision whether to copy the file (because it was changed) or not. The SDK contains AmazonS3Util class which allows you to compute the checksum/ETag. Great, should be piece of cake. Not so fast. The SDK and S3 itself has some gotchas – #cloudlife.

First the ETag returned from SDK is inside double quotes. But the value from AmazonS3Util is not. I can fix that. Could be worse. Let’s move forward. For big or huge files the new multipart upload (you are uploading the file in smaller chunks) is recommended. Boom. Checksum/ETag is then different. For same file uploaded “normal” way and multipart the ETag differs. Sadly the AmazonS3Util can compute only the checksum for “normal” type of upload. Never mind. Next. Every file can have some metadata associated, I can store my own checksum there. I will need to request metadata in separate call, but I can live with that. So let’s use WithMetadata method to add some key/value. Good no problem so far. Retrieving metadata. Suspicious method GetObjectMetadata (and the BeginGetObjectMetadata/EndGetObjectMetadata counterparts) returns NameValueCollection, I can probably find the same key as I stored there. Not so fast. My key originally stored as i.e. foobar is now x-amz-meta-foobar. Maybe the WithMetadata could enforce that in input, so I know there’s always this prefix. I don’t like hidden magic. Almost there. Every request is also signed (that’s, also, why we have the key and secret) to be sure the request was not changed on the way etc. (headers, parameters etc. are part or the input for signature). SDK handles the signature for me. Badly. As long as you use US-ASCII characters in filenames/paths or better to say keys for storing the data, you’re fine. Try to put there some “special” characters, like in my case diacritics. Suddenly the signature doesn’t match what’s expected and server will not allow me proceed further. Yes you’re right, the encoding issue. Sadly I don’t know currently workaround. Hope it’ll be fixed soon. Hitting wall every while.

And you know what? Some limitations and challenges I like (ETag issue. These are reasons why I like development and creating software. Some I don’t like (encoding issue), I shows somebody wasn’t doing own work properly. And why #cloudlife? From marketing we’re told the cloud is so cool and so … best, it’ll solve all our problems and we expect it to be perfect. But it’s not. It’s piece of architecture as every other component in your solution. You should expect issues.

Have fun, develop for fun.

Off-site initial upload for Synology Amazon S3 backup

I recently started thinking about backing up also “less” [1] important data to Amazon S3 from my Synology NAS, which is my primary backup location. The problem was, that the amount was about 150GB. Far more than I can upload through my home connection in reasonable time. It would take weeks. When I first started using S3 backup on my NAS, I checked, what’s exactly being copied to the bucket, in case disaster happens and I’ll be force to restore without any Synology box around (or at least restore critical data sooner than I’ll have it available).

Luckily it’s almost the same structure as on disk. There’s a @tmp folder added and whole backup is in particularly named folder, but the name is same for all backups from the NAS. If you checked also Enable metadata backup, there’s one file per folder, with metadata in it (guessing, I haven’t tried to reverse it), but I’m not using this option.

I wanted to upload the initial data from my office, where the connection is better and, because the data isn’t changed too often, just backup the differences directly from box. With the above knowledge I was pretty confident, the off-site upload is going to work. :) And it did. After I uploaded the data, I triggered the backup on box and after a while it was done. Like a glove.

And remember, backup is one part. Restore is the key other!

[1] No matter what you think, every data is important. You’ll find out, when you will loose the least important data (you thought it’s least important). But everybody needs to learn that the hard way, before really believing.

Amazon’s EC2 Micro Instance and Firebird

Few months ago I was playing with different VPS providers and I was also considering deploying the application to some “small instance” cloud. I came to Micro Instance of Amazon EC2. Part of the solution I was doing research for is Firebird database and because the pricing looks good, I tried to install Firebird (2.5 SuperClassic) on this instance.

The good news is, that it’s possible. As I did manual install, it was fast. The bad news is, that it’s really slow. If you read the description of Micro Instance you can expect it will not be blazing fast. But even for really low-load database server it’s almost unusable.

I’m not saying you can’t use it. Maybe some slow/late backup/mirror or something like that should be doable. But for normal applications using database server you’ll do better job with your own server or VPS.

Using Firebird inside Azure (without VM role)

Recently I was playing with Amazon EC2 trying what I could use it for. I was playing with Firebird there as well. But in fact you have virtual machine you can use. So using Firebird there wasn’t that hard. But I got and idea about Azure, because it’s more platform for applications than computers.

Then the VM role was introduced and the challenge was somehow not so challenging. But …

Yes, it was still in my mind. My rough idea was abuse Firebird Embedded and load it inside Web or Worker role. Only problem was where to store the database. Sure, the scaling will be compromised, but it’s just “try it and see what could be done and let others (not) use it”. 8-) The CloudDrive solved my problems about where to store the database.

So it was no-brainer to try it.

First some helper for CloudDrive use.

public class CloudDriveHelper : IDisposable
{
	CloudDrive _drive;
	
	public string DriveLetter { get; private set; }

	public CloudDriveHelper(CloudStorageAccount storageAccount, string name, int cacheSize = 0, int driveSize = 1024)
	{
		CloudBlobClient client = storageAccount.CreateCloudBlobClient();
		CloudBlobContainer container = client.GetContainerReference("drives");

		container.CreateIfNotExist();

		string diskName = string.Format("{0}.vhd", name);
		_drive = storageAccount.CreateCloudDrive(container.GetPageBlobReference(diskName).Uri.ToString());
		try
		{
			_drive.Create(driveSize);
		}
		catch (CloudDriveException)
		{ }
		DriveLetter = _drive.Mount(cacheSize, DriveMountOptions.None);
	}

	public void Dispose()
	{
		Dispose(true);
	}

	~CloudDriveHelper()
	{
		Dispose(false);
	}

	void Dispose(bool disposing)
	{
		if (disposing)
		{
			_drive.Unmount();

			DriveLetter = null;
			_drive = null;
		}
	}
}

I added reference to FirebirdSql.Data.FirebirdClient assembly. I played with Firebird Embedded in package, but putting it to blob storage via CloudDrive would be easier, maybe next time. :) The environment in Azure is x64 (for Web and Worker roles, run by WaWebHost and WaWorkerHost respectively), so don’t forget to use proper version. Anyway, then I abused Web role to see some results.

Simple controller action.

public ActionResult Index()
{
	using (CloudDriveHelper drive = new CloudDriveHelper(Global.Account /* could be CloudStorageAccount.DevelopmentStorageAccount as well */, "firebird", driveSize: 1024))
	{
		string database = Path.Combine(drive.DriveLetter, "SomeDatabase.fdb");

		TestClass.CreateDatabase(database);

		ViewData["FirebirdVersion"] = TestClass.GetServerVersion(database);
		ViewData["SomeData"] = TestClass.SomeQuery(database).ToArray();

		return View();
	}
}

And some methods to do actual work.

public static void CreateDatabase(string databasePath)
{
	FbConnection.CreateDatabase(CreateConnectionString(databasePath), true);
}

public static string GetServerVersion(string databasePath)
{
	using (FbConnection conn = new FbConnection(CreateConnectionString(databasePath)))
	{
		conn.Open();
		return conn.ServerVersion;
	}
}

public static IEnumerable<Tuple<string, object>> SomeQuery(string databasePath)
{
	using (FbConnection conn = new FbConnection(CreateConnectionString(databasePath)))
	{
		conn.Open();
		using (FbCommand cmd = conn.CreateCommand())
		{
			cmd.CommandText = "select * from mon$database";
			using (FbDataReader reader = cmd.ExecuteReader())
			{
				if (reader.Read())
				{
					for (int i = 0; i < reader.FieldCount; i++)
					{
						yield return Tuple.Create(reader.GetName(i), reader[i]);
					}
				}
			}
		}
	}
}

Well, it did worked OK. Before you think how cool is that I have some bad news. The cloud computing is mainly about scaling and elasticity. With this you have one drive and (have to have) one instance of Firebird working with it – you’re not scaling. You can’t scale with this solution. So it’s more about being concept. However I came with two possible options, that are more realistic.

First is having one special Worker role processing some data and storing it in Firebird database (for whatever reason). The Azure machines are quite powerful and if you have everything there why to setup your own server… And I think this can be worth in some scenarios (apart not being fault tolerant etc.).

Other one builds on top of previous solution and abuses Firebird’s external tables. You can load or store data via external tables to blob storage via CloudDrive and (re-)use already written logic in stored procedures (or triggers). Little crazy, I know hence I’ve not tried it (so maybe it’s not doable). But technologies are here to use these on the edge.

As I said, the real world usage of all this isn’t big, but as a exploration project it was fun. :)

SQL Azure and Entity Framework

From time to time I get a question about SQL Azure and Entity Framework. Can I use it? Does it work? Etc. Because the SQL Azure is in fact a special version (you can check unsupported and partially supported T-SQL statements) of MS SQL Server it works without any hassle. That’s the short story.

The following lines will quickly guide you through the steps of using SQL Azure with Entity Framework. You need to have and Azure account ready. There’s no way to install SQL Azure to your own server or something like that. First login in to sql.azure.com and select some project you have there. This will bring you this screen.

By default only master database is created and it’s up to you to create additional database. Right now you can create from two editions and few sizes: Web and Business and from 1GB to 50GB. The price you’ll pay depends on this selection (but not only). For purpose of this article I created 1GB Web edition database.

On the main screen you have a button to get a connection string. I’m often using it, as it’s easier for me to get information from it than compose it from pieces on screen. :) Mine is:

Data Source=lskqxi46a0.database.windows.net;Initial Catalog=test;Persist Security Info=True;User ID=jiri@lskqxi46a0;Password=******;MultipleActiveResultSets=True;Encrypt=True

After these initial steps we can start using the database. Because I was too lazy to create tables and so on manually I tested one feature of Entity Framework I’m not much familiar with. It’s Model First. You simply create the conceptual model and later generate SQL script from it for your database (yes, Firebird has support too ;) ). I created basic “blogging system” model. Author has many Posts. Posts have many Tags. Right clicking on model and selecting Generate database from model resulted in my case in simple script.

-- --------------------------------------------------
-- Entity Designer DDL Script for SQL Server 2005, 2008, and Azure
-- --------------------------------------------------
-- Date Created: 01/29/2011 16:40:18
-- Generated from EDMX file: C:\Users\Jiri\Desktop\ConsoleApplication3\ConsoleApplication3\Model1.edmx
-- --------------------------------------------------

SET QUOTED_IDENTIFIER OFF;
GO
USE [test];
GO
IF SCHEMA_ID(N'dbo') IS NULL EXECUTE(N'CREATE SCHEMA [dbo]');
GO

-- --------------------------------------------------
-- Dropping existing FOREIGN KEY constraints
-- --------------------------------------------------

IF OBJECT_ID(N'[dbo].[FK_BlogPostTag_BlogPost]', 'F') IS NOT NULL
    ALTER TABLE [dbo].[BlogPostTag] DROP CONSTRAINT [FK_BlogPostTag_BlogPost];
GO
IF OBJECT_ID(N'[dbo].[FK_BlogPostTag_Tag]', 'F') IS NOT NULL
    ALTER TABLE [dbo].[BlogPostTag] DROP CONSTRAINT [FK_BlogPostTag_Tag];
GO
IF OBJECT_ID(N'[dbo].[FK_BlogPostAuthor]', 'F') IS NOT NULL
    ALTER TABLE [dbo].[BlogPosts] DROP CONSTRAINT [FK_BlogPostAuthor];
GO

-- --------------------------------------------------
-- Dropping existing tables
-- --------------------------------------------------

IF OBJECT_ID(N'[dbo].[BlogPosts]', 'U') IS NOT NULL
    DROP TABLE [dbo].[BlogPosts];
GO
IF OBJECT_ID(N'[dbo].[Authors]', 'U') IS NOT NULL
    DROP TABLE [dbo].[Authors];
GO
IF OBJECT_ID(N'[dbo].[Tags]', 'U') IS NOT NULL
    DROP TABLE [dbo].[Tags];
GO
IF OBJECT_ID(N'[dbo].[BlogPostTag]', 'U') IS NOT NULL
    DROP TABLE [dbo].[BlogPostTag];
GO

-- --------------------------------------------------
-- Creating all tables
-- --------------------------------------------------

-- Creating table 'BlogPosts'
CREATE TABLE [dbo].[BlogPosts] (
    [ID] int IDENTITY(1,1) NOT NULL,
    [Created] datetime  NOT NULL,
    [Heading] nvarchar(max)  NOT NULL,
    [Content] nvarchar(max)  NOT NULL,
    [Author_ID] int  NOT NULL
);
GO

-- Creating table 'Authors'
CREATE TABLE [dbo].[Authors] (
    [ID] int IDENTITY(1,1) NOT NULL,
    [Name_FirstName] nvarchar(max)  NOT NULL,
    [Name_LastName] nvarchar(max)  NOT NULL,
    [LastLoggedIn] datetime  NOT NULL
);
GO

-- Creating table 'Tags'
CREATE TABLE [dbo].[Tags] (
    [ID] int IDENTITY(1,1) NOT NULL,
    [Name] nvarchar(max)  NOT NULL
);
GO

-- Creating table 'BlogPostTag'
CREATE TABLE [dbo].[BlogPostTag] (
    [BlogPosts_ID] int  NOT NULL,
    [Tags_ID] int  NOT NULL
);
GO

-- --------------------------------------------------
-- Creating all PRIMARY KEY constraints
-- --------------------------------------------------

-- Creating primary key on [ID] in table 'BlogPosts'
ALTER TABLE [dbo].[BlogPosts]
ADD CONSTRAINT [PK_BlogPosts]
    PRIMARY KEY CLUSTERED ([ID] ASC);
GO

-- Creating primary key on [ID] in table 'Authors'
ALTER TABLE [dbo].[Authors]
ADD CONSTRAINT [PK_Authors]
    PRIMARY KEY CLUSTERED ([ID] ASC);
GO

-- Creating primary key on [ID] in table 'Tags'
ALTER TABLE [dbo].[Tags]
ADD CONSTRAINT [PK_Tags]
    PRIMARY KEY CLUSTERED ([ID] ASC);
GO

-- Creating primary key on [BlogPosts_ID], [Tags_ID] in table 'BlogPostTag'
ALTER TABLE [dbo].[BlogPostTag]
ADD CONSTRAINT [PK_BlogPostTag]
    PRIMARY KEY NONCLUSTERED ([BlogPosts_ID], [Tags_ID] ASC);
GO

-- --------------------------------------------------
-- Creating all FOREIGN KEY constraints
-- --------------------------------------------------

-- Creating foreign key on [BlogPosts_ID] in table 'BlogPostTag'
ALTER TABLE [dbo].[BlogPostTag]
ADD CONSTRAINT [FK_BlogPostTag_BlogPost]
    FOREIGN KEY ([BlogPosts_ID])
    REFERENCES [dbo].[BlogPosts]
        ([ID])
    ON DELETE NO ACTION ON UPDATE NO ACTION;
GO

-- Creating foreign key on [Tags_ID] in table 'BlogPostTag'
ALTER TABLE [dbo].[BlogPostTag]
ADD CONSTRAINT [FK_BlogPostTag_Tag]
    FOREIGN KEY ([Tags_ID])
    REFERENCES [dbo].[Tags]
        ([ID])
    ON DELETE NO ACTION ON UPDATE NO ACTION;

-- Creating non-clustered index for FOREIGN KEY 'FK_BlogPostTag_Tag'
CREATE INDEX [IX_FK_BlogPostTag_Tag]
ON [dbo].[BlogPostTag]
    ([Tags_ID]);
GO

-- Creating foreign key on [Author_ID] in table 'BlogPosts'
ALTER TABLE [dbo].[BlogPosts]
ADD CONSTRAINT [FK_BlogPostAuthor]
    FOREIGN KEY ([Author_ID])
    REFERENCES [dbo].[Authors]
        ([ID])
    ON DELETE NO ACTION ON UPDATE NO ACTION;

-- Creating non-clustered index for FOREIGN KEY 'FK_BlogPostAuthor'
CREATE INDEX [IX_FK_BlogPostAuthor]
ON [dbo].[BlogPosts]
    ([Author_ID]);
GO

-- --------------------------------------------------
-- Script has ended
-- --------------------------------------------------

Now we can finally write some code to let Entity Framework do some dirty job. Let's create some authors and issue a non-trivial query to see what's what.

using (Model1Container ctx = new Model1Container())
{
	Author a1 = new Author();
	a1.Name.FirstName = "Foo";
	a1.Name.LastName = "Bar";
	a1.LastLoggedIn = DateTime.Now;
	Author a2 = new Author();
	a2.Name.FirstName = "Jiri";
	a2.Name.LastName = "Cincura";
	a2.LastLoggedIn = DateTime.Now;
	ctx.Authors.AddObject(a1);
	ctx.Authors.AddObject(a2);
				
	ctx.SaveChanges();
	Debug.Assert(a1.ID != default(int));
	Debug.Assert(a2.ID != default(int));

	var blogPosts = ctx.Authors
		.Where(a => a.Name.LastName == "Cincura")
		.SelectMany(a => a.BlogPosts)
		.Where(bp => bp.Tags.Any(t => t.Name == "Databases" || t.Name == "Azure" || t.Name == "Cloud"))
		.Select(bp => new { bp.Heading, bp.Created });
	Console.WriteLine((blogPosts as ObjectQuery).ToTraceString());
	foreach (var blogPost in blogPosts)
	{
		Console.WriteLine(blogPost.Heading);
	}
}

Surprise. Nothing bloodthirsty. 8-) It works as expected. Two authors are created. You can check that by querying the table or by looking at database size at sql.azure.com. The query itself is standard T-SQL query you can run on MS SQL Server as well, nothing magic.

SELECT 
[Extent1].[ID] AS [ID], 
[Extent2].[Heading] AS [Heading], 
[Extent2].[Created] AS [Created]
FROM  [dbo].[Authors] AS [Extent1]
INNER JOIN [dbo].[BlogPosts] AS [Extent2] ON [Extent1].[ID] = [Extent2].[Author_ID]
WHERE (N'Cincura' = [Extent1].[Name_LastName]) AND ( EXISTS (SELECT 
	1 AS [C1]
	FROM  [dbo].[BlogPostTag] AS [Extent3]
	INNER JOIN [dbo].[Tags] AS [Extent4] ON [Extent4].[ID] = [Extent3].[Tags_ID]
	WHERE ([Extent2].[ID] = [Extent3].[BlogPosts_ID]) AND ([Extent4].[Name] IN (N'Databases',N'Azure',N'Cloud'))
))

Theory tells us, that if SQL Azure is more or less MS SQL Server database using more or less same wire protocol and more or less same T-SQL it should work. And it does. Here you have a small proof. Happy database-clouding.