Lately, I have been digging deeper into the use of application domains within the .NET Framework. The AppDomain APIs are very simple to use and the documentation provided by Microsoft is extremely helpful. There were, however, a few unexpected problems that I ran into while working them into my code.

Application Domains

For those who are not familiar with .NET, an application domain is a partition within an operating system process where one or more applications reside. All objects within the same application domain communicate directly with each other. Any object that crosses into a different application domain is either copied or accessed through a proxy. Unless it derives from MarshalByRefObject, an object is marshaled by value: it must be serializable, and the other domain receives a copy of it. An object that is neither serializable nor a MarshalByRefObject cannot cross the boundary at all and causes a serialization error. For most use cases you will want your objects to inherit from MarshalByRefObject so that every domain works against the same instance.

The very first time a remote domain attempts to communicate with a MarshalByRefObject, a proxy will be created and passed into the calling domain. Any future calls to that object will be passed through the newly created proxy object.
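To make the copy versus proxy distinction concrete, here is a minimal sketch for a .NET Framework console application. The type names ByValueProbe and ByRefProbe and the domain name "Worker" are made up for illustration; the remoting calls themselves (AppDomain.CreateInstanceAndUnwrap and RemotingServices.IsTransparentProxy) are the standard framework APIs.

```csharp
using System;
using System.Runtime.Remoting;

// Hypothetical type: serializable, so it crosses the boundary as a copy.
[Serializable]
public class ByValueProbe
{
    public string WhereAmI() { return AppDomain.CurrentDomain.FriendlyName; }
}

// Hypothetical type: derives from MarshalByRefObject, so callers get a proxy.
public class ByRefProbe : MarshalByRefObject
{
    public string WhereAmI() { return AppDomain.CurrentDomain.FriendlyName; }
}

public static class Program
{
    // Creates an instance of T inside the given domain and hands it back to the caller.
    static T Create<T>(AppDomain domain)
    {
        return (T)domain.CreateInstanceAndUnwrap(
            typeof(T).Assembly.FullName, typeof(T).FullName);
    }

    public static void Main()
    {
        AppDomain worker = AppDomain.CreateDomain("Worker");
        try
        {
            ByRefProbe byRef = Create<ByRefProbe>(worker);
            ByValueProbe byValue = Create<ByValueProbe>(worker);

            // The MarshalByRefObject is a proxy; its method runs in the worker domain.
            Console.WriteLine(RemotingServices.IsTransparentProxy(byRef));   // True
            Console.WriteLine(byRef.WhereAmI());                              // Worker

            // The serializable object is a local copy; its method runs in the calling domain.
            Console.WriteLine(RemotingServices.IsTransparentProxy(byValue)); // False
            Console.WriteLine(byValue.WhereAmI());
        }
        finally
        {
            AppDomain.Unload(worker);
        }
    }
}
```

The last line prints the name of whichever domain made the call, which is the easiest way to see that the serializable object no longer lives in the worker domain.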

The purpose of application domains is to provide a way to isolate and execute external code without affecting the rest of the application process. This provides an extra safeguard against unstable code that could otherwise bring down the main process. Another benefit is the ability to replace an application's loaded assemblies without shutting it down: assemblies cannot be unloaded individually, but the domain that holds them can be. This doubles as a benefit, as it allows other domains to continue working while a new one is created!
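As a rough illustration of that last point, the sketch below unloads one domain while a second keeps serving calls, then creates a replacement. The Worker type and the domain names are hypothetical; in a real shell the replacement domain would load a newer set of assemblies rather than the same type.

```csharp
using System;

// Hypothetical worker type that reports which domain it is running in.
public class Worker : MarshalByRefObject
{
    public string Ping() { return "pong from " + AppDomain.CurrentDomain.FriendlyName; }
}

public static class Program
{
    static Worker StartWorker(AppDomain domain)
    {
        return (Worker)domain.CreateInstanceAndUnwrap(
            typeof(Worker).Assembly.FullName, typeof(Worker).FullName);
    }

    public static void Main()
    {
        AppDomain first = AppDomain.CreateDomain("First");
        AppDomain second = AppDomain.CreateDomain("Second");
        Worker a = StartWorker(first);
        Worker b = StartWorker(second);

        // Tear down one domain; the other is unaffected.
        AppDomain.Unload(first);
        Console.WriteLine(b.Ping()); // pong from Second

        // Proxies into the unloaded domain are no longer usable.
        try
        {
            a.Ping();
        }
        catch (AppDomainUnloadedException)
        {
            Console.WriteLine("First is gone");
        }

        // Bring up a replacement without restarting the process.
        AppDomain replacement = AppDomain.CreateDomain("First-v2");
        Console.WriteLine(StartWorker(replacement).Ping()); // pong from First-v2

        AppDomain.Unload(replacement);
        AppDomain.Unload(second);
    }
}
```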

In use

The project that I am currently working on requires heavy use of application domains. The core of the application is a very simple shell. It communicates with a server to fetch different sets of assemblies, loads them into their respective application domains, executes runnable code and relays instructions from the server to any running tasks within the domains.

A wild remoting exception appears!

Implementing the actual application domains proved relatively easy. I was able to spin up a large number of domains and bring them down pretty quickly too. Testing also seemed to indicate that everything was working as intended.

But like any piece of code, something will always go wrong :)

During testing, I had left the actual process running for a long time (30+ minutes) and came back to an errored out screen. The error message looked something like this:

RemotingException: 
Object "/asdjf3_random_nasdfj_numbers_and_jf392j3_stuff.rem"
has been disconnected or does not exist at the server.

The exception was something that I had never seen before. Hell, even the error message was kind of strange since I did not have any code that was communicating with a 'server' yet. Also, it was nearly impossible to recreate the same error because I had no idea what caused it.

.NET Remoting

After a bit of research, it turns out that I did not quite understand the way communication between application domains worked. Normally, since .NET is a garbage-collected runtime, manual management of an object's lifetime is not needed. However, any object that crosses the boundary of its application domain is considered a remote object and must be treated with extra care.

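The garbage collector cannot see references held by another domain, so every MarshalByRefObject is tracked by a lifetime service instead. The lifetime service hands out a lease (an ILease) that controls the lifetime policy for a particular instance of an object. Unless specified otherwise, a remote object's initial lease time is five minutes, and each call made through the proxy ensures at least two more minutes remain on the lease. If the lease expires with no activity and no sponsor renews it, the object is disconnected from remoting and becomes eligible for garbage collection. The next call through the proxy then fails with the RemotingException above, and the 'server' in the message is simply the domain that owned the object.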
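One way to see the lease for yourself is to ask the remoting infrastructure for it from the calling domain. This is only a sketch: RemoteService and the domain name are invented for the example, while RemotingServices.GetLifetimeService and the ILease properties are part of the framework.

```csharp
using System;
using System.Runtime.Remoting;
using System.Runtime.Remoting.Lifetime;

// Hypothetical remote type that keeps the default lifetime policy.
public class RemoteService : MarshalByRefObject
{
    public void Touch() { }
}

public static class Program
{
    public static void Main()
    {
        AppDomain worker = AppDomain.CreateDomain("Worker");
        try
        {
            var service = (RemoteService)worker.CreateInstanceAndUnwrap(
                typeof(RemoteService).Assembly.FullName,
                typeof(RemoteService).FullName);

            // Ask for the lease that governs the object behind the proxy.
            var lease = (ILease)RemotingServices.GetLifetimeService(service);

            Console.WriteLine("State:            " + lease.CurrentState);
            Console.WriteLine("Initial lease:    " + lease.InitialLeaseTime);
            Console.WriteLine("Renew on call:    " + lease.RenewOnCallTime);
            Console.WriteLine("Time remaining:   " + lease.CurrentLeaseTime);

            // Any call through the proxy counts as activity for the lease.
            service.Touch();
            Console.WriteLine("After a call:     " + lease.CurrentLeaseTime);
        }
        finally
        {
            AppDomain.Unload(worker);
        }
    }
}
```

Logging CurrentLeaseTime like this during a long test run is a cheap way to confirm whether an idle object is about to be disconnected.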

Overriding the lifetime service

The quickest solution is to implement the following:

public override object InitializeLifetimeService()
{
    return null;
}

Overriding the InitializeLifetimeService method from MarshalByRefObject seemed to do the trick. However, just by looking at the code sample, I could already tell this would cause some problems. Returning null gives the object an infinite lease, so remoting never disconnects it and it stays reachable until its application domain is unloaded, possibly leading to memory issues.

A better implementation is to configure the lease object and return that. This will require some fine tuning so that you don't end up with a lifetime that is too short or too long.

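// Requires: using System; using System.Runtime.Remoting.Lifetime;
public override object InitializeLifetimeService()
{
    var lease = (ILease)base.InitializeLifetimeService();

    // Lease properties can only be changed while the lease is still in its initial state.
    if (lease.CurrentState == LeaseState.Initial)
    {
        // How long the object lives before its first renewal.
        lease.InitialLeaseTime = TimeSpan.FromMinutes(30);
        // How long to wait for a sponsor to answer once the lease expires.
        lease.SponsorshipTimeout = TimeSpan.FromMinutes(5);
        // Minimum time left on the lease after each call through the proxy.
        lease.RenewOnCallTime = TimeSpan.FromSeconds(30);
    }

    return lease;
}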

This can also be done through the application's config:

<configuration>
  <system.runtime.remoting>
    <application>
      <lifetime leaseTime="1M" 
                renewOnCallTime="30S"
                leaseManagerPollTime="2M" />
    </application>
  </system.runtime.remoting>
</configuration>

Note that this applies the policy to all remote objects within your application. In my particular case, I have multiple remote objects with different lifetime requirements, so this approach isn't a good fit.

Final thoughts

After running through different scenarios, I believe the best approach is to override the InitializeLifetimeService method and return a lease object with a manually set policy. This allows me to really control the duration of each type of object that I'll be using and tune them to optimize performance.
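One possible way to organize that, sketched here rather than lifted from the project, is a small base class that lets each remote type declare its own lease times. LeasedObject, ShortLivedTask, LongLivedSession and the durations are all hypothetical; only the InitializeLifetimeService override and the ILease members come from the framework.

```csharp
using System;
using System.Runtime.Remoting.Lifetime;

// Hypothetical base class: each subclass picks its own lifetime policy.
public abstract class LeasedObject : MarshalByRefObject
{
    protected abstract TimeSpan InitialLeaseTime { get; }

    protected virtual TimeSpan RenewOnCallTime
    {
        get { return TimeSpan.FromSeconds(30); }
    }

    public override object InitializeLifetimeService()
    {
        var lease = (ILease)base.InitializeLifetimeService();

        // The policy can only be changed before the lease becomes active.
        if (lease != null && lease.CurrentState == LeaseState.Initial)
        {
            lease.InitialLeaseTime = InitialLeaseTime;
            lease.RenewOnCallTime = RenewOnCallTime;
        }

        return lease;
    }
}

// Hypothetical short task: released quickly once it goes idle.
public sealed class ShortLivedTask : LeasedObject
{
    protected override TimeSpan InitialLeaseTime
    {
        get { return TimeSpan.FromMinutes(2); }
    }
}

// Hypothetical long session: tolerates longer gaps between calls.
public sealed class LongLivedSession : LeasedObject
{
    protected override TimeSpan InitialLeaseTime
    {
        get { return TimeSpan.FromMinutes(30); }
    }

    protected override TimeSpan RenewOnCallTime
    {
        get { return TimeSpan.FromMinutes(5); }
    }
}

public static class Program
{
    public static void Main()
    {
        // Calling the override directly shows the policy each type would get.
        var shortLease = (ILease)new ShortLivedTask().InitializeLifetimeService();
        var longLease = (ILease)new LongLivedSession().InitializeLifetimeService();

        Console.WriteLine(shortLease.InitialLeaseTime);
        Console.WriteLine(longLease.InitialLeaseTime);
    }
}
```

Keeping the durations on the types themselves means a change to one object's lifetime never silently changes another's, which is the drawback of the config-wide setting.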

The application will be launching shortly. I will be returning with a post about the successes and failures of the project :)

If you have any thoughts or better ideas, feel free to email me or contact me on Twitter @zaidos.

Zaidox