Persistent storage and strong names  
Reiner Obrecht

PostPosted: .NET Remoting and Runtime Serialization, Persistent storage and strong names

Hello all,

I have a small problem with serializing to permanent file storage. I have a complex data structure, and the serialized form must remain compatible with future program versions.

My program does its work. But once I activate strong names, I can no longer deserialize after each compilation.

I suppose the assembly version is serialized along with the data. But I need absolute compatibility across versions. Is there a solution?

Thanks

Reiner Obrecht




 
 
Glenn Block MSFT






Check this article....



 
 
ellipsisware






The article does not address the question, which specifically mentioned “strong names”.

Make a library:

using System;

namespace SomeLibrary
{
    [Serializable]
    public class Class1
    {
    }
}

Sign it with a strong name and place it in the GAC.

Make a console application that references SomeLibrary:

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            SomeLibrary.Class1 class1 = new SomeLibrary.Class1();
            Stream stream = File.Open("data.bin", FileMode.Create);
            BinaryFormatter formatter = new BinaryFormatter();
            formatter.Serialize(stream, class1);

            //BinaryFormatter formatter = new BinaryFormatter();
            //Stream stream = File.Open("data.bin", FileMode.Open);
            //SomeLibrary.Class1 class1 = (SomeLibrary.Class1)formatter.Deserialize(stream);

            stream.Close();
        }
    }
}

Run the application – it will generate the file data.bin.

Change the version number of SomeLibrary and add the new version to the GAC.

Comment out the first four statements in Main and uncomment the next three, so that the application now attempts to deserialize the file written with the earlier library. You receive a System.InvalidCastException:

Unable to cast object of type 'SomeLibrary.Class1' to type 'SomeLibrary.Class1'.

I would call this version-intolerant behavior.


 
 
ellipsisware






And the answer is:

SerializationBinder
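The idea, in a minimal sketch (an illustration, assuming the serialized types live in the currently executing assembly; the class name is hypothetical):

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

// Minimal sketch: resolve every incoming type against the currently
// loaded assembly, ignoring the version recorded in the stream.
public sealed class CurrentAssemblyBinder : SerializationBinder
{
    public override Type BindToType(string assemblyName, string typeName)
    {
        string current = Assembly.GetExecutingAssembly().FullName;
        return Type.GetType(typeName + ", " + current);
    }
}
```

Attach it before deserializing, e.g. `formatter.Binder = new CurrentAssemblyBinder();` on a BinaryFormatter instance.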


 
 
Reiner Obrecht






Hi,

The article about the SerializationBinder was helpful. With a binder I found out that the assembly name always carries the version, and the version changes with every compilation. So the qualified type name differs between serializing to file and reading back later, and I get an exception. Even setting AssemblyFormat to "Simple" changes nothing.

I solved my problem by replacing the version in the deserialized assembly name with the version of the current assembly. But that becomes much more difficult when there are collections of objects whose element types also contain qualified names.
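To see why collections make this harder (a sketch, re-declaring the repro's SomeLibrary.Class1 locally so the snippet is self-contained): the full name of a generic collection embeds the assembly-qualified, and therefore versioned, names of its type arguments.

```csharp
using System;
using System.Collections.Generic;

namespace SomeLibrary
{
    [Serializable]
    public class Class1 { }
}

class Demo
{
    static void Main()
    {
        // The assembly-qualified name of the element type -- version and
        // all -- is embedded in the collection's own type name, so patching
        // only the outer assembly name is not enough.
        Console.WriteLine(typeof(List<SomeLibrary.Class1>).AssemblyQualifiedName);
    }
}
```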

But there must be a simpler solution from Microsoft for such a basic problem!

I only want to serialize complex objects containing collections of complex objects to a file and reread the document later. The documents should also be compatible with future program versions (no version checking). It was so simple in MFC; why not in .NET? The serialization articles all make it sound so easy, but it does not work in my case.

Any suggestions for a simpler solution to this trivial problem?



 
 
ellipsisware






I too found that AssemblyFormat did nothing to solve this problem; moreover, I found at least one article predicting dire consequences of setting it to "Simple", which makes one wonder why it exists at all.

I used a SerializationBinder, as Reiner Obrecht mentioned, to change the version number to that of the deserializing assembly, and thought I was done; unfortunately I had not considered generics. If the collection is not a generic collection, the solution requires no modification; however, any constructed generic type is more problematic, because the versioned names of its type arguments are embedded in the serialized type name itself.
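One way to cope with the generic case can be sketched as a helper that rewrites every occurrence of an assembly's versioned name inside the serialized type name, including occurrences nested in constructed generic names (the method and pattern here are illustrative, not the poster's actual code):

```csharp
using System;
using System.Text.RegularExpressions;

class VersionRetargeting
{
    // Rewrite every embedded "SomeAssembly, Version=a.b.c.d" occurrence --
    // including those nested inside constructed generic type names --
    // to the version supplied by the caller.
    public static string Retarget(string typeName,
                                  string assemblySimpleName,
                                  Version current)
    {
        string pattern = Regex.Escape(assemblySimpleName)
                         + @", Version=\d+\.\d+\.\d+\.\d+";
        return Regex.Replace(typeName, pattern,
                             assemblySimpleName + ", Version=" + current);
    }
}
```

For example, applying it to a serialized `List`1[[SomeLibrary.Class1, SomeLibrary, Version=1.0.0.0, Culture=neutral]]` name with `new Version(2, 0, 0, 0)` rewrites the nested version to 2.0.0.0.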

I strongly agree with the previous post: Microsoft should at least document how to deal with this very basic issue.


 
 
Clemens Vasters - MSFT






The binary formatter is, as you are finding out, always somewhat version dependent. The reason is that it tries to create an exact dump of an object graph's private state, together with embedded hints for how to find the types required to reconstruct the graph. There will therefore always be a certain degree of coupling between the serialized state and the respective types; that's a function of what the formatter is trying to achieve. When you think about the ways an object graph can change as you change the object model, staying compatible is not exactly trivial.

Version tolerant serialization is much more easily achieved using XML and the XML serialization framework (or WCF's data contract serializer), and our advice is to use XML for scenarios where you need to store data longer than you'd keep a temporary file around.
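A minimal sketch of that advice, assuming a hypothetical Document type (XmlSerializer works on public read/write members only, and records element names rather than assembly versions):

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Document
{
    public string Title;
    public int Revision;
}

class Demo
{
    static void Main()
    {
        Document doc = new Document();
        doc.Title = "Sample";
        doc.Revision = 1;

        XmlSerializer serializer = new XmlSerializer(typeof(Document));

        // Round-trip: the XML carries element names, not assembly versions,
        // so a later build with a compatible shape can still read the file.
        using (Stream stream = File.Create("document.xml"))
        {
            serializer.Serialize(stream, doc);
        }
        using (Stream stream = File.OpenRead("document.xml"))
        {
            Document restored = (Document)serializer.Deserialize(stream);
            Console.WriteLine(restored.Title);
        }
    }
}
```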

The reason for that is that XML is removed enough from the specifics of a given object model that you can find N reasonable projections of an object model into XML and M reasonable projections of XML into the object model, where N and M are both significantly greater than 2. With the binary formatter, you will find one straight mapping of the object model into the binary formatter's format, and possibly another one if you employ serialization hackery. That's where it stops.

The XML serialization infrastructure's tradeoff is that it doesn't readily know about each and every type that comes across the wire, and due to the hierarchical nature of XML, identity in object graphs might not be preserved. The NetDataContractSerializer in WCF and the SoapFormatter in Remoting employ an augmented XML format that counteracts both effects using embedded type references and cross-references between elements, but they reintroduce some degree of version coupling and somewhat lesser interoperability as a tradeoff.

Generally, binary storage formats and the resulting version dependencies become very annoying legacy baggage after several versions and depending on the industry and jurisdiction you are selling your software into, you may have to keep documents readable for 7, 20 or even 30 years. XML is at least a way out of the dependencies on a particular code artifact and because there are very rich and broadly supported mapping and transformation infrastructures throughout the industry, there is always a way to get your XML formats up to the next level (or even platform).

There are good technical reasons to use the binary formatter and most of the "good" scenarios are those where an object graph needs to be transferred between tightly collaborating application domains (Remoting) or where the graph needs to be written to disk temporarily. If you need to keep data for a long time, you need to start thinking about decoupling the data from your particular implementation that you have today; XML is the industry's current best solution for that problem.

I hope that sheds some light on the "there must be a simpler solution from Microsoft" question. We strongly believe that we have a good one, but the BinaryFormatter is not an element in that solution.

And while I share the love for all sorts of nostalgic memories, mine aren't telling me that this was "so simple" in MFC -- unless you love writing a new switch/case and repeating the whole (de)serialization code all over again whenever you make a change.

Clemens



 
 
ellipsisware






I agree that the problem is not trivial but it is important and has not truly been solved by Microsoft for as long as I can recall.

I never managed to make MFC versioning work as documented, but rolling your own backwards-compatible versioning system on top of the MFC system was relatively simple, although it did require maintenance. It also had a significant advantage: I knew how it worked.

I use XML serialization for the configuration of my application, for most of the reasons mentioned by Clemens Vasters and because I believe that configuration files should be human readable. However, my application needs to store a complete object graph with the full state of the objects, including private members; a human-readable format is not necessarily desirable there. I also found it necessary to sprinkle my configuration code with various XmlType attributes.
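The kind of attribute sprinkling mentioned above might look like this (the Config type and its member names are hypothetical; this is only an illustration of the markup XmlSerializer tends to require once names diverge from the defaults):

```csharp
using System.Xml.Serialization;

// Hypothetical configuration type: attributes pin the XML element
// names independently of the CLR member names.
[XmlType("AppConfig")]
public class Config
{
    [XmlElement("pollInterval")]
    public int PollIntervalSeconds;

    [XmlArray("devices"), XmlArrayItem("device")]
    public string[] DeviceNames;
}
```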

The BinaryFormatter looked as if it would solve my problems and even provide forward compatibility! However, after reading the last post, I am somewhat worried that it might be deprecated.

Assuming that BinaryFormatter is a viable solution for my application, what would be wrong with the following solution?

using System;
using System.Reflection;
using System.Text.RegularExpressions;

namespace ClassLibrary1
{
    /// <summary>
    /// SerializationBinder to provide version compatibility for
    /// strongly named ClassLibrary1.
    /// </summary>
    public sealed class VersionBinder :
        System.Runtime.Serialization.SerializationBinder
    {
        // Hypothetical pattern (the original did not survive): matches
        // this assembly's strong name with a fixed major version and any
        // minor/build/revision components.
        private Regex regex = new Regex(
            @"ClassLibrary1, Version=1\.\d+\.\d+\.\d+, Culture=neutral, PublicKeyToken=\w+",
            RegexOptions.Compiled);

        private string thisAssemblyName =
            Assembly.GetExecutingAssembly().FullName;

        public override Type BindToType(string assemblyName,
            string typeName)
        {
            assemblyName = regex.Replace(assemblyName, thisAssemblyName);
            typeName = regex.Replace(typeName, thisAssemblyName);
            return Type.GetType(typeName + ", " + assemblyName);
        }
    }
}


 
 
Clemens Vasters - MSFT






If you store a complete object graph, by all means use the BinaryFormatter. But an object graph is exactly what it is: a graph of instances of particular classes. There is no decoupling mechanism between the state in the serialized graph and the instances in the live graph, and therefore the mechanism is inherently brittle once changes are made to either. That's a well understood, traditional problem with any implementation-dependent binary format. It turns out that nobody has found an in-situ silver bullet for this problem, because it's fundamentally a coupling issue that is best solved with decoupling. That's why lots of people are using XML these days, as I wrote before. I've long given up the hope that XML could ever again be "human readable", by the way.

The fix you are proposing here will circumvent the compatibility check, but it will not fix the problem that the types you are serializing are indeed different -- including their detail semantics. How can you save the private state of a class in version A and reload it into version B, if the behavior of that class has changed, and then make any guarantees about the runtime behavior of the class in the presence of that newly restored state? What if a value that was valid in version A is no longer valid in version B, could never possibly be set through version B's property setter or any other way, but now surfaces by way of deserializing an old instance of A? Well, that might appear manageable if we're just talking about versions A and B, but how about dealing with versions G and H?

While you commonly "only" need to ensure that the public APIs and the explicitly stored state of classes stay compatible across versions, you now pull all private state into the compatibility play. What does that mean for testing?

The need to "sprinkle" XML attributes, as you call it, across your classes is to state clearly what the on-disk representation of your application's state is. Just as it has become increasingly accepted that there is a notion of "explicit boundaries" towards the network, where only data you explicitly mark up should hit the wire, the same very much applies to data that hits the disk. By being explicit about the saved state (and by what I can tell, that's exactly what you did in MFC) and treating it as a public interface, just like the network boundary and any other boundary in your app, you get your state management and versioning under control.

Lastly, the BinaryFormatter is not deprecated, as it is part of .NET Remoting, which is an inherent part of the .NET Framework. However, we are not making any significant investments in new features for that technology. The way forward is clearly the Windows Communication Foundation and the DataContract serialization infrastructure. For serializing complete object graphs you might want to look at the NetDataContractSerializer. That will require you to be explicit about what you want to serialize (so you will need [DataMember] attributes, unless you want to write your own custom serialization for each class), but it will give you full object graphs.
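A sketch of what that looks like, assuming .NET 3.0's System.Runtime.Serialization is available (the Node type is hypothetical):

```csharp
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Node
{
    [DataMember] public string Name;
    [DataMember] public Node Next;
}

class Demo
{
    static void Main()
    {
        Node root = new Node();
        root.Name = "root";
        root.Next = root;   // a cycle: this is a graph, not a tree

        // NetDataContractSerializer preserves object identity (the cycle
        // survives the round trip), at the cost of recording CLR type
        // names in the output.
        NetDataContractSerializer serializer = new NetDataContractSerializer();
        using (Stream stream = File.Create("graph.xml"))
        {
            serializer.Serialize(stream, root);
        }
    }
}
```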

I like this conversation a lot.
Clemens



 
 
ellipsisware






“What if a value that was valid in version A is no longer valid in version B?” Assuming the property has a backing field, this would result in wasted space, since the private field would still be serialized in version B, according to the “best practices” mentioned in the version tolerant serialization documentation. Not a perfect solution, but something I can live with, particularly as I would aim to avoid this situation as much as possible. As the developer of version B it is my responsibility not to use the deprecated private field; since the public or protected property has been removed, no one else can access it. In other words it has not “resurfaced”, and I do not see an issue here, for many versions to come.
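The version tolerant serialization mechanism referred to above can be sketched like this (the Settings class and its defaults are hypothetical; [OptionalField] and the deserialization callbacks are .NET 2.0 features):

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
public class Settings
{
    public int Timeout;                  // present since version A

    [OptionalField(VersionAdded = 2)]
    public int Retries;                  // added in version B

    // Runs before deserialization; supplies a default so a stream
    // written by version A, which lacks Retries, still yields a
    // usable object.
    [OnDeserializing]
    private void SetDefaults(StreamingContext context)
    {
        Retries = 3;
    }
}
```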

I understand why I had to resort to XML attributes with the XML serializer, but that does not lessen the amount of work required to implement the solution, and I suspect that even more work would be required for the WCF approach. I would have preferred a “default contract” to be in place to minimize what I had to do.

Currently each class in my application has a custom serializer, just as it had in MFC, but putting [Serializable] on a few base classes is a much neater approach.

I will certainly take a look at NetDataContractSerializer, but as I have just moved from .NET 1.1 to 2.0 it would need to be pretty good for me to make that jump; moreover, my application is not necessarily web based -- it is a control for a collection of devices.

I too have enjoyed this discussion particularly because this implementation seems to work for me.

I made a minor change to the code.

The previous version of the regular expression required a code change if the major component of the version changed. That is not unreasonable, since the convention for this field is that “assemblies with the same name but different major versions are not interchangeable”; however, I do not see why the code should enforce this until it is actually so.

Thirty years ago I used an editor called vi; I am amazed that its illegible syntax is still widely used today…

using System;
using System.Reflection;
using System.Text.RegularExpressions;

namespace ClassLibrary1
{
    /// <summary>
    /// SerializationBinder to provide version compatibility for
    /// strongly named ClassLibrary1.
    /// </summary>
    public sealed class VersionBinder :
        System.Runtime.Serialization.SerializationBinder
    {
        // This expects that all four components of the version are present.
        // Hypothetical pattern (the original did not survive): any four-part
        // version is accepted, including a changed major component.
        private Regex regex = new Regex(
            @"ClassLibrary1, Version=\d+\.\d+\.\d+\.\d+, Culture=neutral, PublicKeyToken=\w+",
            RegexOptions.Compiled);

        private string thisAssemblyName =
            Assembly.GetExecutingAssembly().FullName;

        public override Type BindToType(string assemblyName,
            string typeName)
        {
            assemblyName = regex.Replace(assemblyName, thisAssemblyName);
            typeName = regex.Replace(typeName, thisAssemblyName);
            return Type.GetType(typeName + ", " + assemblyName);
        }
    }
}