Memory consumption during BinaryFormatter deserialization
Ayooya





Posted in: Common Language Runtime — Memory consumption during BinaryFormatter deserialization

Hi,

I have a problem and I need your help please,

I have built a large application that takes a very long time to generate its results, so I periodically serialized the intermediate results to binary files using BinaryFormatter, to protect them against sudden problems like a shutdown or restart.

Now I need to deserialize all the saved files to continue my work.

The saved data is about 6.5 GB, divided into about 260 files.

When I try to deserialize them one by one, the first three files deserialize in about one or two minutes, but after that it becomes very slow and memory is fully consumed.

I think the memory starts out free when the reading process begins, but it fills up with each read without being released.

I have tried to release the memory, but with no effect.

The following snippet is the code I wrote to read the .bin files:

static object Deserialize(string fileName)
{
    FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read);
    BinaryFormatter formatter = new BinaryFormatter();
    object readObject = formatter.Deserialize(fs);

    fs.Dispose();
    fs.Close();

    GC.Collect();

    return readObject;
}

I need to know why memory keeps filling up without being released, and how I can release it.

Please help me.

Thanks in advance for any help,

Aya.



 
 
Chris Lyon - MS






Hi Aya

Here's a better way of writing your Deserialize method:

static object Deserialize(string fileName)
{
    object readObject;
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        BinaryFormatter formatter = new BinaryFormatter();
        readObject = formatter.Deserialize(fs);
    }
    return readObject;
}

The using block will automatically call Dispose/Close on your FileStream object, freeing its unmanaged resources. The call to GC.Collect is not recommended; the GC will collect the heap when it determines it is necessary.

As for the OutOfMemoryException, as I explained to you on another thread, you won't be able to load 6GB of data into a 32-bit process all at once. I suggest you load a chunk of data, process it, then release it so the GC can collect.
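A rough sketch of that pattern, assuming each file holds a single ArrayList; ProcessChunk here is a hypothetical placeholder for whatever per-file work your application does:

```csharp
using System.Collections;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public class ChunkedLoader
{
    // Deserialize one file at a time, process it, and let the chunk go
    // out of scope before the next iteration so the GC can reclaim it.
    public static void ProcessAllFiles(string[] fileNames)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        foreach (string fileName in fileNames)
        {
            ArrayList chunk;
            using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
            {
                chunk = (ArrayList)formatter.Deserialize(fs);
            }
            ProcessChunk(chunk);  // hypothetical per-chunk work
            // no reference to "chunk" survives this iteration
        }
    }

    // Placeholder for the application's per-chunk computation.
    public static void ProcessChunk(ArrayList chunk) { }
}
```

The key point is that nothing outside the loop keeps a reference to the deserialized chunk, so each file's data becomes collectible before the next file is read.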

If you're still seeing OutOfMemoryExceptions, check out this blog entry on tracking down managed memory leaks:
http://blogs.msdn.com/ricom/archive/2004/12/10/279612.aspx

Hope that helps

-Chris


 
 
Ayooya






Thanks Chris.

But I need to understand one point. You say:

"As for the OutOfMemoryException, as I explained to you on another thread, you won't be able to load 6GB of data into a 32-bit process all at once. I suggest you load a chunk of data, process it, then release it so the GC can collect."

If I had not already serialized my data into binary files, where would I load it from? You suggested this solution in the other thread, where I had not mentioned that I had serialized it.

Note that without serializing I could not have created my large ArrayList at all: I created a portion of it, serialized it, released my objects, and then started computing the next portion, until it was complete. Now I am at an impasse, because I need to deserialize the whole object.

I have read the pooling article, but it assumes that I have an object I can reuse many times instead of regenerating it.

That is not the case in my problem: I have a very large ArrayList of distinct items, and I need to search all of them many times.

Thanks too much,

Aya.


 
 
nobugz






Looks to me like the returned "readObject" is the one that's consuming the memory. Track its usage and verify that you don't have a reference to it when you're ready to process your next file. You might also suffer from fragmentation in the large object heap. To solve that, call GC.Collect() *before* reading the next file.


 
 
Ayooya






Yes, that is right.

I have found that the memory consumption is caused by the returned object, but I already call GC.Collect() *before* reading the next file.

Here is the code snippet:

public void DeserializeBinaryFiles(String path)
{
    String[] filesNames = Directory.GetFiles(path);
    ArrayList tT = new ArrayList(), iT = new ArrayList();
    String fileName;
    ArrayList i_n_t = new ArrayList();
    CharmNode[] InitialNodesTableArray = new CharmNode[8030];

    for (int fn = 0; fn < filesNames.Length; fn++)
    {
        fileName = filesNames[fn].Remove(0, filesNames[fn].LastIndexOf("\\") + 1);
        GC.Collect();
        iT = (ArrayList)New_Serialize_Read1(fileName); //"InitialNodesTable" + tag_ID.ToString() + ".bin");
        this.InitialNodesTable.AddRange(iT);
        iT = null;
    }
}

public object New_Serialize_Read1(string fileName)
{
    // check to see if there is a file to read
    if (File.Exists(fileName))
    {
        // create a binary formatter to read the class members
        BinaryFormatter bf = new BinaryFormatter();
        FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read);
        try
        {
            return bf.Deserialize(fs);
        }
        catch
        {
            return new object();
        }
        finally
        {
            fs.Close();
            bf = null;
            fs = null;
            GC.Collect();
        }
    }
    else
        return null;
}


 
 
nobugz






You are adding the "iT" reference to the InitialNodesTable collection object. Setting it to null and then calling GC.Collect() will not accomplish anything; the collection still holds a reference to the object. There's no obvious way to fix your problem from seeing this code; it just looks like you're trying to load 6.5 GB of data into memory. Kaboom is the expected outcome.


 
 
Chris Lyon - MS






nobugz is right. It appears you're not releasing any data, which is why you're getting the OOM exception.

Incidentally, I strongly recommend you replace your Deserialize method with one similar to the one I described above. Catching all exceptions and calling GC.Collect are not good programming practice (in fact, your catch code will cause an InvalidCastException when you try to cast the new Object to an ArrayList).

-Chris


 
 
Ayooya






OK, I will do that.

But what about the iT ArrayList?

Is there any approach that would let me do my work other than my current code?

I just need to search all the items of those ArrayLists many times, and the items are serialized in .bin files.

It is a text mining application, which is why the data is so large.

So please help me: what can I do?

Thanks,

Aya.


 
 
nobugz






What are you searching for? Do you need the entire 6.5 gigabytes to search? If you do, you need to talk to Unisys or IBM. If you can create a "synopsis" of each file, loading just the bits you need to implement your algorithm, you can do something like figuring out your processing sequence, then reload each file one at a time to do the number crunching. This is all pretty vague; we have no idea what you're doing with the data in the files until you tell us.


 
 
Ayooya






Each record in my ArrayList contains information about a particular word in the input textual corpus.

To run the mining algorithms and return results, I need to compute statistics about each word against all other words, such as occurrence counts and positions and their unions and intersections, and other computations like that.
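For what it's worth, those pairwise union/intersection computations become cheap once each word is reduced to a set of file (or document) indices. A small sketch with hypothetical types, not the real CharmNode layout:

```csharp
using System.Collections.Generic;

public class WordStats
{
    // word -> set of file indices in which the word occurs
    public static Dictionary<string, HashSet<int>> BuildIndex(string[][] files)
    {
        var index = new Dictionary<string, HashSet<int>>();
        for (int f = 0; f < files.Length; f++)
            foreach (string word in files[f])
            {
                HashSet<int> set;
                if (!index.TryGetValue(word, out set))
                    index[word] = set = new HashSet<int>();
                set.Add(f);
            }
        return index;
    }

    // Files containing both words (intersection of their position sets)
    public static HashSet<int> Both(HashSet<int> a, HashSet<int> b)
    {
        var r = new HashSet<int>(a);
        r.IntersectWith(b);
        return r;
    }

    // Files containing either word (union of their position sets)
    public static HashSet<int> Either(HashSet<int> a, HashSet<int> b)
    {
        var r = new HashSet<int>(a);
        r.UnionWith(b);
        return r;
    }
}
```

The index is far smaller than the records themselves, so it can stay resident while the full records remain on disk.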


 
 
nobugz






Well, you're heading into Unisys/IBM territory with that. Another way to look at it is that there just aren't that many relevant words in the language. You could build an index from a file, creating your "synopsis". This is classic "algorithm beats memory requirements" stuff. I'm sure it can be done, you'll just need to think harder about the problem...
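A sketch of that two-pass idea: scan every file once to build a small in-memory index, then reload only the files a given word actually needs. The loadWords delegate is a hypothetical stand-in for the real deserialization code, invoked one file at a time so only a single file's records are ever resident:

```csharp
using System;
using System.Collections.Generic;

public class FileIndex
{
    private readonly Dictionary<string, List<int>> wordToFiles =
        new Dictionary<string, List<int>>();

    // Pass 1: record which files mention which words, then drop the records.
    public void Build(int fileCount, Func<int, IEnumerable<string>> loadWords)
    {
        for (int f = 0; f < fileCount; f++)
        {
            foreach (string word in loadWords(f))  // one file in memory at a time
            {
                List<int> files;
                if (!wordToFiles.TryGetValue(word, out files))
                    wordToFiles[word] = files = new List<int>();
                // files are visited in order, so checking the last entry
                // is enough to avoid duplicates within one file
                if (files.Count == 0 || files[files.Count - 1] != f)
                    files.Add(f);
            }
        }
    }

    // Pass 2: only these files need to be reloaded for this word.
    public IList<int> FilesFor(string word)
    {
        List<int> files;
        return wordToFiles.TryGetValue(word, out files) ? files : new List<int>();
    }
}
```

Peak memory is then one file's worth of records plus the index, instead of all 6.5 GB at once.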


 
 
Ayooya






OK, thanks nobugz.

I will try to do that.