How to copy ONLY the updated data from a large file?

Hello,

I copied a large file from one disk (source) to another disk (destination). Afterwards I made some changes (manually or programmatically) to the file on the source disk. These changes should now be reflected in the file on the destination disk as well. To achieve this, I copied the file again from source to destination. Because the file is large, this method is not efficient: for any minute change, I have to copy the entire thing.

Is there any way to copy only the updated portion from the file on the source disk to the file on the destination disk?

Hoping to get a reply soon.

Thanks in advance.

Regards,
Jahfer V P.


-
 

Re:How to copy ONLY the updated data from a large file?

Sorry, without having any idea what the file actually is, it is impossible to suggest a general means of "copying" a piece of the file. If you have a way of identifying "the updated portion" then you have a chance of doing something that optimizes the performance, but in general there is no way to identify the changed portions reliably in a file (short of reading both files, doing the comparison using heuristic techniques to handle changes in size of parts of the file, and then executing a complex update, which usually involves copying the file and rewriting the changed parts, all of which are vastly less efficient than simply copying the file in its entirety).

Presumably you have some idea of how to identify and record these changes. But the general problem is not efficiently solved.

joe



On Mon, 27 Jun 2005 00:32:03 -0700, "Jahfer V P" <JahferVP@discussions.microsoft.com> wrote:

Quote
Is there any way to copy only the updated portion from the file on the source disk to the file on the destination disk?

Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: www.flounder.com
MVP Tips: www.flounder.com/mvp_tips.htm

-

Re:How to copy ONLY the updated data from a large file?

Jahfer V P wrote:

Quote
I copied a large file from one disk(source) to another disk(destination). After this I made (manually or programmatically) some changes in the file on the source disk. Now, these changes should reflect in the file on the destination disk also. In order to achieve this, I copied the file again from source to destination. Since the file is a large one this method is not efficient. Because, for any minute change, I have to copy the entire thing.



If the changes to the source file are made only programmatically (i.e. by code under your control), then you can record all those differences (the delta) and send them to the destination, where they are applied to the destination file.
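
A minimal sketch of that idea, in Python, under the assumption that every change goes through one write routine in your own code. The names `RecordingFile`, `write_at` and `apply_delta` are made up for this illustration, and in-memory `io.BytesIO` objects stand in for the two files. Truncating the file is not covered.

```python
import io


class RecordingFile:
    """Wraps a binary file object opened for update and logs every write."""

    def __init__(self, f):
        self._f = f
        self.delta = []  # (offset, data) pairs in the order they were written

    def write_at(self, offset, data):
        self._f.seek(offset)
        self._f.write(data)
        self.delta.append((offset, bytes(data)))


def apply_delta(f, delta):
    """Replay the recorded writes, in order, on the destination copy."""
    for offset, data in delta:
        f.seek(offset)
        f.write(data)


# In-memory stand-ins for the source file and its earlier full copy.
source = io.BytesIO(b"0123456789")
destination = io.BytesIO(b"0123456789")

rec = RecordingFile(source)
rec.write_at(2, b"AB")     # overwrite two bytes in the middle
rec.write_at(10, b"XYZ")   # append at the old end of the file

apply_delta(destination, rec.delta)  # only the recorded bytes have to travel
print(destination.getvalue() == source.getvalue())
```

With real files the same functions would work on objects opened with mode `"r+b"`.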



If you have both the old and the new source file available, then you can compare them in order to reconstruct the delta.
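
One simple way to do such a comparison is block by block. The sketch below is only an illustration: the 4096-byte block size and the function names `build_delta` and `patch` are arbitrary choices, and it only finds changes made in place. If bytes are inserted or removed in the middle, everything after that point shifts and is reported as changed, so it is no better than a full copy in that case.

```python
import io

BLOCK = 4096  # comparison granularity; an arbitrary choice for this sketch


def build_delta(old, new, block=BLOCK):
    """Compare two binary file objects block by block.

    Returns (changed, new_size), where changed is a list of (offset, data)
    for every block of `new` that differs from the same block of `old`.
    """
    changed = []
    offset = 0
    old.seek(0)
    new.seek(0)
    while True:
        b_new = new.read(block)
        if not b_new:
            break
        b_old = old.read(block)
        if b_new != b_old:
            changed.append((offset, b_new))
        offset += len(b_new)
    return changed, offset


def patch(dest, changed, new_size):
    """Bring `dest` (a copy of the old file) up to date in place."""
    for offset, data in changed:
        dest.seek(offset)
        dest.write(data)
    dest.truncate(new_size)  # covers a new file that became shorter


# In-memory stand-ins: one byte differs at offset 5000.
old = io.BytesIO(b"a" * 10000)
new = io.BytesIO(b"a" * 5000 + b"b" + b"a" * 4999)
dest = io.BytesIO(old.getvalue())

changed, size = build_delta(old, new)
patch(dest, changed, size)
print(len(changed), dest.getvalue() == new.getvalue())
```

Note that both files are still read in full here, so this saves transfer volume, not disk reads.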



One good approach is to send a hash of the new file (MD5, SHA-1 or similar) along with the delta. The destination then creates a new file from the old one and the delta, and compares its hash with the received one. If they match then all is well, but if they don't match then you have to send the whole new file.
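
A rough sketch of the destination side of that check, in Python. It assumes the delta arrives as a list of (offset, data) pairs plus the new file size, which is a format invented for this example, and it uses SHA-256 from `hashlib` as the "similar" hash. The helper names are hypothetical.

```python
import hashlib
import io


def digest_of(f):
    """SHA-256 hex digest of a binary file object, read in chunks."""
    h = hashlib.sha256()
    f.seek(0)
    for chunk in iter(lambda: f.read(65536), b""):
        h.update(chunk)
    return h.hexdigest()


def update_destination(old_copy, delta, new_size, expected_digest):
    """Build the new file from the old copy plus the delta, then verify it.

    Returns the rebuilt bytes, or None if the digest does not match, in
    which case the caller should request the whole new file instead.
    """
    rebuilt = io.BytesIO(old_copy)  # scratch copy; the old data stays intact
    for offset, data in delta:
        rebuilt.seek(offset)
        rebuilt.write(data)
    rebuilt.truncate(new_size)
    if digest_of(rebuilt) != expected_digest:
        return None
    return rebuilt.getvalue()


# Sender side: the new contents, the delta against the old ones, the hash.
new_contents = b"hello WORLD"
delta = [(6, b"WORLD")]
sent_digest = hashlib.sha256(new_contents).hexdigest()

# Destination with a correct old copy, and one whose copy had drifted.
good = update_destination(b"hello world", delta, len(new_contents), sent_digest)
bad = update_destination(b"jello world", delta, len(new_contents), sent_digest)
print(good == new_contents, bad is None)
```

Building the result in a scratch copy means a failed check leaves the old destination file untouched.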



If you use some SDK function to copy the file to the destination, then you're stuck: there isn't any SDK function for what you want.

-

Re:How to copy ONLY the updated data from a large file?

Note that the OP is concerned with "efficiency". All alternatives tend to require multiple reads and eventually a rewrite of the file, which certainly has to be less efficient than a single copy. That's why I was asking about the nature of the file. An unstructured text file is going to require a complete copy of the file be made somewhere. If the connection is low-bandwidth, then it is useful to send only the deltas, but a complete file copy will still need to be done at the site that holds the file. If they are on the same machine, then a complete copy of a text file is about as efficient as you can get (since all editors do a complete copy anyway). If it is a database file, however, then the whole picture changes, and there are lots of efficient ways of sending deltas.

joe



On Mon, 27 Jun 2005 15:29:46 +0200, Mihajlo Cvetanović <mac@RnEeMtOsVeEt.co.yu> wrote:

Quote
If the changes to the source file are made only programmatically (i.e. from the code in your control), then you can save all those differences (delta), and send them to destination where they should be applied to the destination file.



-

Re:How to copy ONLY the updated data from a large file?

Joseph M. Newcomer wrote:

Quote
If the connection is low-bandwidth, then it is useful to send only the deltas, but a complete file copy will still need to be done at the site that holds the file. If they are on the same machine, then a complete copy of a text file is about as efficient as you can get (since all editors do a complete copy anyway). If it is a database file, however, then the whole picture changes, and there are lots of efficient ways of sending deltas.



I agree, special cases could be handled more efficiently. And I really did assume that the source and destination have a low-bandwidth connection. If the OP was talking about two disks on the same machine (or even on the same LAN), then there's almost nothing that could be done to improve performance (unless the file format allows a logical "big change in the middle" of the file to be handled as a physical "small change in the middle and big append" to the file).
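
To make that last case concrete, here is a sketch of one hypothetical file layout with that property. It is invented for this example, not any real format: a fixed table of slots at the start of the file, each holding the offset and length of one record. Replacing a record appends the new payload at the end of the file and rewrites only its 16-byte slot, so the physical change is a small write near the start plus an append.

```python
import io
import struct

SLOT = struct.Struct("<QQ")  # (payload offset, payload length) per record
N_SLOTS = 4                  # fixed-size slot table at the start of the file


def store(f, index, payload):
    """Append payload at end of file, then point slot `index` at it.

    Returns the two writes performed, as (offset, data) pairs.
    """
    end = f.seek(0, io.SEEK_END)
    f.write(payload)
    slot_bytes = SLOT.pack(end, len(payload))
    f.seek(index * SLOT.size)
    f.write(slot_bytes)
    return [(end, payload), (index * SLOT.size, slot_bytes)]


def create(f, records):
    """Write an empty slot table, then store each record in turn."""
    f.seek(0)
    f.write(b"\0" * (SLOT.size * N_SLOTS))
    for i, payload in enumerate(records):
        store(f, i, payload)


def read(f, index):
    """Look up a record through its slot."""
    f.seek(index * SLOT.size)
    offset, length = SLOT.unpack(f.read(SLOT.size))
    f.seek(offset)
    return f.read(length)


# Source file and an earlier full copy of it (in-memory stand-ins).
src = io.BytesIO()
create(src, [b"alpha", b"beta"])
dest = io.BytesIO(src.getvalue())

# Logically a big change to record 0; physically a slot rewrite plus an append.
new_payload = b"a much bigger replacement for alpha"
writes = store(src, 0, new_payload)

for offset, data in writes:  # these two writes are the whole delta
    dest.seek(offset)
    dest.write(data)
print(read(dest, 0) == new_payload, dest.getvalue() == src.getvalue())
```

The cost is that the old payload stays in the file as dead space until the file is compacted, and compaction is again a full rewrite.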



BTW, does anybody know whether there is a way to insert a file page between existing file pages?

-