Introduction

This program operates with data files created with the PSR Style Database Application written by Peter Wierzba.

The list of duplicates can be saved as a text file (menu File | Save List to File).

Duplicate Search Methods

The program has four duplicate search methods:


1. True Duplicates. Very fast. 99.99% reliable
2. True Duplicates. Fast. 99.9999% reliable
3. Several identical properties. Fast
4. Identical Main part beginning. Very slow

Note that Methods 2 and 3 require data files created with the "Export Style Data to CSV File" option in the File menu of the "PSR Style Database" application. Methods 1 and 4 can use data files created with either the "Export Style Data" option or the "Export Style Data to CSV File" option.

Typical search times for a collection of 12,000 style files on a PC with a 1900 MHz processor:
Method 1: 1 second
Method 2: < 5 seconds
Method 3: < 5 seconds
Method 4: 5 to 50 minutes (depending on the selected Main part)

Warning for users of "PSR Style Database" version 4.x and newer:
This program requires that ALL columns in the "Export Style Data to CSV File" dialog box are checked for export.

Search Methods in Detail

Method 1

This method relies entirely on the CRC32 checksum calculated by the PSR Style Database application.

The checksum is a kind of fingerprint of the file's contents. Changing just a single byte, e.g. an "a" to an "A", always changes the checksum, usually dramatically.

The CRC32 checksum is a 32-bit number. When shown as a signed decimal number it has up to 10 digits and can be positive or negative, but the total number of possible values is 2^32 = 4,294,967,296.
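Both points can be tried out with Python's standard zlib.crc32, which implements the common CRC-32 (the one used by ZIP). That PSR Style Database uses exactly this CRC-32 variant is an assumption. The sketch below also converts the unsigned result into the signed form described above; the helper name crc32_signed and the sample byte strings are illustrative only.

```python
import zlib

def crc32_signed(data: bytes) -> int:
    """CRC-32 of data, reinterpreted as a signed 32-bit integer."""
    value = zlib.crc32(data)                 # unsigned, 0 .. 2**32 - 1
    return value - 2**32 if value >= 2**31 else value

original = b"Style name: a"
changed = b"Style name: A"                   # a single byte differs ("a" -> "A")

print(crc32_signed(original))
print(crc32_signed(changed))                 # a completely different number
print(crc32_signed(b"123456789"))            # -873187034 (standard check value 0xCBF43926)
print(2**32)                                 # number of distinct CRC32 values
```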

In theory, two different style files can have the same checksum. For one particular pair of files the chance is only 1 in 4,294,967,296; even across all pairs in a collection of 10,000 style files, the chance of a single false match is only about 1%.
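To put a number on the collision risk, here is a sketch using the standard birthday-problem approximation. It assumes that the CRC32 values of different style files behave like independent, uniformly random 32-bit numbers; the function names are illustrative.

```python
import math

CRC32_VALUES = 2**32

def pair_collision_probability() -> float:
    """Chance that two specific, different files share a CRC32."""
    return 1 / CRC32_VALUES

def any_collision_probability(n_files: int) -> float:
    """Birthday-problem approximation: chance that at least one pair
    among n_files different files shares a CRC32."""
    pairs = n_files * (n_files - 1) / 2
    # 1 - exp(-pairs / N), computed accurately for small values
    return -math.expm1(-pairs / CRC32_VALUES)

for n in (10_000, 12_000):
    print(n, f"{any_collision_probability(n):.2%}")
```

For 10,000 files this gives roughly 1%, which is why the per-pair risk is tiny even though a collection-wide collision is not impossible.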

Method 2

This method uses the same checksum test as Method 1 and additionally requires the file sizes to be equal.

This is like having a fingerprint of the contents of the file AND the size of the contents.

This additional check reduces the already very small probability of two different style files having the same fingerprint (CRC32 checksum) even further.
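Methods 1 and 2 amount to grouping the records of the data file by a key and listing every group with more than one member. The following Python sketch shows that idea. The StyleRecord fields are a hypothetical, simplified stand-in for the exported columns, and the sample records (including the deliberate CRC collision) are made up.

```python
from collections import defaultdict
from typing import Callable, Hashable, NamedTuple

class StyleRecord(NamedTuple):
    # Hypothetical fields; the real data file has more columns.
    path: str
    crc32: int
    size: int

def find_duplicates(records, key: Callable[[StyleRecord], Hashable]):
    """Group records by key and return only the groups with 2+ members."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return [group for group in groups.values() if len(group) > 1]

def method1_key(rec: StyleRecord):
    return rec.crc32                  # checksum only

def method2_key(rec: StyleRecord):
    return (rec.crc32, rec.size)      # checksum AND file size

records = [
    StyleRecord(r"c:\styles\Pop1.sty", 1234567890, 24310),
    StyleRecord(r"c:\temp\Pop1 copy.sty", 1234567890, 24310),
    StyleRecord(r"c:\styles\Waltz.sty", 1234567890, 31877),  # hypothetical CRC collision
]

print([len(g) for g in find_duplicates(records, method1_key)])
print([len(g) for g in find_duplicates(records, method2_key)])
```

With the checksum alone, the colliding Waltz file lands in the same group as the two Pop files; adding the file size separates it out.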

Method 3

This method compares 7 properties of each style file, as stored in the data file:
1. File size (corrected for the length of the style name and the length of the copyright notice)
2. Timebase
3. Time Signature
4. Number of variations
5. Presence of CASM section
6. Presence of OTS section
7. Presence of MDB section

If all seven properties are identical for two style files, the files may be duplicates that have been revoiced or renamed, or have had their tempo, copyright notice or some other parameters changed.
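A sketch of the Method 3 comparison key follows. All field names are hypothetical, and the size correction (subtracting the lengths of the style name and the copyright notice) is one plausible reading of property 1, not the program's confirmed formula. The sample values are made up.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class StyleInfo:
    # Hypothetical fields standing in for columns of the CSV data file.
    path: str
    file_size: int
    name: str
    copyright: str
    timebase: int
    time_signature: str
    variations: int
    has_casm: bool
    has_ots: bool
    has_mdb: bool

def method3_key(s: StyleInfo):
    """The 7 compared properties. With the assumed size correction,
    renaming a style or editing its copyright notice does not change the key."""
    corrected_size = s.file_size - len(s.name) - len(s.copyright)
    return (corrected_size, s.timebase, s.time_signature, s.variations,
            s.has_casm, s.has_ots, s.has_mdb)

def method3_candidates(styles):
    """Paths of styles whose 7 properties are all identical, grouped."""
    groups = defaultdict(list)
    for s in styles:
        groups[method3_key(s)].append(s.path)
    return [paths for paths in groups.values() if len(paths) > 1]

pop = StyleInfo("Pop.sty", 24000, "Pop", "(c) Demo", 1920, "4/4", 4, True, True, False)
renamed = StyleInfo("PopRenamed.sty", 24007, "PopRenamed", "(c) Demo",
                    1920, "4/4", 4, True, True, False)
print(method3_candidates([pop, renamed]))
```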

File sizes typically range from 15,000 to 45,000 bytes, which gives about 30,000 possible sizes.

Because the other properties also vary between styles (and they do!), the number of possible combinations is much larger. Two style files with all 7 properties equal are therefore most probably duplicates.

However, it is recommended to copy the listed duplicate styles to a disk (or to save the list to a file) and check the files manually.

Method 4

This method analyses the first events of a Main part in the style file and creates a kind of fingerprint of that Main part.

If the first events of two style files are identical, both files are listed; they most probably share one or more identical Main parts.

The number of first events can be set from 10 to 100. Any of the Main parts (A, B, C or D) can be selected for comparison.

The status bytes in the style file are not used to create the "fingerprint". This means that the fingerprint of a file saved with "Running Status On" will be identical to the fingerprint of the same file saved with "Running Status Off".
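The Method 4 idea (hashing the first events of a Main part while ignoring status bytes) can be sketched as follows. The sketch assumes the bytes of the selected Main part have already been cut out of the style's MIDI track (locating the section via its marker is not shown), and it assumes the fingerprint covers delta times plus data bytes; exactly which event details the program uses is not documented here. Meta and SysEx events are skipped and cancel running status, following the Standard MIDI File convention. SHA-1 is just a convenient standard-library hash.

```python
import hashlib

# Data bytes following each channel-message status, keyed by high nibble.
DATA_LENGTHS = {0x8: 2, 0x9: 2, 0xA: 2, 0xB: 2, 0xC: 1, 0xD: 1, 0xE: 2}

def read_vlq(data: bytes, pos: int):
    """Read a MIDI variable-length quantity; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte < 0x80:
            return value, pos

def main_part_fingerprint(section: bytes, max_events: int = 10) -> str:
    """Hash delta times and data bytes of the first max_events channel
    events. Status bytes are never hashed, so the result is the same
    with or without running status."""
    digest = hashlib.sha1()
    pos, status, events = 0, None, 0
    while pos < len(section) and events < max_events:
        delta, pos = read_vlq(section, pos)
        if section[pos] & 0x80:          # explicit status byte
            status = section[pos]
            pos += 1
        elif status is None:             # data byte but no status to reuse
            raise ValueError("running status without a previous status byte")
        if status == 0xFF:               # meta event: type, length, data
            length, pos = read_vlq(section, pos + 1)
            pos += length
            status = None                # meta events cancel running status
            continue
        if status in (0xF0, 0xF7):       # SysEx event: length, data
            length, pos = read_vlq(section, pos)
            pos += length
            status = None
            continue
        size = DATA_LENGTHS[status >> 4]
        digest.update(delta.to_bytes(4, "big"))
        digest.update(section[pos:pos + size])
        pos += size
        events += 1
    return digest.hexdigest()

marker = bytes([0x00, 0xFF, 0x06, 0x06]) + b"Main A"   # marker meta event
full = marker + bytes([0x00, 0x90, 0x3C, 0x64,         # note on
                       0x60, 0x90, 0x3C, 0x00,         # note off (velocity 0)
                       0x00, 0x90, 0x40, 0x64])        # note on
running = marker + bytes([0x00, 0x90, 0x3C, 0x64,
                          0x60, 0x3C, 0x00,            # status byte omitted
                          0x00, 0x40, 0x64])
print(main_part_fingerprint(full) == main_part_fingerprint(running))
```

The same three notes encoded with and without running status produce the same fingerprint, which is the property described above.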

However, it is recommended to copy the listed duplicate styles to a disk (or to save the list to a file) and check the files manually.

Styles that do not contain the selected Main part are skipped.

This method actually reads the style files themselves. The Main A part is normally near the beginning of a style file, while the Main D part is normally near the end (or missing altogether), so searching for Main D requires reading more of each file, often the whole file. This has a big influence on search speed: a search on Main D is approximately 10 times slower than a search on Main A.

Program Operation

Checking one Style file collection:

1. Create a Data file using the PSR Style Database application.
2. Select Duplicate Search Method
3. Open the data file (File | Check One File)
4. Duplicates will be listed in the Duplicate Styles Window
5. Save the list of duplicates to a text file (File | Save List to File)
6. Click the "Select Folder (CSF)" button to select a folder to which duplicate style files can be moved or copied (e.g. c:\styles, A:\ or some other folder). This folder is known as the "Currently Selected Folder" (CSF).
7. Select (highlight) the duplicate style files, if any, that you want to delete, move to the CSF or copy to the CSF
8. Click the "Delete Selected", "Move selected to CSF" or "Copy selected to CSF" button.

Checking two Style file collections:

Same as above, except:
1. Create two data files
3. Open the data files (File | Check Two Files)

Search methods #2 and #3 require a data file created using the "Export Style Data to CSV File" option.
Deleted or moved style files are removed from the list.
Useful Features

The "Select listed in CSF" button

This button is very powerful. Clicking it selects (highlights) all listed style files located in the Currently Selected Folder (CSF).

Suppose your primary style collection is in c:\styles and its subfolders.

You run the PSR Style Database program and export a data file of this primary collection.

Later you get 200 new styles and store them in another folder, e.g. c:\temp. You then create a data file of this new collection.

To find and separate the duplicates in these two collections:
1. Open the two data files in this program.
2. Select c:\temp as the Currently Selected Folder.
3. Click the "Select listed in CSF" button.
4. All duplicates in c:\temp are now highlighted.
5. Click "Delete Selected" to delete the duplicates, or click "Move ..." or "Copy ..." to move or copy them.
6. The "new" styles are now left in your c:\temp folder.
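The selection rule behind "Select listed in CSF" can be sketched in Python. Whether the real button also selects files in subfolders of the CSF is not stated, so the include_subfolders flag is a hypothetical option; the function name and sample paths are illustrative. PureWindowsPath is used because Windows paths compare case-insensitively.

```python
from pathlib import PureWindowsPath

def select_listed_in_csf(listed_paths, csf, include_subfolders=False):
    """Return the listed duplicate paths that lie in the CSF
    (optionally also in its subfolders)."""
    folder = PureWindowsPath(csf)
    selected = []
    for p in listed_paths:
        path = PureWindowsPath(p)
        if path.parent == folder or (include_subfolders and folder in path.parents):
            selected.append(p)
    return selected

duplicates = [r"c:\styles\Pop\Pop1.sty", r"C:\Temp\Pop1.sty",
              r"c:\styles\Waltz.sty", r"c:\temp\Waltz.sty"]
print(select_listed_in_csf(duplicates, r"c:\temp"))
```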

The "Select all but 1 of each" button

This button will leave one style file in each group of duplicates unselected.
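A minimal sketch of this selection rule is shown below. Which file in each group stays unselected is not stated; this sketch assumes the first listed one. The function name and paths are illustrative.

```python
def select_all_but_one(groups):
    """Select every file except the first one in each duplicate group."""
    return [path for group in groups for path in group[1:]]

groups = [
    [r"c:\styles\Pop1.sty", r"c:\temp\Pop1.sty", r"c:\temp\Pop1 (2).sty"],
    [r"c:\styles\Waltz.sty", r"c:\temp\Waltz.sty"],
]
print(select_all_but_one(groups))
```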

The "Unselect all" button

This button will unselect all selected style files.

The "Clear List" button

This button will clear the list. Style files will not be deleted or moved.

Reducing Unwanted Operation Risk

The risk of unwanted operations is reduced as follows:
1. You are asked to confirm the deletion every time you click the "Delete ..." button.
2. If a "Move ..." or "Copy ..." operation finds a file with the same name in the CSF, you are asked to confirm before it is replaced.
3. You get an error message if you try to move files from non-erasable media (e.g. a CD) or to move or copy files to non-writable media (e.g. a CD).
4. You get an error message if you try to delete, move or copy a file that no longer exists. This can happen if you moved or deleted a file AFTER creating the data file, so that the data file is out of date. Always create a fresh data file with the PSR Style Database application right before using this program.
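Two of these safeguards (asking before replacing a file of the same name, and refusing to act on a file that no longer exists) can be sketched in Python as follows. The function name transfer_to_csf and the confirm_replace callback, which stands in for the confirmation dialog, are hypothetical; the media checks are left out.

```python
import shutil
import tempfile
from pathlib import Path

def transfer_to_csf(src, csf, move=False, confirm_replace=lambda dest: False):
    """Copy (or move) one style file into the CSF.

    Raises FileNotFoundError if src no longer exists (stale data file).
    If a file with the same name is already in the CSF, confirm_replace(dest)
    is called; if it returns False, nothing happens and None is returned.
    """
    src = Path(src)
    dest = Path(csf) / src.name
    if not src.is_file():
        raise FileNotFoundError(f"{src} no longer exists; re-create the data file")
    if dest.exists() and not confirm_replace(dest):
        return None                          # user declined to replace
    if move:
        shutil.move(str(src), str(dest))     # replaces the confirmed dest
    else:
        shutil.copy2(src, dest)
    return dest

# Small demonstration in a throw-away directory.
with tempfile.TemporaryDirectory() as tmp:
    styles = Path(tmp, "styles")
    csf = Path(tmp, "csf")
    styles.mkdir()
    csf.mkdir()
    style = styles / "Pop1.sty"
    style.write_bytes(b"MThd")
    copied = transfer_to_csf(style, csf) is not None     # first copy succeeds
    declined = transfer_to_csf(style, csf) is None       # same name, user says no
    moved = transfer_to_csf(style, csf, move=True,
                            confirm_replace=lambda d: True) is not None
    source_gone = not style.exists()

print(copied, declined, moved, source_gone)
```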