Oovcde Duplicate Code Detection

Code Parsing

Many duplicate detection programs use a source line as the unit of comparison (the comparison chunk). They may strip comments, remove spaces, or apply other transformations to each line, and then compare lines to find matches. Because several consecutive lines must match before a duplicate is reported, duplicated code that sits on only a few long lines may not show up in the output.

The Oovcde project uses Clang to parse the C++ source files. If the -dups switch is used, Oovcde does this during the analysis phase, at the same time as it parses the project for class relationship information.

As the source is parsed and converted into an Abstract Syntax Tree (AST), each statement is broken into its child parts. For example, an "if" statement is separated into the conditional part, the body of the "if", and the body of the "else" if it exists. Similarly, a compound statement is separated into its child statements. Each statement part is used as a comparison chunk, and anything in the AST other than statements is also used as a comparison chunk.
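The splitting described above can be pictured with a small sketch. This is not Oovcde's code and does not use the Clang API. Node, Chunk, and collectChunks are hypothetical names, and emitting one chunk per leaf part is an assumption made only for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node; in Oovcde the tree comes from Clang.
struct Node
{
    std::string text;            // source text of a leaf part
    unsigned line;               // source line where this part starts
    std::vector<Node> children;  // child parts, e.g. condition, body, else body
};

// One comparison chunk: the text to hash and the line it came from.
struct Chunk
{
    std::string text;
    unsigned line;
};

// Walk the tree depth first and emit one chunk for every leaf part.
// (Assumption: only leaves become chunks in this sketch.)
void collectChunks(const Node &node, std::vector<Chunk> &chunks)
{
    if (node.children.empty())
    {
        chunks.push_back({node.text, node.line});
        return;
    }
    for (const Node &child : node.children)
    {
        collectChunks(child, chunks);
    }
}
```

With this layout, an "if" node whose children are a condition, a compound body of two statements, and an "else" statement would yield four chunks in source order.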

Hashing

Each comparison chunk is hashed using the djb2 hash function and saved as a 32-bit hash. The djb2 hash function is a simple function that seems to have very few collisions when used with source code. Because Oovcde only outputs a duplicate when it finds some number of matching hashes in a row, an occasional collision on a single chunk is unlikely to produce a false match, so djb2 is well suited to this use.
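For reference, a minimal sketch of the classic djb2 function with a 32-bit result is shown here. The function name djb2Hash is hypothetical, and the sketch assumes the chunk is already available as a string. How Oovcde builds or normalizes the chunk text before hashing is not covered here.

```cpp
#include <cstdint>
#include <string>

// Classic djb2: start at 5381, then hash = hash * 33 + byte for each byte.
// Arithmetic on a 32-bit unsigned value wraps modulo 2^32.
inline std::uint32_t djb2Hash(const std::string &chunk)
{
    std::uint32_t hash = 5381;
    for (unsigned char c : chunk)
    {
        hash = hash * 33u + c;
    }
    return hash;
}
```

An empty chunk hashes to the starting value 5381, and every additional character changes the result, so chunks that differ by a single character normally get different hashes.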

For each source file, a duplicate code file is created that contains all of the hashes and the source line number for each hash.

Outputting Duplicate Information

After the analysis phase has produced the duplicate code hash files, the user can invoke the menu to perform the comparison of hash files. Each file is compared with every other file, including itself.
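One way to picture the comparison is as a search for runs of consecutive equal hashes in two hash lists. The sketch below is a simple quadratic illustration under stated assumptions, not the algorithm Oovcde actually uses. HashEntry, Match, findRuns, and the minRun threshold are hypothetical names, and the line count is assumed to be the span of lines covered by the run in the first file.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One record from a hash file: a chunk hash and its source line.
struct HashEntry
{
    std::uint32_t hash;
    unsigned line;
};

// One reported duplicate: starting lines in both files and lines covered.
struct Match
{
    unsigned lineA;
    unsigned lineB;
    unsigned lineCount;
};

// Report every run of at least minRun consecutive equal hashes.
// Pass sameFile = true when a and b are the same file, so that a chunk is
// never matched against itself and each pair is reported only once.
std::vector<Match> findRuns(const std::vector<HashEntry> &a,
    const std::vector<HashEntry> &b, std::size_t minRun, bool sameFile)
{
    std::vector<Match> matches;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        for (std::size_t j = sameFile ? i + 1 : 0; j < b.size(); ++j)
        {
            // Skip positions that only continue a run that started earlier.
            if (i > 0 && j > 0 && a[i - 1].hash == b[j - 1].hash)
            {
                continue;
            }
            std::size_t len = 0;
            while (i + len < a.size() && j + len < b.size()
                && a[i + len].hash == b[j + len].hash)
            {
                ++len;
            }
            if (len > 0 && len >= minRun)
            {
                // Assumption: count lines from first to last chunk in file a.
                unsigned lines = a[i + len - 1].line - a[i].line + 1;
                matches.push_back({a[i].line, b[j].line, lines});
            }
        }
    }
    return matches;
}
```

Requiring a run of several hashes is what keeps single-chunk coincidences, such as a lone closing statement or a hash collision, out of the report. This sketch does not treat overlapping runs within one file specially.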

The output lists the number of lines that match, each source file, and the starting line number of the match for each source file. At the moment, the output may look something like the following:

lines 6  :  oovBuilder/ComponentBuilder.cpp 645  :  oovBuilder/ComponentBuilder.cpp 654
lines 5  :  oovcde/ClassDrawer.cpp 482  :  oovcde/ClassDrawer.cpp 494
lines 8  :  oovcde/ClassDrawer.cpp 277  :  oovcde/ZoneDrawer.cpp 547
lines 11  :  oovcde/ComponentDrawer.cpp 90  :  oovcde/ZoneDrawer.cpp 544
lines 6  :  oovcde/OperationDrawer.cpp 205  :  oovcde/OperationDrawer.cpp 271
lines 6  :  oovcde/ZoneDiagram.cpp 211  :  oovcde/ZoneDiagram.cpp 223
lines 5  :  oovCMaker/oovCMaker.cpp 362  :  oovCMaker/oovCMaker.cpp 404
lines 13  :  oovCommon/IncludeMap.cpp 60  :  oovCommon/IncludeMap.cpp 89
lines 6  :  oovCommon/ModelObjects.cpp 420  :  oovCommon/ModelObjects.cpp 647
lines 6  :  oovCommon/ModelObjects.cpp 1020  :  oovCommon/ModelObjects.cpp 1037
lines 8  :  oovCommon/Packages.cpp 290  :  oovCommon/Packages.cpp 349
lines 22  :  oovCovInstr/CppInstr.cpp 776  :  oovCppParser/CppParser.cpp 1208
lines 10  :  oovCovInstr/CppInstr.cpp 803  :  oovCppParser/CppParser.cpp 1233
lines 4  :  oovGuiCommon/Gui.cpp 480  :  oovGuiCommon/Gui.cpp 507