Oovcde Duplicate Code Detection
Many duplicate detection programs use a source line as the code chunk size
(comparison chunk) for comparison.
They may strip comments, remove spaces, or perform other transformations on
each line. They then compare lines to see where there are matches.
This could mean that long lines do not show up in the output since multiple
lines must match to show up in the output.
The Oovcde project uses CLang to parse the C++ source files. Oovcde does this at
the same time as it parses the project for class relationship information during
the analysis phase if the -dups switch is used during the analysis phase.
As the source is parsed and converted into an Abstract Syntax Tree (AST),
each statement is broken into its child parts. As an example, this means
that an "if" statement will be separated into the conditional part, the
body of the "if", and the body of the "else" if it exists.
Another example is that a compound statement will be separated into each
of the child statements. Each statement part is used as a comparison chunk, and
anything else other than statements from the AST is also used as a comparison chunk.
Each comparison chunk is hashed using the djb2 hash function and saved as
a 32 bit hash. The djb2 hash function is a simple function that seems to
have very few collisions when used with source code. Since the Oovcde program
will only output duplicates if it finds some number of hashes in a row, then
it is very suitable for this use.
For each source file, a duplicate code file is created that contains all of
the hashes, and the source line number for each hash.
Outputting Duplicate Information
After the analysis phase has produced the duplicate code hash files, the user
can invoke the menu to perform the comparison of hash files. Each file is
compared with every other file including itself.
The output lists the number of lines that match, each source file, and
the starting line number of the match for each source file.
At the moment, the output may look something like the following:
lines 6 : oovBuilder/ComponentBuilder.cpp 645 : oovBuilder/ComponentBuilder.cpp 654
lines 5 : oovcde/ClassDrawer.cpp 482 : oovcde/ClassDrawer.cpp 494
lines 8 : oovcde/ClassDrawer.cpp 277 : oovcde/ZoneDrawer.cpp 547
lines 11 : oovcde/ComponentDrawer.cpp 90 : oovcde/ZoneDrawer.cpp 544
lines 6 : oovcde/OperationDrawer.cpp 205 : oovcde/OperationDrawer.cpp 271
lines 6 : oovcde/ZoneDiagram.cpp 211 : oovcde/ZoneDiagram.cpp 223
lines 5 : oovCMaker/oovCMaker.cpp 362 : oovCMaker/oovCMaker.cpp 404
lines 13 : oovCommon/IncludeMap.cpp 60 : oovCommon/IncludeMap.cpp 89
lines 6 : oovCommon/ModelObjects.cpp 420 : oovCommon/ModelObjects.cpp 647
lines 6 : oovCommon/ModelObjects.cpp 1020 : oovCommon/ModelObjects.cpp 1037
lines 8 : oovCommon/Packages.cpp 290 : oovCommon/Packages.cpp 349
lines 22 : oovCovInstr/CppInstr.cpp 776 : oovCppParser/CppParser.cpp 1208
lines 10 : oovCovInstr/CppInstr.cpp 803 : oovCppParser/CppParser.cpp 1233
lines 4 : oovGuiCommon/Gui.cpp 480 : oovGuiCommon/Gui.cpp 507