NNDR Report

The nearest neighbor distance ratio (NNDR) is a measure of the similarity between a synthesized dataset and the original dataset it was generated from.

The NNDR is calculated by finding, for each point in the synthesized dataset, its nearest neighbors in the original dataset. The distance from the point to its nearest neighbor is then divided by the distance from the point to its second-nearest neighbor. Because the nearest neighbor is never farther away than the second-nearest, this ratio always lies between 0 and 1. If the ratio falls below a certain threshold (typically 0.5), the point lies much closer to one original point than to any other and is flagged as a poor match to the original dataset.
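The ratio described above can be written as a formula. Here s is a synthesized point, r_(1) and r_(2) are its nearest and second-nearest points in the original dataset, and d is the distance metric. The text does not name a metric, so Euclidean distance is only one possible (assumed) choice. The ratio is undefined when d(s, r_(2)) = 0, which happens only when s coincides with at least two original points.

```latex
\[
\mathrm{NNDR}(s) = \frac{d\bigl(s, r_{(1)}\bigr)}{d\bigl(s, r_{(2)}\bigr)},
\qquad
0 \le d\bigl(s, r_{(1)}\bigr) \le d\bigl(s, r_{(2)}\bigr)
\;\Longrightarrow\;
0 \le \mathrm{NNDR}(s) \le 1
\]
```

For example, distances of 1 and 4 give an NNDR of 1/4 = 0.25, which is below the typical 0.5 threshold.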

The NNDR report summarizes the NNDR scores of all points in the synthesized dataset. It can be used to evaluate the overall quality of the synthesized dataset and to identify individual points whose scores fall below the threshold. If many NNDR scores are too low, the synthesized dataset may not be a good representation of the original dataset, and further refinement may be needed.
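A report of this kind could be sketched as follows. The sketch assumes the per-point NNDR scores have already been computed. The function name `nndr_report` and the choice of summary fields (mean, median, minimum, flagged points) are hypothetical illustrations, not a fixed report format.

```python
import statistics


def nndr_report(scores, threshold=0.5):
    """Summarize per-point NNDR scores (hypothetical helper, not a library API).

    scores: one NNDR value per synthesized point, each expected in [0, 1].
    threshold: cut-off below which a point is flagged (0.5 is the typical value).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    # Indices of synthesized points whose ratio falls below the threshold.
    flagged = [i for i, s in enumerate(scores) if s < threshold]
    return {
        "count": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "min": min(scores),
        "flagged_indices": flagged,
        "flagged_fraction": len(flagged) / len(scores),
    }
```

For scores `[0.9, 0.3, 0.7, 0.45]`, the points at indices 1 and 3 are flagged, so half of the dataset falls below the threshold.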

The NNDR method is commonly used to evaluate the quality of synthesized datasets against the original dataset. The basic idea is to examine the distances from each point in the synthesized dataset to the points of the original dataset. Points in the synthesized dataset that are close to points in the original dataset are considered good matches, while those that are far away are considered poor matches. NNDR refines this idea by comparing each point's distance to its nearest original neighbor with its distance to the second-nearest, rather than relying on a single absolute distance.

To implement the NNDR method, one typically starts by selecting a value of k, the number of nearest neighbors to consider. k must be at least 2, because the ratio uses the two nearest neighbors. For each point in the synthesized dataset, its k nearest neighbors are identified in the original dataset. The distance between the point and its nearest neighbor in the original dataset is then divided by the distance between the point and its second-nearest neighbor in the original dataset. This gives the NNDR value for the point.
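The steps above can be sketched as a brute-force computation. This sketch makes several assumptions. Points are numeric tuples of equal length. The default metric is Euclidean distance via `math.dist`. The function name `nndr_scores` is hypothetical. When the second-nearest distance is 0, the ratio is undefined, and the sketch records 0.0 for that point; this is a labeled choice, not part of the method. Only the first two of the k retrieved distances enter the ratio.

```python
import heapq
import math


def nndr_scores(synthetic, original, k=2, metric=math.dist):
    """Compute one NNDR value per synthesized point (brute-force sketch).

    synthetic, original: sequences of equal-length numeric tuples.
    k: number of nearest original neighbours to retrieve; must be >= 2
       because the ratio needs the nearest and second-nearest distances.
    metric: distance function; Euclidean (math.dist) is an assumption here.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if len(original) < 2:
        raise ValueError("need at least two original points")
    scores = []
    for s in synthetic:
        # Distances to the k closest original points, smallest first.
        nearest = heapq.nsmallest(k, (metric(s, r) for r in original))
        d1, d2 = nearest[0], nearest[1]
        # d2 == 0 implies d1 == 0 (s duplicates two originals); the ratio is
        # undefined, so this sketch records 0.0 (a hypothetical choice).
        scores.append(d1 / d2 if d2 > 0 else 0.0)
    return scores
```

Take original = [(0, 0), (4, 0), (0, 3)] and synthetic = [(1, 0), (2, 0)]. The first synthetic point is 1 away from (0, 0) and 3 away from (4, 0), which gives 1/3. The second is 2 away from both (0, 0) and (4, 0), which gives 1.0. The loop costs one distance per pair of points. On large datasets, a spatial index such as scikit-learn's `NearestNeighbors` would usually replace it.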

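An NNDR value close to 1 means that the point is about as far from its nearest original neighbor as from its second-nearest, and the point is considered a good match to the original dataset. A value close to 0 means that the point lies much closer to one original point than to any other, and the point is considered a poor match. A commonly used threshold for NNDR values is 0.5. Points with NNDR values below this threshold are considered poor matches and may need to be removed or further refined.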
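Applying the threshold to decide which points to remove or refine could look like this minimal sketch. It assumes the points and their NNDR scores are given in matching order. The name `split_by_threshold` is hypothetical. A score exactly equal to the threshold is kept, because only values below it are treated as poor matches.

```python
def split_by_threshold(points, scores, threshold=0.5):
    """Separate synthesized points into kept and flagged lists (sketch).

    points: the synthesized records, in the same order as scores.
    scores: their NNDR values.
    Points scoring strictly below threshold are flagged as poor matches.
    """
    if len(points) != len(scores):
        raise ValueError("points and scores must have the same length")
    kept, flagged = [], []
    for p, s in zip(points, scores):
        # Route each point to the flagged list only when its ratio is too low.
        (flagged if s < threshold else kept).append(p)
    return kept, flagged
```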

The NNDR method is useful for evaluating the quality of synthesized datasets because it provides a quantitative measure of how well the synthesized dataset matches the original dataset. By comparing NNDR values across different synthesized datasets (for example, the fraction of points below the threshold), it is possible to identify which generation methods are most effective at producing high-quality synthetic data.
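Comparing several synthesized datasets could be sketched as ranking them by summary statistics of their NNDR scores. The ranking criterion here is an assumption, not something the method prescribes. Datasets with the smallest fraction of points below the threshold come first, and ties are broken by the higher median NNDR. The function name `rank_synthesizers` and the method names in the test data are hypothetical.

```python
import statistics


def rank_synthesizers(scores_by_method, threshold=0.5):
    """Rank synthesized datasets by their NNDR scores (hypothetical helper).

    scores_by_method: mapping from a method name to its list of NNDR values.
    Ranking criterion (an assumption): fewest points below threshold first,
    ties broken by higher median NNDR.
    """
    summary = []
    for name, scores in scores_by_method.items():
        if not scores:
            raise ValueError(f"no scores for {name!r}")
        below = sum(1 for s in scores if s < threshold) / len(scores)
        summary.append((name, below, statistics.median(scores)))
    # Sort ascending by flagged fraction, then descending by median.
    summary.sort(key=lambda row: (row[1], -row[2]))
    return summary
```

Each returned row holds the method name, its fraction of points below the threshold, and its median NNDR. A different criterion, such as the mean score, can be substituted without changing the structure.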
