Key features of FRIL include:
- Rich set of user-tunable parameters
- Advanced features of schema/data reconciliation
- User-tunable search methods (e.g. sorted neighborhood method, blocking method, nested loop join)
- Transparent support for multi-core systems
- Support for parameters configuration
- Dynamic analysis of parameters
- And many, many more...
The table below compares FRIL with a few other available linkage software packages.

Category | Feature | FRIL | Link Plus | Link King | LinkageWiz |
---|---|---|---|---|---|
Schema reconciliation | On-the-fly data/schema reconciliation | Yes | No | No | Limited |
Schema reconciliation | Dynamic analysis | Yes | No | No | No |
Linkage configuration | Search method | Sorted neighborhood method, blocking search method, nested loop join | Blocking search method | Blocking search method | Blocking search method |
Linkage configuration | Configuration of search method | Yes | No | Limited | Limited |
Linkage configuration | Automatic weights suggestion | Yes (EM method) | No | No | No |
Linkage configuration | Dynamic analysis in distance selection | Yes | No | No | No |
On-the-fly linkage debugging | | Yes | No | Limited | No |

What does it actually mean for me?
A rich set of user-tunable parameters lets you fine-tune every aspect of the linkage. This includes the linkage fields, the metrics used to compare attributes, the weights of the compared fields, and many, many more.
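To make the idea of fields, metrics, and weights concrete, here is a minimal Python sketch of a weighted comparison. It is not FRIL's actual API or configuration format: the field names, the `CONFIG` layout, the `ACCEPT_THRESHOLD` value, and the use of `difflib.SequenceMatcher` as a stand-in for an edit-distance metric are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

def exact(a, b):
    """Similarity 1.0 when the values are equal, 0.0 otherwise."""
    return 1.0 if a == b else 0.0

def fuzzy(a, b):
    """Similarity in [0, 1]; stands in for an edit-distance style metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical configuration: linkage field -> (metric, weight).
# The weights sum to 1, so the score also stays in [0, 1].
CONFIG = {
    "last_name": (fuzzy, 0.5),
    "first_name": (fuzzy, 0.3),
    "birth_year": (exact, 0.2),
}
ACCEPT_THRESHOLD = 0.8  # illustrative value, meant to be tuned

def match_score(left, right, config=CONFIG):
    """Weighted sum of per-field similarities."""
    return sum(
        weight * metric(left[field], right[field])
        for field, (metric, weight) in config.items()
    )

def is_match(left, right, threshold=ACCEPT_THRESHOLD):
    """Accept the pair when the weighted score reaches the threshold."""
    return match_score(left, right) >= threshold

a = {"last_name": "Smith", "first_name": "John", "birth_year": "1980"}
b = {"last_name": "Smyth", "first_name": "John", "birth_year": "1980"}
c = {"last_name": "Jones", "first_name": "Mary", "birth_year": "1975"}
print(match_score(a, b), is_match(a, b))
print(match_score(a, c), is_match(a, c))
```

Changing a metric, a weight, or the threshold in this sketch changes which pairs are accepted, which is the kind of tuning the parameters above are meant to support.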
Advanced features of schema/data reconciliation allow you to run the linkage on your data without tedious preprocessing of the input files. Once you have your data files, you are ready to go with FRIL. Schema/data reconciliation includes merging two attributes of a data source into one (e.g. merging first and last names that are stored separately), splitting an attribute into several columns (e.g. splitting a name attribute into first and last names), and normalizing data through trimming or regular expression replacement in a given attribute. Moreover, once you define the reconciliation process in FRIL, the tool applies it on-the-fly.
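The three reconciliation operations described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how FRIL implements it: the helper names (`merge_fields`, `split_field`, `normalize_field`, `reconcile`) and the dictionary-based record format are assumptions.

```python
import re

def merge_fields(record, sources, target, sep=" "):
    """Merge several attributes into one, e.g. first + last -> name."""
    out = dict(record)
    out[target] = sep.join(out.pop(s).strip() for s in sources)
    return out

def split_field(record, source, targets, sep=None):
    """Split one attribute into several, e.g. name -> first, last."""
    out = dict(record)
    parts = out.pop(source).split(sep, len(targets) - 1)
    parts += [""] * (len(targets) - len(parts))  # pad missing parts
    out.update(zip(targets, parts))
    return out

def normalize_field(record, field, pattern, replacement=""):
    """Regular expression replacement followed by trimming."""
    out = dict(record)
    out[field] = re.sub(pattern, replacement, out[field]).strip()
    return out

def reconcile(records, steps):
    """Apply the steps lazily, one record at a time ("on-the-fly")."""
    for record in records:
        for step in steps:
            record = step(record)
        yield record

source = [{"first": " John ", "last": "Smith", "phone": " (555) 123-4567 "}]
steps = [
    lambda r: merge_fields(r, ["first", "last"], "name"),
    lambda r: normalize_field(r, "phone", r"\D"),
]
for row in reconcile(source, steps):
    print(row)
```

Because `reconcile` is a generator, no preprocessed copy of the input is written anywhere; each record is transformed just before it would be compared.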
User-tunable search methods are crucial for balancing linkage quality against running time. For small input data sets, you can probably use the nested loop join algorithm. For large data sets, however, the sorted neighborhood method or blocking search is advised. Each search method is easy to configure, so you can assess experimentally how each of them affects the linkage results.
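The trade-off between the three search methods comes down to how many candidate pairs each one compares. The Python sketch below illustrates the general techniques under simplifying assumptions (records are plain strings, the blocking key is the first letter, the sort key is the whole value); the function names are hypothetical and this is not FRIL's implementation.

```python
from collections import defaultdict
from itertools import product

def nested_loop_pairs(left, right):
    """Compare every left record with every right record."""
    return set(product(range(len(left)), range(len(right))))

def blocking_pairs(left, right, key):
    """Compare only records that share the same blocking key."""
    blocks = defaultdict(list)
    for j, rec in enumerate(right):
        blocks[key(rec)].append(j)
    return {
        (i, j)
        for i, rec in enumerate(left)
        for j in blocks.get(key(rec), [])
    }

def sorted_neighborhood_pairs(left, right, key, window=3):
    """Sort both sources together and compare records inside a sliding window."""
    tagged = [(key(r), 0, i) for i, r in enumerate(left)]
    tagged += [(key(r), 1, j) for j, r in enumerate(right)]
    tagged.sort()
    pairs = set()
    for pos, (_, side, idx) in enumerate(tagged):
        # A window of size w pairs each record with the next w - 1 records.
        for _, other_side, other_idx in tagged[pos + 1 : pos + window]:
            if side != other_side:  # only pair records from different sources
                pairs.add((idx, other_idx) if side == 0 else (other_idx, idx))
    return pairs

left = ["smith", "jones", "brown"]
right = ["smyth", "jonas", "braun", "smith"]

all_pairs = nested_loop_pairs(left, right)
blocked = blocking_pairs(left, right, key=lambda s: s[0])
windowed = sorted_neighborhood_pairs(left, right, key=lambda s: s, window=3)
print(len(all_pairs), len(blocked), len(windowed))
```

Nested loop join never misses a pair but grows with the product of the input sizes. Blocking and sorted neighborhood compare far fewer pairs, at the risk of missing matches whose keys differ too much, which is why the choice of key and window size is worth experimenting with.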
Transparent support for multi-core systems gets as much out of your computer as it can! Nowadays, more and more personal computers, and even laptops, have dual- or quad-core CPUs, yet most applications do not benefit from them. FRIL takes full advantage of multi-core architectures. On dual-core systems, you can expect linkage projects to run 1.3 to 1.8 times faster, depending on the linkage configuration. Moreover, FRIL does this transparently from the user's point of view: if your machine has a multi-core CPU, FRIL detects it automatically and uses it.
Support for parameter configuration can save you a lot of trouble. Not sure which weights to assign to attributes when defining a linkage? No problem: there are tools to help you, and they are integrated with FRIL. Currently, we provide the expectation-maximization (EM) method for suggesting weights, and we expect more methods to come in the future.
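For readers curious how an EM method can suggest weights, here is a compact Python sketch of the textbook approach for the Fellegi-Sunter model: each compared pair is reduced to a vector of agree/disagree flags, and EM estimates how often each field agrees among matches (`m`) and among non-matches (`u`). This is an assumption about the general technique, not a description of FRIL's own implementation; the function name `em_weights`, the starting values, and the sample data are made up for illustration.

```python
import math

def em_weights(patterns, iterations=50, p=0.1, m_init=0.9, u_init=0.1):
    """Estimate per-field agreement weights log2(m/u) with EM.

    patterns: list of tuples of 0/1 flags, one flag per compared field.
    p: initial guess for the share of pairs that are true matches.
    """
    n, n_fields = len(patterns), len(patterns[0])
    m, u = [m_init] * n_fields, [u_init] * n_fields
    eps = 1e-6  # keeps probabilities away from 0 and 1

    for _ in range(iterations):
        # E-step: probability that each pair is a match, given current m, u, p.
        g = []
        for gamma in patterns:
            pm, pu = p, 1.0 - p
            for k, agree in enumerate(gamma):
                pm *= m[k] if agree else 1.0 - m[k]
                pu *= u[k] if agree else 1.0 - u[k]
            g.append(pm / (pm + pu))

        # M-step: re-estimate p, m and u from the soft assignments.
        total = sum(g)
        p = total / n
        for k in range(n_fields):
            m_k = sum(gi * gamma[k] for gi, gamma in zip(g, patterns)) / total
            u_k = sum((1.0 - gi) * gamma[k] for gi, gamma in zip(g, patterns)) / (n - total)
            m[k] = min(max(m_k, eps), 1.0 - eps)
            u[k] = min(max(u_k, eps), 1.0 - eps)

    weights = [math.log2(m[k] / u[k]) for k in range(n_fields)]
    return weights, m, u, p

# Made-up agreement patterns for three fields: about 20 likely matches
# and 80 likely non-matches, with some noise in the second and third field.
patterns = (
    [(1, 1, 1)] * 15 + [(1, 1, 0)] * 3 + [(1, 0, 1)] * 2
    + [(0, 0, 0)] * 60 + [(0, 0, 1)] * 12 + [(0, 1, 0)] * 8
)
weights, m, u, p = em_weights(patterns)
print([round(w, 2) for w in weights], round(p, 3))
```

A field that rarely agrees by chance (small `u`) and almost always agrees for true matches (large `m`) receives a high weight, so it contributes more to the overall match score.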
Dynamic analysis of parameters makes it possible to check the impact of configuration parameters. Not sure how to configure reconciliation parameters, or which distance function is best for given fields? Just open the dynamic analysis window and you will immediately see what FRIL does. There is even more: if you change any of the configuration parameters, the dynamic analysis window shows the effect of these changes, all presented for real data from your input file. It could not be easier.
To check these and many more FRIL features, go ahead and download it NOW! It's free, open source, and platform independent.