If you need the relational model to analyze large datasets, look no further than SQL Server. It can handle datasets into the terabyte range, giving you the scalability and performance that monster datasets demand. If you are attacking problems related to the biology of large datasets from genomics, proteomics, or large-scale high-throughput screening (HTS) projects, SQL Server is a strong choice.
Posted here are some T-SQL scripts that show how to generate sample data, calculate percent of control, and create a table of unique pairs using SQL Server's Query Analyzer. Some of the functionality is similar to that of the HTS.xla and PercentControl3.mde add-ins posted on the HTS Tools page. The T-SQL scripts have an advantage over those tools when handling very large datasets, but they may be difficult to use if you are unfamiliar with T-SQL and Query Analyzer. Every application involves trade-offs, so hopefully enough percent of control solutions are provided on these pages for you to choose one that suits the problem at hand and that you are comfortable with.
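To give a feel for what a set-based percent of control calculation looks like, here is a minimal sketch. It is not one of the posted scripts. It runs plain SQL through Python's built-in sqlite3 module so it can be tried without a SQL Server installation. The table name `wells`, its columns, and the sample values are all hypothetical, and it assumes percent of control means 100 times the sample value divided by the mean of the control wells on the same plate; your assay may define it differently (for example, with background subtraction).

```python
import sqlite3


def percent_of_control(rows):
    """Return (plate, well, pct_control) for every sample well.

    rows: iterable of (plate, well, well_type, value) tuples.
    Assumption: percent of control = 100 * value / mean(control wells on that plate).
    """
    con = sqlite3.connect(":memory:")
    # Hypothetical table layout, chosen only for this illustration.
    con.execute(
        "CREATE TABLE wells (plate INTEGER, well TEXT, well_type TEXT, value REAL)"
    )
    con.executemany("INSERT INTO wells VALUES (?, ?, ?, ?)", rows)

    # One set-based statement: average the controls per plate in a derived
    # table, then join every sample well to its own plate's control mean.
    query = """
        SELECT w.plate, w.well, 100.0 * w.value / c.mean_control AS pct_control
        FROM wells AS w
        JOIN (SELECT plate, AVG(value) AS mean_control
              FROM wells
              WHERE well_type = 'control'
              GROUP BY plate) AS c
          ON c.plate = w.plate
        WHERE w.well_type = 'sample'
        ORDER BY w.plate, w.well
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Made-up sample data: two plates, each with control and sample wells.
sample_rows = [
    (1, "A01", "control", 200.0),
    (1, "A02", "control", 200.0),
    (1, "B01", "sample", 100.0),
    (1, "B02", "sample", 50.0),
    (2, "A01", "control", 400.0),
    (2, "B01", "sample", 100.0),
]

for plate, well, pct in percent_of_control(sample_rows):
    print(plate, well, pct)
```

The query itself uses only a derived table, a join, and AVG with GROUP BY, so it should carry over to T-SQL with little or no change, but treat that as something to verify in Query Analyzer rather than a guarantee.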
Several solutions are posted for finding unique pairs, illustrating different approaches to the problem. In SQL Server the set-based approach performs much better than the "looping" approach, as is easy to see when you compare the time taken to run the different scripts. Both methods are worth a look, though. If you can't figure out a set-based solution, solving the problem with a different approach first may give you clues about how to rewrite the script more efficiently.
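The difference between the two approaches can be sketched as follows. This is an illustration, not one of the posted scripts. It again uses Python's sqlite3 module instead of SQL Server, and it assumes "unique pairs" means every unordered pair of distinct items listed exactly once. The table names `items`, `pairs_set`, and `pairs_loop` are hypothetical.

```python
import sqlite3


def make_items(con, n):
    """Create a hypothetical items table holding ids 1..n."""
    con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
    con.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, n + 1)])


def pairs_set_based(con):
    """Build the pairs table with a single self-join statement."""
    con.execute("DROP TABLE IF EXISTS pairs_set")
    # a.id < b.id keeps (1, 2) but drops (2, 1) and (1, 1),
    # so each unordered pair appears exactly once.
    con.execute(
        """
        CREATE TABLE pairs_set AS
        SELECT a.id AS id1, b.id AS id2
        FROM items AS a
        JOIN items AS b ON a.id < b.id
        """
    )
    return con.execute(
        "SELECT id1, id2 FROM pairs_set ORDER BY id1, id2"
    ).fetchall()


def pairs_looping(con):
    """Build the same table one row at a time with nested loops."""
    con.execute("DROP TABLE IF EXISTS pairs_loop")
    con.execute("CREATE TABLE pairs_loop (id1 INTEGER, id2 INTEGER)")
    # Read the ids into a list first, then issue one INSERT per pair.
    ids = [row[0] for row in con.execute("SELECT id FROM items ORDER BY id")]
    for position, first in enumerate(ids):
        for second in ids[position + 1:]:
            con.execute("INSERT INTO pairs_loop VALUES (?, ?)", (first, second))
    return con.execute(
        "SELECT id1, id2 FROM pairs_loop ORDER BY id1, id2"
    ).fetchall()


con = sqlite3.connect(":memory:")
make_items(con, 4)
set_pairs = pairs_set_based(con)
loop_pairs = pairs_looping(con)
print(set_pairs == loop_pairs, len(set_pairs))
```

Both functions produce the same rows; the set-based version hands the whole job to the database engine in one statement, while the looping version issues a separate INSERT for every pair. In T-SQL the set-based table would typically be created with SELECT ... INTO rather than CREATE TABLE ... AS, and the loop would be a WHILE loop or cursor.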
I will continue to add scripts to this page that provide solutions to problems related to the biology of large datasets. I hope that what is already posted helps scientists get started with some modern analysis techniques. Good luck problem solving, and remember that a fast computer always helps!
C. Eric Cashon
cecashon@aol.com