The nucleus of this idea is Copyright©2000 by Brightidea.com, and can be found here.

## Suggested strategies for a spreadsheet-like front end for pubwan

A cheap and easy way to do volunteer data entry is needed, as well as a cheap and easy way to mine unitabular data for clever normalizations.

A row entry in a pubwan database could be represented as a list with an even number of elements, each pair consisting of a column name followed by a cell literal. Alternatively, each pair could be cons'ed together. Numerous pubwan users could keep archives of the column names used so far in row entries; these archives would act as a kind of DNS backbone for pubwan. Whenever a row entry arrives containing at least one column name not yet seen, its column names are merged into the archive (a set union, with duplicates removed). Every column in a row entry, whether or not its name is new, then has a well-defined rank (alphabetical, for example) among the archived column names.
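As a rough illustration of the row-entry and archive idea, here is a minimal Python sketch. The function names (`column_names`, `merge_into_archive`, `ranks`) are hypothetical, and the choice of a 1-based alphabetical rank is an assumption; the text only says that some well-defined ranking is needed.

```python
from bisect import bisect_left


def column_names(row):
    """Return the column names of a flat [name, value, name, value, ...] row."""
    if len(row) % 2 != 0:
        raise ValueError("a row entry must have an even number of elements")
    return row[0::2]


def merge_into_archive(archive, row):
    """Union the row's column names into the archive (a set).

    Returns the set of names that had not been seen before.
    """
    names = set(column_names(row))
    unseen = names - archive
    archive |= names  # set union; duplicates disappear automatically
    return unseen


def ranks(archive, row):
    """Map each column name in the row to its rank in the archive.

    Assumption: rank is the 1-based alphabetical position. Every name in
    the row must already be in the archive (call merge_into_archive first).
    """
    ordered = sorted(archive)
    return {name: bisect_left(ordered, name) + 1 for name in column_names(row)}


archive = set()
merge_into_archive(archive, ["price", 3.5, "item", "bread"])
row = ["store", "Acme", "item", "milk"]
unseen = merge_into_archive(archive, row)  # only "store" is new here
row_ranks = ranks(archive, row)
```

A set keeps the union step trivial; a long-lived archive might prefer a structure that stays sorted so that ranks do not require re-sorting on every lookup.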

A value for a key column could be selected in the following way...

Each column name has a rank. That rank divided by the number of archived columns is its quantile ranking. The mean and the standard deviation of the quantile rankings for a given row entry can be calculated (hopefully efficiently), and each can be scaled to [0, 65535]. The two 16-bit values can then be combined into a 32-bit integer, which becomes a key candidate. Hash tables or the like could handle the inevitable collisions.
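The key-candidate calculation might look something like the sketch below. Several details are assumptions rather than anything the text specifies: the function name `key_candidate` is hypothetical, ranks are taken to be 1-based, the population standard deviation is used, the standard deviation is scaled by its largest possible value for numbers in [0, 1] (which is 0.5), and the mean is placed in the high 16 bits with the standard deviation in the low 16 bits.

```python
from statistics import mean, pstdev


def key_candidate(row_ranks, archive_size):
    """Pack the mean and spread of a row's quantile rankings into 32 bits.

    row_ranks    -- 1-based ranks of the row's column names (assumption)
    archive_size -- number of column names currently in the archive
    """
    if not row_ranks:
        raise ValueError("a row entry needs at least one column")

    # Quantile ranking: rank divided by the number of archived columns.
    quantiles = [rank / archive_size for rank in row_ranks]

    m = mean(quantiles)    # lies in (0, 1]
    s = pstdev(quantiles)  # lies in [0, 0.5] for values in [0, 1]

    # Scale each statistic to the 16-bit range [0, 65535].
    hi = round(m * 65535)
    lo = round(min(s / 0.5, 1.0) * 65535)

    # Assumption: mean in the high half, standard deviation in the low half.
    return (hi << 16) | lo


key = key_candidate([1, 3], archive_size=3)
```

Rows with the same set of column names always get the same key candidate, and so can rows with different column sets whose rankings happen to share a mean and spread, which is where the collision handling mentioned above would come in.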

I would think this would have the net effect of squeezing the whitespace (null entries) out of a big table, producing nice rectangular tables with little internal whitespace, given an optimal rearrangement of the columns. Finding that rearrangement might be challenging, since the number of possible orderings of the column-name archive is on the order of the factorial of the number of columns.
