Wide datasets are fairly typical for:

  • Industrial data
    • IoT
    • Sensor data
    • Mining and processing data
    • Spectrometry data
  • Analytical data
    • Most datasets after one-hot encoding is applied
    • NLP datasets
    • Any dataset where we need to raise dimensionality
    • Media feature sets
  • Social network/modelling schemas

I'm fairly sure there are more areas, but I have not encountered them myself.

I recently delivered a PoC with classes more than 6400 columns wide, and that's where I got the inspiration for this article (I chose approach 4).

@Renato Banzai also wrote an excellent article on his project with more than 999 properties.

Overall I'd like to say that a class with more than 999 properties is a correct design in many cases.

While I always advertise CSV2CLASS methods for generic solutions, wide datasets often have the (un)fortunate characteristic of also being long.

In that case a custom object-less parser, which writes each row straight into the data global without instantiating an object, works better.

Here's how it can be implemented.

1. Align storage schema with CSV structure

2. Modify this snippet for your class/CSV file:

Parameter GLVN = {..GLVN("Test.Record")};

Parameter SEPARATOR = ";";

ClassMethod Import(file = "source.csv", killExtent As %Boolean = {$$$YES})
{
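    set stream = ##class(%Stream.FileCharacter).%New()
    // Fail early if the file cannot be linked
    $$$ThrowOnError(stream.LinkToFile(file))
    
    // Resolve the data global name once; it is used below via subscript indirection
    set glvn = ..#GLVN
    kill:killExtent @glvn
    
    set i=0
    set start = $zh
    while 'stream.AtEnd {
        set i = i + 1
        set line = stream.ReadLine($$$MaxStringLength)
        
        // The root node of the data global holds the ID counter; $i() returns the next ID
        set @glvn@($i(@glvn)) = ..ProcessLine(line)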
        
        write:'(i#100000) "Processed:", i, !
    }
    set end = $zh
    
    write "Done",!
    write "Time: ", end - start, !
}

ClassMethod ProcessLine(line As %String) As %List
{
    // Plain split on the separator: quoted fields that contain the separator are not handled
    set list = $lfs(line, ..#SEPARATOR)
    set list2 = ""
    set ptr=0
    
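    // Handle NULLs and numbers:
    // empty values become undefined list elements, valid numbers are stored in canonical form.
    // Add generic handlers here,
    // for example translate an "N/A" value into $lb() if that's how the source data marks NULLs.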
    while $listnext(list, ptr, value) {
        set list2 = list2 _ $select($g(value)="":$lb(), $ISVALIDNUM(value):$lb(+value), 1:$lb(value))
    }

    // Add specific handlers here
    // For example convert date into horolog in column4

    // Prepend an empty %%CLASSNAME element, which default storage keeps in the first list position
    set list2 = $lb() _ list2
    
    quit list2
}
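
To make step 1 more concrete, here is a minimal sketch of what "aligned" storage could look like. The class name Test.Record comes from the snippet above; the four properties col1 to col4 are hypothetical stand-ins for CSV columns, and the storage block is the kind the class compiler normally generates by default. The point is only that, after %%CLASSNAME, the storage slots follow the same order as the CSV columns, so the list built by ProcessLine can be written as-is.

```objectscript
Class Test.Record Extends %Persistent
{

/// Hypothetical properties, declared in CSV column order
Property col1 As %String;

Property col2 As %Integer;

Property col3 As %Numeric;

Property col4 As %Date;

/// Slot 1 is %%CLASSNAME; slots 2..5 map to CSV columns 1..4
Storage Default
{
<Data name="RecordDefaultData">
<Value name="1">
<Value>%%CLASSNAME</Value>
</Value>
<Value name="2">
<Value>col1</Value>
</Value>
<Value name="3">
<Value>col2</Value>
</Value>
<Value name="4">
<Value>col3</Value>
</Value>
<Value name="5">
<Value>col4</Value>
</Value>
</Data>
<DataLocation>^Test.RecordD</DataLocation>
<DefaultData>RecordDefaultData</DefaultData>
<IdLocation>^Test.RecordD</IdLocation>
<IndexLocation>^Test.RecordI</IndexLocation>
<StreamLocation>^Test.RecordS</StreamLocation>
<Type>%Storage.Persistent</Type>
}

}
```

Assuming Import and ProcessLine were added to that same class (an assumption; they could just as well live in a separate utility class), a run from the terminal might look like this:

```objectscript
// Load the file, replacing any rows already in the extent
do ##class(Test.Record).Import("source.csv")

// Spot-check the first imported row through regular object access
set rec = ##class(Test.Record).%OpenId(1)
write rec.col2, !
```

Because rows are written directly to the data global, any indices the class defines are not maintained during the import; if the class has indices, they would need to be rebuilt afterwards, for example with %BuildIndices().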