What is best approach to parsing CSV string to tab-delimited string?

Question

Question

Mack Altman · Nov 1, 2018

In PHP, I'm able to utilize the implode and str_getcsv functions to convert a comma-delimited string to a tab-delimited string. The following is an example.

$strtotab = implode("\t", str_getcsv($str, ","));

I am wondering if there is a similar way to do this in Cache and ensure that it takes into account fields enclosed in quotes. I am away of the CSVTOCLASS() method in the %SQL.Util.Procedures class. As I understand it, this is used to convert an entire file rather than a single string. The goal is to be compliant with RFC 4180.

To provide an example, the following comma-delimited string should be viewed as having 5 pieces, and not 7.

ABC Company,"123 Main St, Ste 102","Anytown, DC",10001,234-567-8901

Any guidance would be appreciated.

Discussion (33)4

Log in or sign up to continue

Vitaliy Serdtsev · Nov 1, 2018

#include %systemInclude

    s csv="ABC Company,""123 Main St, Ste 102"",""Anytown, DC"",10001,234-567-8901"
    s list=$$CSVtoList^%occLibrary(csv)

    f i=1:1:$ll(list) s:$li(list,i)["," $li(list,i)=$$$quote($li(list,i))
        
    w csv,!,$lts(list,$c(9))

Result:

USER>d ^test
ABC Company,"123 Main St, Ste 102","Anytown, DC",10001,234-567-8901
ABC Company     "123 Main St, Ste 102"  "Anytown, DC"   10001   234-567-8901

3 0

score 0 · Answer 1 · 2018-11-01T10:15:19-04:00

Vitaliy Serdtsev · Nov 1, 2018

$tr(str,",",$c(9))
or
$replace(str,",",$c(9))

0 0

score 0 · Answer 2 · 2018-11-01T10:20:59-04:00

Mack Altman Nov 1, 2018 to Vitaliy Serdtsev

This doesn't take into account fields enclosed in double quotes. The goal is to remain compliant with RFC 4180.

0 0

score 2 · Answer 3 · 2018-11-01T12:32:23-04:00

Rod Dorman Nov 1, 2018 to Vitaliy Serdtsev

Are the %occLibrary entry points documented anywhere as being available for use by applications?

2 0

score 0 · Answer 4 · 2018-11-01T12:57:59-04:00

Mack Altman Nov 1, 2018 to Vitaliy Serdtsev

This works beautifully.

Are you aware of where the %occLibrary documentation may be referenced?

0 0

score 2 · Answer 5 · 2018-11-01T13:06:32-04:00

I discourage since years the use of $ZU(...) functions as they aren't documented since 2010.
I recently had to dig back to 2009 for just a weak hint what might happen.

It is even worse with all the internal stuff around %occ* and similar.
No docs. No guaranty of the life cycle. No (external) notice of eventual changes. Mostly as a deployed code.

If it is used inside a $system.* or part of a generated code that's OK. The responsibility is not at the user side.

Verifying those "specials" with every release change can be a very heavy exercise.
(just experiencing this on a system locked down to on an older version unable to migrate)

score 0 · Answer 6 · 2018-11-01T12:56:30-04:00

Typically, I've dealt with XML, EDI, and tab-delimited files within Cache. All CSV files were typically done in PHP. It's definitely a great exercise.

I did want to note that the above solution did not work with the following string.

CSIN-01,01/01/2002,01/15/2003,,1,1002381,ABC CORP,02/22/2018,"NET ""15""","ANYTOWN, DC",JANE DOE,"ANYTOWN, DC",06/13/2018,$13900.00 ,$101.15 ,$5.13 ,$308.00 ,$281.70 ,$0.00 ,$0.00 ,$217.00 ,JOHN DOE

score 0 · Answer 7 · 2018-11-02T06:22:43-04:00

Ben Spead Nov 2, 2018 to Gerd Nachtsheim

Nice work Gerd!

0 0

score 0 · Answer 8 · 2018-11-02T09:17:09-04:00

Robert Cemper Nov 2, 2018 to Gerd Nachtsheim

GREAT !
all well-documented code!
no (dirty) Harry_Potter_Coding !

I was sure you know it

0 0

score 0 · Answer 9 · 2018-11-02T15:35:29-04:00

This is great except in the instance where you don't know the descriptors. We utilize CSVTOCLASS for situations where we do.

score 0 · Answer 10 · 2018-11-02T16:31:15-04:00

Mack Altman Nov 2, 2018 to John Murray

This wouldn't account for double quotes within the field data.

0 0

score 1 · Answer 11 · 2018-11-05T04:24:04-05:00

I suggest the following strategy:

Read line
If $Length(line,"""")#2=1 then you have a complete CSV line (an even number of doublequotes)
Otherwise read line2 and append it to line. Repeat this until $Length(line2,"""")#2=1. If you reach the end of the file before this happens, the CSV source is malformed.

In the above algorithm you may want to cater for the case where concatenation of consecutive lines would exceed the maximum local string length.

score 0 · Answer 12 · 2018-11-01T10:34:06-04:00

Vitaliy Serdtsev Nov 1, 2018 to Mack Altman

Maybe the name of the topic to specify a CSV-string instead of string ?

After clarification, the question became more clear.

0 0

score 0 · Answer 13 · 2018-11-01T10:37:43-04:00

The initial post advised that it was to take into account fields enclosed in double quotes. CSV stands for comma-separated value, so I'm not sure how that would change the information.

I did add the example since it appeared someone may not understand what taking into account double quotes means.

score 2 · Answer 14 · 2018-11-02T16:04:38-04:00

Mack- the %occLibrary calls are not documented for use by applications.

As your very interesting questions has proven the ease with which this is solved by:

$$CSVtoList^%occLibrary(csv)

I suggest you contact the WRC to request an enhancement to have this wrapped in a supported (and documented!) API that may be more future-proof than this internal and undocumented call.

score 0 · Answer 15 · 2018-11-01T15:08:47-04:00

Here's the next challenge, and one that I'm encountering using the $$CSVtoList^%occLibrary... line feeds within the field data (enclosed in double quotes).

The line ends with CRLF, and within the double quote enclosed field there appears  followed by a Line feed. The RFC 4180 standard states this should be acceptable, but I'm reviewing how to review this now. Any thoughts?

score 0 · Answer 16 · 2018-11-01T15:48:29-04:00

The Unix/Linux world often uses LF := $C(10) as line terminator
while in Win (and VMS) world CRLF := $C(C13,10) is a default.
So you depend on the source system providing the data.

Suggested approach: use LF as the (common) line terminator and just drop $C(13) or  from your input record by

$replace($translate(record,$c(13)),"","")

Before any other processing.

score 0 · Answer 17 · 2018-11-02T15:31:05-04:00

Mack Altman Nov 2, 2018 to Mack Altman

I was able to find in the documentation where it's stated that multi-line characters unsupported.

0 0

score 0 · Answer 18 · 2018-11-01T15:12:40-04:00

I attempted to use the following as the line terminator, which is in fact a CRLF. For some reason, it terminates even when only the LF occurs.

S fi.LineTerminator = $c(13,10)

score 0 · Answer 19 · 2018-11-01T12:57:11-04:00

Mack Altman Nov 1, 2018 to Rod Dorman

I'd like to know this as well. It would definitely save me a lot of time by referencing this documentation.

0 0

score 1 · Answer 20 · 2018-11-01T13:10:00-04:00

you are right. I didn't think on empty parts and doubled double quotes
which I never met from CSV.
next level exercise

score 1 · Answer 21 · 2018-11-01T13:30:30-04:00

OK, this handles double quotes. But only INSIDE a quoted string

parseCSV(string,newsep,sep=",",quote="""",newquote) { ;adjust for flexible quoting set res="",newsep=$g(newsep,$c(9)),newquote=$g(newquote,quote) for { if $g(string)[sep { if $e(string)=quote { set string=$replace(string,"""""",$c(2)) ; exclude double quotes set part=$P(string,quote,2) ,string=$replace($p(string,part_quote_sep,2,*),$c(2),"""""") ,res=res_newquote_$replace(part,$c(2),"""""")_newquote_newsep } else { set part=$P(string,sep),string=$p(string,sep,2,*) ,res=res_part_newsep } } else { set res=res_$g(string) quit } } quit res
}

-

HTH

score 0 · Answer 22 · 2018-11-05T03:31:36-05:00

John Murray Nov 5, 2018 to Mack Altman

Mack, can you give me an example of the kind of CSV line that you mean here?

0 0

score 1 · Answer 23 · 2018-11-02T05:00:06-04:00

No.
If you beware to use %occLibrary, you may look into side %Regex.Matcher

Here is another documented solution:

#include %systemInclude
  n
  try{

    #define TAB $c(9)
    
    s csv="""ABC,"_$c(10)_" Company,"",""123 Main St, Ste 102"",""Anytown, DC"",10001,234-567-8901,""Anytown"_$$$TAB_"DC"""

    d ##class(%DeepSee.TermList).%ParseCSVRecord(csv,.arr)

    w csv,!!,$$WriteCSVRecord(.arr,$$$TAB)

  }catch(ex){
    w "Error ", ex.DisplayString(),!
  }


WriteCSVRecord(&pRecord, pDelim=",", pEncode=$$$YES) {
  // made on the basis of the method %WriteCSVRecord
  Set r = ""
  Set k = $O(pRecord(""))
  While (k'="") {
    Set tValue = $G(pRecord(k))
    Set:$G(pEncode) tValue = $ZCVT(tValue,"O","UTF8") 

    Set r = r _ $S((tValue["""")||(tValue[",")||(tValue[pDelim):$$$quote(tValue),1:tValue)
    Set k = $O(pRecord(k))
    Set:k'="" r = r _ pDelim
  }
  Quit r
}

Result:

USER>d ^dc
"ABC,
      Company,","123 Main St, Ste 102","Anytown, DC",10001,234-567-8901,"Anytown        DC"
 
"ABC,
      Company," "123 Main St, Ste 102"  "Anytown, DC"   10001   234-567-8901    "Anytown        DC"

score 0 · Answer 24 · 2018-11-02T11:08:03-04:00

Well, the documentation leaves a lot of room for improvement and you can sure write this into a much shorter version but I wanted to make sure folks get the (somewhat abstract) idea.

When your line is $list'ed you can be sure further processing is not affected by delimiters. I also have to say that my solution cannot compete with John (Murray)'s crisp & elegant one-liner.

score 1 · Answer 25 · 2018-11-01T12:44:07-04:00

not being verbose in %occ* world I had this solution also allowing to change quoting

parseCSV(string,newsep,sep=",",quote="""",newquote) {    ;adjust for flexible quoting
  set res="",newsep=$g(newsep,$c(9)),newquote=$g(newquote,quote)
  for  {
if $g(string)[sep {
if $e(string)=quote {
  set part=$P(string,quote,2),string=$p(string,part_quote_sep,2,*)
  ,res=res_newquote_part_newquote_newsep }
     else  {
set part=$P(string,sep),string=$p(string,sep,2,*)
,res=res_part_newsep }
  } else  {
  set res=res_$g(string) quit }
}
  quit res
}

BTW.
It's an excellent test exercise for new COS programmers
I'll add it to my collection.

Thanks

score 3 · Answer 26 · 2018-11-02T03:48:21-04:00

You could use CSV() like this... (sorry for the weird formatting)

isdDemoCSVToList
quit


Test() public
{
set tDescriptor = "ROW(STR VARCHAR,STR2 VARCHAR,STR3 VARCHAR,STR4 VARCHAR, STR5 VARCHAR)"
set tLine = "ABC Company,""123 Main St, Ste 102"",""Anytown, DC"",10001,234-567-8901"
do CSVToList(tLine,tDescriptor,.tOutput)
write !
zw tOutput 
kill tOutput 
for i=1:1:3
{
set tLineArray($i(tLineArray)) = """eins"",""deux"",line"_i
}
do CSVToList(.tLineArray,tDescriptor,.tOutput)
write !
zw tOutput 
}

CSVToList(&pLines,pDescriptor,&pListArray) public
{
#dim tRS as %SQL.StatementResult
kill tListArray
if ($d(pLines)<10) { set pLines(1)=pLines,pLines=1 }
set tTempStream = ##class(%Stream.TmpCharacter).%New()
for tIdx = 1 : 1 : pLines
{
do tTempStream.WriteLine(pLines(tIdx))
}
do tTempStream.Rewind()
set tRS = ##class(%SQL.Statement).%ExecDirect(,"call %SQL_Util.CSV(,?,?)",.pDescriptor,tTempStream).%NextResult()
while (tRS.%Next(.tSC))
{
set tList = ""
for tCol = 1 : 1 : tRS.%GetMetadata().columnCount 
{
set tValue = tRS.%GetData(tCol) 
set tList = tList _ $lb(tValue)
} 
set pListArray($i(pListArray))=tList
}
quit 
}

Your output then looks like this

tOutput=1
tOutput(1)=$lb("ABC Company","123 Main St, Ste 102","Anytown, DC","10001","234-567-8901")

tOutput=3
tOutput(1)=$lb("eins","deux","line1","","")
tOutput(2)=$lb("eins","deux","line2","","")
tOutput(3)=$lb("eins","deux","line3","","")

score 2 · Answer 27 · 2018-11-02T05:26:09-04:00

My proposal:


USER>w csv
ABC Company,"123 Main St, Ste 102","Anytown, DC",10001,234-567-8901
USER>s i=0
 
USER>f  s i=$f(csv,",",i) q:'i  i $l($e(csv,1,i-2),"""")#2 s $e(csv,i-1)=$c(9)
 
USER>w csv
ABC Company     "123 Main St, Ste 102"  "Anytown, DC"   10001   234-567-8901
USER>

Without abbreviations:


USER>write csv
ABC Company,"123 Main St, Ste 102","Anytown, DC",10001,234-567-8901
USER>set i=0
 
USER>for  set i=$find(csv,",",i) quit:'i  if $length($extract(csv,1,i-2),"""")#2 set $extract(csv,i-1)=$char(9)
 
USER>write csv
ABC Company     "123 Main St, Ste 102"  "Anytown, DC"   10001   234-567-8901
USER>

score 0 · Answer 28 · 2018-11-02T12:50:47-04:00

/// <p>This Reader is to be used with %ResultSet. It will allow you to scan a CSV file, record by record.
/// There are three queries available for reading:</p>
/// <ol>
/// <li>CSV - comma separated
/// <li>CSV2 - semicolon separated
/// <li>TSV - tab separated
/// </ol>
/// <p>Use the query accordingly to your file type! The reader expects to find a header on the first line with the name of the fields.
/// This will be used to name the columns when you use the method Get().</p>
/// <p>The reader can deal with fields quoted and unquoted automatically.</p>  
///
///    <p>This assumes that your CSV file has a header line.
/// This custom query supports Comman Separated Values (CSV), Semicolumn Separated Values (CSV2)
/// and Tab Separated Values (TSV).</p>    
///
/// <EXAMPLE>
///
///            ; Comma Separated Values
///            ;Set oCSVFileRS=##class(%ResultSet).%New("IRISDemo.Util.FileReader:CSV")
///            ;
///            ; Semicolon Separated Values
///            ;Set oCSVFileRS=##class(%ResultSet).%New("IRISDemo.Util.FileReader:CSV2")
///            ;
///            ; Tab Separated Values
///            Set oCSVFileRS=##class(%ResultSet).%New("IRISDemo.Util.FileReader:TSV")
///            
///            Set tSC = oCSVFileRS.Execute(pFileName)
///            Quit:$System.Status.IsError(tSC)
///            
///            While oCSVFileRS.Next()
///            {
///                Set tDataField1 = oCSVFileRS.Get("Column Name 1")
///                Set tDataField2 = oCSVFileRS.Get("Column Name 2")
///                Set tDataField4 = oCSVFileRS.GetData(3)
///                
///                // Your code goes here            
///            }
///    </EXAMPLE>                    
///

Class IRISDemo.Util.FileReader Extends %RegisteredObject
{

    /// Use this Query with %ResultSet class to read comma (,) separated files.
    Query CSV(pFileName As %String, pFileEncoding As %String = "UTF8") As %Query
    {
    }
    
    /// Use this Query with %ResultSet class to read semicolon (;) separated files.
    Query CSV2(pFileName As %String, pFileEncoding As %String = "UTF8") As %Query
    {
    }
    
    /// Use this Query with %ResultSet class to read tab separated files.
    Query TSV(pFileName As %String, pFileEncoding As %String = "UTF8") As %Query
    {
    }
    
    /// Index a header from a CSV file so we can later get every row of the file
    /// and read every column by it's name. This method supports fields quoted, double 
    /// quoted or not quoted or a mix of the three. The only thing that is required for this to work
    /// is that the correct field separator is specified (Comma, Semicolon or Tab).
    ClassMethod IndexHeader(pHeader As %String, pSeparator As %String = ",", Output pHeaderIndex)
    {
        //How many columns?
        Set pHeaderIndex=$Length(pHeader,pSeparator)
        
        Set tRegexSeparator=pSeparator
        
        If tRegexSeparator=$Char(9)
        {
            Set tRegexSeparator="\x09"
        }
        
        //Let's build a regular expression to read all the data without the quotes...
        Set pHeaderIndex("Regex")=""
        For i=1:1:pHeaderIndex
        {
            Set $Piece(pHeaderIndex("Regex"),tRegexSeparator,i)="\""?'?(.*?)\""?'?"
        }
        Set pHeaderIndex("Regex")="^"_pHeaderIndex("Regex")
        
        //Let's use this regular expression to index the column names...
        Set oMatcher = ##class(%Regex.Matcher).%New(pHeaderIndex("Regex"))
        Do oMatcher.Match(pHeader)
        
        //Now let's index the colum names
        For i=1:1:oMatcher.GroupCount
        {
            Set pHeaderIndex("Columns",i)=$ZStrip(oMatcher.Group(i),"<>W")
        }
    }

    ClassMethod IndexRow(pRow As %String, ByRef pHeaderIndex As %String, Output pIndexedRow As %String)
    {
        Set oMatcher = ##class(%Regex.Matcher).%New(pHeaderIndex("Regex"))
        Do oMatcher.Match(pRow)
        
        //Now let's index the colum names
        For i=1:1:oMatcher.GroupCount
        {
            Set $List(pIndexedRow,i)=oMatcher.Group(i)
        }
    }

    ClassMethod CSVGetInfo(colinfo As %List, parminfo As %List, idinfo As %List, qHandle As %Binary, extoption As %Integer = 0, extinfo As %List) As %Status
    {
            Quit ..FileGetInfo(.colinfo, .parminfo, .idinfo, .qHandle, .extoption, .extinfo)
    }
    
    ClassMethod CSV2GetInfo(colinfo As %List, parminfo As %List, idinfo As %List, qHandle As %Binary, extoption As %Integer = 0, extinfo As %List) As %Status
    {
            Quit ..FileGetInfo(.colinfo, .parminfo, .idinfo, .qHandle, .extoption, .extinfo)
    }
    
    ClassMethod TSVGetInfo(colinfo As %List, parminfo As %List, idinfo As %List, qHandle As %Binary, extoption As %Integer = 0, extinfo As %List) As %Status
    {
            Quit ..FileGetInfo(.colinfo, .parminfo, .idinfo, .qHandle, .extoption, .extinfo)
    }
    
    ClassMethod FileGetInfo(colinfo As %List, parminfo As %List, idinfo As %List, qHandle As %Binary, extoption As %Integer = 0, extinfo As %List) As %Status
    {
        Merge tHeaderIndex = qHandle("HeaderIndex")
        Set colinfo = ""
        For i=1:1:tHeaderIndex
        {
            Set tColName = tHeaderIndex("Columns",i)
            Set colinfo=colinfo_$LB($LB(tColName))    
        }
        
        Set parminfo=$ListBuild("pFileName","pFileEncoding")
        Set extinfo=""
        Set idinfo=""
        Quit $$$OK
    }

    ClassMethod CSVExecute(ByRef qHandle As %Binary, pFileName As %String, pFileEncoding As %String = "UTF8") As %Status
    {
        Set qHandle("Separator")=","
        Quit ..FileExecute(.qHandle, pFileName, pFileEncoding)
    }

    ClassMethod CSV2Execute(ByRef qHandle As %Binary, pFileName As %String, pFileEncoding As %String = "UTF8") As %Status
    {
        Set qHandle("Separator")=";"
        Quit ..FileExecute(.qHandle, pFileName, pFileEncoding)
    }

    ClassMethod TSVExecute(ByRef qHandle As %Binary, pFileName As %String, pFileEncoding As %String = "UTF8") As %Status
    {
        Set qHandle("Separator")=$Char(9)
        Quit ..FileExecute(.qHandle, pFileName, pFileEncoding)
    }

    ClassMethod FileExecute(ByRef qHandle As %Binary, pFileName As %String, pFileEncoding As %String = "UTF8") As %Status
    {
        Set tSC = $System.Status.OK()
        Try
        {
            Set oFile = ##class(%Stream.FileCharacter).%New()
            If pFileEncoding'="" Set oFile.TranslateTable=pFileEncoding
            Set oFile.Filename=pFileName
            
            Set tHeader=oFile.ReadLine()
            Do ..IndexHeader(tHeader, qHandle("Separator"), .tHeaderIndex)
            
            Merge qHandle("HeaderIndex")=tHeaderIndex
            Set qHandle("File")=oFile
        }
        Catch (oException)
        {
            Set tSC = oException.AsStatus()
        }
        
        Quit tSC
    }

    ClassMethod CSVClose(ByRef qHandle As %Binary) As %Status [ PlaceAfter = FileExecute ]
    {
        Quit ..FileClose(.qHandle)
    }

    ClassMethod CSV2Close(ByRef qHandle As %Binary) As %Status [ PlaceAfter = FileExecute ]
    {
        Quit ..FileClose(.qHandle)
    }

    ClassMethod TSVClose(ByRef qHandle As %Binary) As %Status [ PlaceAfter = FileExecute ]
    {
        Quit ..FileClose(.qHandle)
    }

    ClassMethod FileClose(ByRef qHandle As %Binary) As %Status [ PlaceAfter = FileExecute ]
    {
        #Dim oFile As %Library.FileCharacterStream
        Set tSC = $System.Status.OK()
        Try
        {
            Kill qHandle("File")
            Kill qHandle("HeaderIndex")
        }
        Catch (oException)
        {
            Set tSC = oException.AsStatus()
        }
        
        Quit tSC
    }

    ClassMethod CSVFetch(ByRef qHandle As %Binary, ByRef Row As %List, ByRef AtEnd As %Integer = 0) As %Status [ PlaceAfter = FileExecute ]
    {
        Quit ..FileFetch(.qHandle, .Row, .AtEnd)
    }

    ClassMethod CSV2Fetch(ByRef qHandle As %Binary, ByRef Row As %List, ByRef AtEnd As %Integer = 0) As %Status [ PlaceAfter = FileExecute ]
    {
        Quit ..FileFetch(.qHandle, .Row, .AtEnd)
    }

    ClassMethod TSVFetch(ByRef qHandle As %Binary, ByRef Row As %List, ByRef AtEnd As %Integer = 0) As %Status [ PlaceAfter = FileExecute ]
    {
        Quit ..FileFetch(.qHandle, .Row, .AtEnd)
    }

    ClassMethod FileFetch(ByRef qHandle As %Binary, ByRef Row As %List, ByRef AtEnd As %Integer = 0) As %Status [ PlaceAfter = FileExecute ]
    {
        Set tSC = $System.Status.OK()
        Try
        {
            Set Row = ""
            
            Set oFile = qHandle("File")
            If oFile.AtEnd
            {
                Set AtEnd=1
                Quit
            }
    
            Merge tHeaderIndex=qHandle("HeaderIndex")
            
            While 'oFile.AtEnd
            {
                Set tRow=oFile.ReadLine()
                Continue:tRow=""
                Quit
            }
                    
            Do ..IndexRow(tRow, .tHeaderIndex, .Row)
            
            
        }
        Catch (oException)
        {
            Set tSC = oException.AsStatus()
        }
        
        Quit tSC
    }
}

score 0 · Answer 29 · 2018-11-02T15:59:39-04:00

It's amazing how many different ways everyone's taken to read a standard format. I thought I'd add this excerpt from the RFC 4180. So far, I haven't found a way to read a CSV line that takes into account all of the requirements of the escaped field, specifically with the LF character. To see this for yourself, create a new Excel workbook, data into a field and use Alt+Enter a few times, and then save the file as a CSV file. The line feeds are within the field data.

The ABNF grammar [2] appears as follows:
file = [header CRLF] record *(CRLF record) [CRLF]
header = name *(COMMA name)
record = field *(COMMA field)
name = field
field = (escaped / non-escaped)
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
non-escaped = *TEXTDATA
COMMA = %x2C
CR = %x0D ;as per section 6.1 of RFC 2234 [2]

score 0 · Answer 30 · 2018-11-03T11:27:55-04:00

I've created a Gist containing the source code for a CSV reader.

It will load CSV data from a file, stream or string, exposing each record via a Next() record method and Get() and GetAt() field methods.

It implements various options for changing the field and EOL delimiter, automatic encoding from UTF8 files, quote stripping and normalisation of EOL character(s) to a preferred internal EOL character(s).

https://gist.github.com/SeanConnelly/cb4ff970c2df5266d24c8802d56f48da

I've been using some ugly data and looks to be working good enough, here is output from the Display() method...

Record No : 1
 address = 1 long street, somewhere, north pole
 poetry =
 
"Poetry"
========
 
From the "Anglo-Saxon" Rune Poem (Rune version):
"ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ
ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾ
ᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩᚱ᛫ᛞᚱᛁᚻᛏᚾᛖ᛫ᛞᚩᛗᛖᛋ᛫ᚻᛚᛇᛏᚪᚾ"᛬
 
From Laȝamon's Brut (The Chronicles of England, Middle English, West Midlands):
 
"An preost wes on leoden, Laȝamon was ihoten
He wes Leovenaðes sone -- liðe him be Drihten.
He wonede at Ernleȝe at æðelen are chirechen,
Uppen Sevarne staþe, sel þar him þuhte,
Onfest Radestone, þer he bock radde."
 
(The third letter in the author's name is Yogh, missing from many fonts; CLICK HERE for another Middle English sample with some explanation of letters and encoding).
 
From the Tagelied of Wolfram von Eschenbach (Middle High German):
 
"Sîne klâwen durh die wolken sint geslagen,
er stîget ûf mit grôzer kraft,
ich sih in grâwen tägelîch als er wil tagen,
den tac, der im geselleschaft
erwenden wil, dem werden man,
den ich mit sorgen în verliez.
ich bringe in hinnen, ob ich kan.
sîn vil manegiu tugent michz leisten hiez."
 
Some lines of Odysseus Elytis (Greek):
 
Monotonic:
 
"Τη γλώσσα μου έδωσαν ελληνική
το σπίτι φτωχικό στις αμμουδιές του Ομήρου.
Μονάχη έγνοια η γλώσσα μου στις αμμουδιές του Ομήρου.
από το Άξιον Εστί
του Οδυσσέα Ελύτη"
 
Polytonic:
 
"Τὴ γλῶσσα μοῦ ἔδωσαν ἑλληνικὴ
τὸ σπίτι φτωχικὸ στὶς ἀμμουδιὲς τοῦ Ὁμήρου.
Μονάχη ἔγνοια ἡ γλῶσσα μου στὶς ἀμμουδιὲς τοῦ Ὁμήρου.
ἀπὸ τὸ Ἄξιον ἐστί
τοῦ Ὀδυσσέα Ἐλύτη
The first stanza of Pushkin's Bronze Horseman (Russian):
На берегу пустынных волн
Стоял он, дум великих полн,
И вдаль глядел. Пред ним широко
Река неслася; бедный чёлн
По ней стремился одиноко.
По мшистым, топким берегам
Чернели избы здесь и там,
Приют убогого чухонца;
И лес, неведомый лучам
В тумане спрятанного солнца,
Кругом шумел."
 
Šota Rustaveli's Veṗxis Ṭq̇aosani, ̣︡Th, The Knight in the Tiger's Skin (Georgian):
 
"ვეპხის ტყაოსანი შოთა რუსთაველი
ღმერთსი შემვედრე, ნუთუ კვლა დამხსნას სოფლისა შრომასა, ცეცხლს, წყალსა და მიწასა, ჰაერთა თანა მრომასა; მომცნეს ფრთენი და აღვფრინდე, მივჰხვდე მას ჩემსა ნდომასა, დღისით და ღამით ვჰხედვიდე მზისა ელვათა კრთომაასა."
 
Tamil poetry of Subramaniya Bharathiyar: சுப்ரமணிய பாரதியார் (1882-1921):
 
"யாமறிந்த மொழிகளிலே தமிழ்மொழி போல் இனிதாவது எங்கும் காணோம்,
பாமரராய் விலங்குகளாய், உலகனைத்தும் இகழ்ச்சிசொலப் பான்மை கெட்டு,
நாமமது தமிழரெனக் கொண்டு இங்கு வாழ்ந்திடுதல் நன்றோ? சொல்லீர்!
தேமதுரத் தமிழோசை உலகமெலாம் பரவும்வகை செய்தல் வேண்டும்."
 
Kannada poetry by Kuvempu — ಬಾ ಇಲ್ಲಿ ಸಂಭವಿಸು
 
"ಬಾ ಇಲ್ಲಿ ಸಂಭವಿಸು ಇಂದೆನ್ನ ಹೃದಯದಲಿ
ನಿತ್ಯವೂ ಅವತರಿಪ ಸತ್ಯಾವತಾರ
ಮಣ್ಣಾಗಿ ಮರವಾಗಿ ಮಿಗವಾಗಿ ಕಗವಾಗೀ...
ಮಣ್ಣಾಗಿ ಮರವಾಗಿ ಮಿಗವಾಗಿ ಕಗವಾಗಿ
ಭವ ಭವದಿ ಭತಿಸಿಹೇ ಭವತಿ ದೂರ
ನಿತ್ಯವೂ ಅವತರಿಪ ಸತ್ಯಾವತಾರ || ಬಾ ಇಲ್ಲಿ ||"
 
 spaceship =
 
 
       _________
      (=========)
      |=========|
      |====_====|
      |== / \ ==|
      |= / _ \ =|
   _  |=| ( ) |=|
  /=\ |=|,,,,,|=| /=\
  |=| |=| USA |=| |=|
  |=| |=|,,,,,|=| |=|
  |=| |=| | | |=| |=|
  |=| |=| | | |=| |=|
  |=| |=| | | |=| |=|
  |=| |/  | |  \| |=|
  |=|/    | |    \|=|
  |=/NASA |_| NASA\=|
  |(_______________)|
  |=| |_|__|__|_| |=|
  |=|   ( ) ( )   |=|
 /===\           /===\
|||||||         |||||||
-------         -------
 (""")           (""")
 
 size = Big
 qty = 5.7

score 0 · Answer 31 · 2018-11-04T09:50:00-05:00

Still another approach using your original "solution" in PHP following the idea of a Micro-Service.

Instead of a mimic of what PHP might do I use it directly for this purpose.
That way more sophisticated functionalities that can be used without recoding.

I extended your test to include doubled double quotes

USER>write %dstr
ABC Company,"123 Main St, Ste 102","Anytown, DC",10001,234-567-8901,"hi ""rcc"" was here"

USER>set reply=$$^phpCSV(%dstr) write reply,!! zwritereply ABC Company 123 Main St, Ste 102 Anytown, DC 10001 234-567-8901 hi "rcc" was here reply="ABC Company"_$c(9)_"123 Main St, Ste 102"_$c(9)_"Anytown, DC"_$c(9)_"10001"_$c(9)_"234-567-8901"_$c(9)_"hi ""rcc"" was here"

*

and here the code:

phpCSV(str) { ; use PHP for conversion #define php "............\php.exe " ; add location of php.exe #define pipe "|CPIPE|1" set file="myTest.php" open file:"WN" use file write "<?php " ,!,"$str='"_str_"';" ,!,"$strtotab = implode('\t', str_getcsv($str, ','));" ,!,"print_r($strtotab);" ,!,"?>",! close file open $$$pipe:$$$php_file use $$$pipe read result close $$$pipe use 0 ; write result quit $replace(result,"\t",$c(9)) }

score 0 · Answer 32 · 2024-08-05T02:13:46-04:00

I had a similar requirement, so created a simple method to achieve this. May be someone else is also looking for something similar

ClassMethod CreateCSVArray(
	csvrec = "",
	Output csvarray)
{
	s csvarray=0
	i csvrec'=""{
		s csvlen=$l(csvrec)
		i csvlen>1{
			s token=""
			f i=1:1:csvlen{
				s char=$we(csvrec,i)
				i char=""""{
					s closequotepos=$find(csvrec,"""",i+1)
					s csvarray($i(csvarray))=$we(csvrec,i+1,closequotepos-2)
					s i=closequotepos
					s token=""
				}else{
					i char'=","{
						s token=token_char
					}else{
						s csvarray($i(csvarray))=token
						s token=""
					}
				}
			}
			s:token'="" csvarray($i(csvarray))=token
		}
	}
}ObjectScript
ObjectScript

Which produced the output:

myarr=5
myarr(1)="ABC Company"
myarr(2)="123 Main St, Ste 102"
myarr(3)="Anytown, DC"
myarr(4)=10001
myarr(5)="234-567-8901"