Hi Developers!
Sometimes we need to import CSV data programmatically to InterSystems IRIS either from CSV or from URL. And we expect the class with proper datatypes to be created and the data to be imported.
I published a module csvgen on Open Exchange which does exactly that.
If you just need the CSV file be imported into IRIS you can do the following:
USER>do ##class(community.csvgen).Generate("/usr/data/titanic.csv",,"Data.Titanic") Class name: Data.Titanic Header: PassengerId INTEGER,Survived INTEGER,Pclass INTEGER,Name VARCHAR(250),Sex VARCHAR(250),Age INTEGER,SibSp INTEGER,Parch INTEGER,Ticket VARCHAR(250),Fare MONEY,Cabin VARCHAR(250),Embarked VARCHAR(250) Records imported: 891 USER>
Or if you have the CSV on the internet, e.g. COVID-19 Data on Github you can get the data in the following way:
USER>d ##class(community.csvgen).GenerateFromURL("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/05-29-2020.csv",",","Data.Covid19")
Class name: Data.Covid19
Header: FIPS INTEGER,Admin2 VARCHAR(250),Province_State VARCHAR(250),Country_Region VARCHAR(250),Last_Update DATE,Lat MONEY,Long_ DOUBLE,Confirmed INTEGER,Deaths INTEGER,Recovered INTEGER,Active INTEGER,Combined_Key VARCHAR(250),Incidence_Rate DOUBLE,Case-Fatality_Ratio DOUBLE
Records imported: 3522
USER>
Installation
You can install the package with ZPM:
USER>zpm zpm:USER>install csvgen
The csvgen module is just a wrapper of CSV2CLASS method .
The ObjectScript Quality profile.
It's not ideal, so collaboration is very welcome!
Hi Evgeny.
If the separator in the file is a semicolon, then import does not occur and the class is not generated.
https://github.com/evshvarov/csvgen/issues/2
Hi Mikhail!
Thank you! This is fixed with the latest release!
There appears to be an issue on Windows as it doesn't strip out the ":" from the filename when auto-generating the class( https://github.com/evshvarov/csvgen/issues/13).
But beyond that I am stuck because is autodetecting my data/time column as Date and it is failing validation:
USER>w $system.Status.GetErrorText(sc)
ERROR #5540: SQLCODE: -400 Message: ERROR #5002: ObjectScript error: <ZODAT>zWhenOdbcToLogical+1^my.data.1
This is that the file looks like:
Granted Access [31358],"MYSPACE ""IN"" READER","Smith, John",12/06/2021 03:46AM,31358,314
So it is blowing up when it tries to read "12/06/2021 03:46AM" into the When field, which it auto-generated as:
I could obviously hand-correct the auto-generated class but that wouldn't help if it just re-generates it the wrong way again when I try to do the import.
Any suggestions on a way around this? Is there a way to force it to %Library.DateTime?
Thanks!
Ben
Partially answered my own question ... if I edit the auto-generated class' When property and set it to %Library.DateTime, I can then call the generated Import() classmethod of the auto-generated class and import will succeed. So I am able to push forward ... is this the intended path to do this?
Should I create an issue for the DateTime column to be property set as a DateTime object and not a Date object?
Hi Ben!
This is fixed with the new release.
As for the Library.DateTime, there is no support for this type yet.
You can add the support and PR - it's an open-source library and PRs are very welcome.
Also, you can consider csvgen-ui library by @Guillaume Rongier - it has a very nice UI to guess types and choose your own if guessing fails or doesn't apply.
Thanks for quick fix @Evgeny Shvarov!! Also, thanks for the details that some types are not supported, that is good to know. At this point I think I am good with letting it take it's best guess and then editing the class afterwards if needed. With SQL LOAD coming hopefully there will be less need to one-off utilities that do this but I am thankful that it was available for what I needed this week!!
Hi @Ben Spead!
I've introduced the support for DATETIME type in 1.3.1 release for "guesstype" functionality :)
So this should work now, and via LOAD DATA if you are on 2021.2+
Thank you very much @Evgeny Shvarov !
LOAD DATA (SQL)2021.2
Wow - this looks like it is going to be an incredibly helpful feature!! Thank you for pointing out the new docs on it :)
Thanks, Vitaliy! A new LOAD DATA command doesn't generate class, right?
added LOAD DATA support to csvgen 1.3.0
With the csvgen 1.2.1 release added the Append support as @Yuri Marx requested.
Thanks!
I tried install csvgen and got ERROR! 'csvgen' not found in any repository. Anyone else having this issue?
IRIS for Windows (x86-64) 2022.1 (Build 209U) Tue May 31 2022 12:16:40 EDT
What is the ZPM client version?