After so many years of waiting, we finally have an official driver available on PyPI.
Additionally, the JDBC driver has been available on Maven for three months already, and the .NET driver on NuGet for more than a month.
As the author of IRIS support in quite a few Python libraries, I wanted to check it out. Implementing DB-API means that the driver should be interchangeable and should offer at least the functions defined in the standard; the only difference should be in the SQL.
And the beauty of using existing libraries is that they already implement other databases through the DB-API standard, so they already have expectations about how a driver should work.
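To illustrate that interchangeability, here is a minimal sketch of driver-agnostic DB-API 2.0 code, using sqlite3 from the standard library as a stand-in; the `scalar` helper name and the query are mine, and only the connect arguments are driver-specific:

```python
import sqlite3

def scalar(connect, *connect_args, query="SELECT 1 + 1"):
    """Run a query through any DB-API 2.0 driver: connect() returns a
    connection, connections produce cursors, cursors execute and fetch."""
    conn = connect(*connect_args)
    try:
        cur = conn.cursor()
        cur.execute(query)
        return cur.fetchone()[0]
    finally:
        conn.close()

# The same function should accept iris.connect with its own arguments.
print(scalar(sqlite3.connect, ":memory:"))  # prints 2
```

This is exactly the contract libraries like SQLAlchemy build on: they never care which driver is behind `connect`, only that it follows the standard.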
I decided to test the official InterSystems driver by implementing support for it in the SQLAlchemy-iris library.
executemany
Prepare a database operation (query or command) and then execute it against all parameter sequences or mappings found in the sequence seq_of_parameters.
A very helpful function, which lets you insert multiple rows at once. Let's start with a simple example:
import iris

host = "localhost"
port = 1972
namespace = "USER"
username = "_SYSTEM"
password = "SYS"

conn = iris.connect(
    host,
    port,
    namespace,
    username,
    password,
)

with conn.cursor() as cursor:
    res = cursor.execute("DROP TABLE IF EXISTS test")
    res = cursor.execute(
        """
        CREATE TABLE test (
            id IDENTITY NOT NULL,
            value VARCHAR(50)
        ) WITH %CLASSPARAMETER ALLOWIDENTITYINSERT = 1
        """
    )

cursor = conn.cursor()
res = cursor.executemany(
    "INSERT INTO test (id, value) VALUES (?, ?)", [
        (1, 'val1'),
        (2, 'val2'),
        (3, 'val3'),
        (4, 'val4'),
    ]
)
This works fine, but what if we need to insert only one value per row?
res = cursor.executemany(
    "INSERT INTO test (value) VALUES (?)", [
        ('val1', ),
        ('val2', ),
        ('val3', ),
        ('val4', ),
    ]
)
Unfortunately, this leads to an unexpected exception:
RuntimeError: Cannot use list/tuple for single values
For some reason, one-element tuples are not allowed, and InterSystems requires passing the values a different way:
res = cursor.executemany(
    "INSERT INTO test (value) VALUES (?)", [
        'val1',
        'val2',
        'val3',
        'val4',
    ]
)
This way it works fine.
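If you need code that runs against both behaviors, one option is to normalize the parameter sequence before calling executemany. This is a workaround sketch, not part of any official API (the function name is mine), and it assumes the driver only rejects one-element tuples:

```python
def adapt_single_column_params(seq_of_params):
    """Unwrap one-element tuples/lists into bare values, which the
    InterSystems driver expects for single-column statements.
    Multi-column parameter rows are passed through unchanged."""
    if all(isinstance(p, (list, tuple)) and len(p) == 1 for p in seq_of_params):
        return [p[0] for p in seq_of_params]
    return list(seq_of_params)

# cursor.executemany(sql, adapt_single_column_params(params))
print(adapt_single_column_params([('val1',), ('val2',)]))   # ['val1', 'val2']
print(adapt_single_column_params([(1, 'a'), (2, 'b')]))     # [(1, 'a'), (2, 'b')]
```

This keeps the calling code standard-compliant and confines the driver quirk to one place.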
fetchone
Fetch the next row of a query result set, returning a single sequence, or None when no more data is available.
For instance, a simple example with sqlite:
import sqlite3
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("SELECT 1 one, 2 two")
onerow = cur.fetchone()
print('onerow', type(onerow), onerow)
cur.execute("SELECT 1 one, 2 two union all select '01' as one, '02' as two")
allrows = cur.fetchall()
print('allrows', type(allrows), allrows)
gives
onerow <class 'tuple'> (1, 2)
allrows <class 'list'> [(1, 2), ('01', '02')]
And with the InterSystems driver:
import iris
con = iris.connect(
    hostname="localhost",
    port=1972,
    namespace="USER",
    username="_SYSTEM",
    password="SYS",
)
cur = con.cursor()
cur.execute("SELECT 1 one, 2 two")
onerow = cur.fetchone()
print("onerow", type(onerow), onerow)
cur.execute("SELECT 1 one, 2 two union all select '01' as one, '02' as two")
allrows = cur.fetchall()
print("allrows", type(allrows), allrows)
for some reason gives:
onerow <class 'iris.dbapi.DataRow'> <iris.dbapi.DataRow object at 0x104ca4e10>
allrows <class 'tuple'> ((1, 2), ('01', '02'))
What is DataRow, and why is it not a tuple, or at least a list?
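A defensive workaround is to coerce whatever fetchone returns into a plain tuple. This is a sketch under the assumption that the row object is at least iterable (not something the driver documents); the `normalize_row` name is mine, not the driver's:

```python
def normalize_row(row):
    """Return a plain tuple for any row object: pass tuples through,
    convert other iterables (e.g. a driver-specific row class),
    and keep None (end of result set) as-is."""
    if row is None or isinstance(row, tuple):
        return row
    return tuple(row)

print(normalize_row((1, 2)))   # (1, 2)
print(normalize_row([1, 2]))   # (1, 2)
print(normalize_row(None))     # None
```

With this, code written for other DB-API drivers can keep unpacking rows as tuples regardless of what the driver hands back.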
Exceptions
The standard describes a variety of exception classes that the driver is supposed to raise when something goes wrong. The InterSystems driver does not use them at all; it just raises RuntimeError for any reason, which is not part of the standard anyway.
An application may rely on the type of the exception and behave accordingly, but the InterSystems driver does not provide any differentiation. Another issue: the SQLCODE would help here, but it has to be parsed out of the error message.
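Until proper exception classes exist, about the best an application can do is fish the SQLCODE out of the message text. A fragile sketch, since the exact message format is undocumented and may change between driver versions; the sample message below is illustrative, and the helper name is mine:

```python
import re

def extract_sqlcode(message):
    """Find 'SQLCODE' followed by a signed integer anywhere in an error
    message and return it as an int, or None if no code is present."""
    match = re.search(r"SQLCODE[^-\d]*(-?\d+)", str(message))
    return int(match.group(1)) if match else None

print(extract_sqlcode("[SQLCODE: <-119>:<Unique or primary key constraint>]"))  # -119
print(extract_sqlcode("some unrelated RuntimeError"))                           # None
```

Matching on message text is exactly the kind of brittleness the standard's exception hierarchy is meant to avoid.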
Conclusion
So, during testing I found multiple bugs:
- Random errors happening at any time: <LIST ERROR> Incorrect list format, unsupported type for IRISList; Details: type detected : 32
  - it works fine if you try again right after the error
- Caught some segmentation faults; I don't even know how they happen
- Unexpected result type from the fetchone function
- Unexpected behavior of the executemany function for one-value rows
- Exceptions not implemented at all; different errors should raise different exceptions, and applications rely on that
- Can break Embedded Python if installed next to IRIS
  - because Embedded Python and this driver use the same module name, the driver overrides what is already installed with IRIS and may break it
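Related to that last point, you can at least check which iris module Python actually resolves, since the driver and Embedded Python share the module name. A small diagnostic sketch:

```python
import importlib.util

# find_spec returns None when no module named "iris" is importable;
# otherwise spec.origin shows which installation wins on sys.path
# (it may be None for namespace packages).
spec = importlib.util.find_spec("iris")
print(spec.origin if spec else "iris module not found")
```

If the printed path points at a pip site-packages directory on a machine where IRIS is installed, the driver has shadowed the Embedded Python module.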
SQLAlchemy-iris now supports the official InterSystems driver, but due to the incompatibility with Embedded Python and the bugs discovered during testing, it is available only as an optional extra. Install it with this command, with the extra defined:
pip install sqlalchemy-iris[intersystems]
And simple usage; the URL should be iris+intersystems://
from sqlalchemy import Column, MetaData, Table
from sqlalchemy.sql.sqltypes import Integer, VARCHAR
from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase

DATABASE_URL = "iris+intersystems://_SYSTEM:SYS@localhost:1972/USER"
engine = create_engine(DATABASE_URL, echo=True)

# Create a table metadata
metadata = MetaData()

class Base(DeclarativeBase):
    pass

def main():
    demo_table = Table(
        "demo_table",
        metadata,
        Column("id", Integer, primary_key=True, autoincrement=True),
        Column("value", VARCHAR(50)),
    )
    demo_table.drop(engine, checkfirst=True)
    demo_table.create(engine, checkfirst=True)
    with engine.connect() as conn:
        conn.execute(
            demo_table.insert(),
            [
                {"id": 1, "value": "Test"},
                {"id": 2, "value": "More"},
            ],
        )
        conn.commit()
        result = conn.execute(demo_table.select()).fetchall()
        print("result", result)

main()
Due to bugs in the InterSystems driver, some features may not work as expected. I hope they will be fixed in the future.
Thank you for getting this out there; it has been super frustrating trying to understand some of these errors while using these drivers with the promise of DB-API. At least the distribution has gone public on Maven, PyPI, etc., which is a step in the right direction and hopefully will help things, especially with the drivers being the "front door" to a lot of the cloud offerings.
The SQLAlchemy workaround for compatibility is especially great.
Thanks for this post. I noticed that doing inserts from SQLAlchemy is slower than when done directly from Python. So I wrote the following code; in one case I get the connection from SQLAlchemy, and in the other I get it from Python directly. The exact same code is 50% slower when the connection is obtained from SQLAlchemy. Any thoughts?
The code is below; the only difference between the two runs is how I set the connection variable.
import pandas as pd
from sqlalchemy import create_engine
import numpy as np
import time
import iris
# Create the SQLAlchemy engine
DATABASE_URL = "iris+intersystems://XXX:XXX@localhost:1972/USER"
engine = create_engine(DATABASE_URL, echo=True)
args = {'hostname':'localhost', 'port':1972, 'namespace':'USER', 'username':'XXX', 'password':'XXX'}
# connection = iris.connect(**args)
connection = engine.raw_connection()
# Generate data for each row (49 fields)
num_records = 100000
# Define SQL insert statement
sql_insert = "INSERT INTO SQLUser.test_table (field_0, field_1, field_2, field_3, field_4, field_5, field_6, field_7, field_8, field_9, field_10, field_11, field_12, field_13, field_14, field_15, field_16, field_17, field_18, field_19, field_20, field_21, field_22, field_23, field_24, field_25, field_26, field_27, field_28, field_29, field_30, field_31, field_32, field_33, field_34, field_35, field_36, field_37, field_38, field_39, field_40, field_41, field_42, field_43, field_44, field_45, field_46, field_47, field_48) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
record_values = []
# Execute SQL insert
try:
    start_time = time.perf_counter()  # Capture start time
    batch = 0
    cursor = connection.cursor()
    for _ in range(num_records):
        record_values = [np.random.rand() for _ in range(49)]
        cursor.execute(sql_insert, record_values)
        batch = batch + 1
        if batch >= 10000:
            connection.commit()
            batch = 0
        # print("Record inserted successfully!")
    connection.commit()
    end_time = time.perf_counter()  # Capture end time
    elapsed_time = end_time - start_time
    print(f"Time taken: {elapsed_time} seconds to insert {num_records} at ", num_records/elapsed_time, " records per second.")
except Exception as e:
    print("Error inserting record:", e)
finally:
    cursor.close()
    connection.close()
    engine.dispose()
I don't see any difference
Time taken: 23.218561416957527 seconds to insert 100000 at 4306.899045302852 records per second.
Time taken: 23.179011167027056 seconds to insert 100000 at 4314.247889152987 records per second.
from sqlalchemy import create_engine
import numpy as np
import time
import iris

hostname = "localhost"
port = 1972
namespace = "USER"
username = "_SYSTEM"
password = "SYS"

# Create the SQLAlchemy engine
DATABASE_URL = (
    f"iris+intersystems://{username}:{password}@{hostname}:{port}/{namespace}"
)
engine = create_engine(DATABASE_URL, echo=True)

args = {
    "hostname": hostname,
    "port": port,
    "namespace": namespace,
    "username": username,
    "password": password,
}
connection = iris.connect(**args)
# connection = engine.raw_connection()

# Generate data for each row (50 fields)
columns_count = 50
drop_table_sql = "DROP TABLE IF EXISTS test_table"
columns = [f"field_{i + 1}" for i in range(columns_count)]
create_table_sql = f"CREATE TABLE test_table ({', '.join([f'{col} DOUBLE' for col in columns])})"
num_records = 100000

# Define SQL insert statement
sql_insert = f"INSERT INTO SQLUser.test_table ({', '.join(columns)}) VALUES ({', '.join(['?'] * columns_count)})"
record_values = []

# Execute SQL insert
try:
    start_time = time.perf_counter()  # Capture start time
    batch = 0
    cursor = connection.cursor()
    cursor.execute(drop_table_sql)
    cursor.execute(create_table_sql)
    connection.commit()
    for _ in range(num_records):
        record_values = [np.random.rand() for _ in range(columns_count)]
        cursor.execute(sql_insert, record_values)
        batch = batch + 1
        if batch >= 10000:
            connection.commit()
            print("Batch inserted successfully!")
            batch = 0
    connection.commit()
    end_time = time.perf_counter()  # Capture end time
    elapsed_time = end_time - start_time
    print(
        f"Time taken: {elapsed_time} seconds to insert {num_records} at ",
        num_records / elapsed_time,
        " records per second.",
    )
except Exception as e:
    print("Error inserting record:", e)
finally:
    cursor.close()
    connection.close()
    engine.dispose()
I still see the difference; maybe it is the version of IRIS and the drivers.
Following are the versions of the various software I am using (IRIS and the Python code are running on the same host).
IRIS version: IRIS for Windows (x86-64) 2024.3 (Build 206U) Fri Oct 11 2024 17:13:28 EDT
Python: Python 3.11.9
SQLAlchemy: 2.0.0
sqlalchemy-iris: 0.17.0
intersystems_irispython: 5.1.0
returns:
Could this be because I'm on an M2 Mac?
You have to install it with quotes:
pip install "sqlalchemy-iris[intersystems]"
By the way, if you want to avoid conflicts between Embedded Python, the community driver, and the official one, you can use the iris embedded python wrapper:
Thank you!