After so many years of waiting, we finally have an official driver available on PyPI.
Additionally, the JDBC driver has been available on Maven for three months already, and the .NET driver on NuGet for more than a month.
As the author of IRIS support in quite a few Python libraries, I wanted to check it out. Implementing DB-API means that the driver should be interchangeable and should offer at least the functions defined in the standard; the only difference should be in the SQL.
And the beauty of using existing libraries is that they already implement other databases through the DB-API standard, so they already have expectations about how a driver should work.
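To illustrate that interchangeability, here is a minimal sketch of driver-agnostic DB-API 2.0 code, using sqlite3 from the standard library as a stand-in; the `scalar` helper name and the query are mine, and only the connect arguments are driver-specific:

```python
import sqlite3

def scalar(connect, *connect_args, query="SELECT 1 + 1"):
    """Run a query through any DB-API 2.0 driver: connect() returns a
    connection, connections produce cursors, cursors execute and fetch."""
    conn = connect(*connect_args)
    try:
        cur = conn.cursor()
        cur.execute(query)
        return cur.fetchone()[0]
    finally:
        conn.close()

# The same function should accept iris.connect with its own arguments.
print(scalar(sqlite3.connect, ":memory:"))  # prints 2
```

This is exactly the contract libraries like SQLAlchemy build on: they never care which driver is behind `connect`, only that it follows the standard.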
I decided to test the official InterSystems driver by implementing support for it in the SQLAlchemy-iris library.
executemany
Prepare a database operation (query or command) and then execute it against all parameter sequences or mappings found in the sequence seq_of_parameters.
A very helpful function, which lets you insert multiple rows at once. Let's start with a simple example:
import iris

host = "localhost"
port = 1972
namespace = "USER"
username = "_SYSTEM"
password = "SYS"

conn = iris.connect(
    host,
    port,
    namespace,
    username,
    password,
)

with conn.cursor() as cursor:
    res = cursor.execute("DROP TABLE IF EXISTS test")
    res = cursor.execute(
        """
        CREATE TABLE test (
            id IDENTITY NOT NULL,
            value VARCHAR(50)
        ) WITH %CLASSPARAMETER ALLOWIDENTITYINSERT = 1
        """
    )

cursor = conn.cursor()
res = cursor.executemany(
    "INSERT INTO test (id, value) VALUES (?, ?)", [
        (1, 'val1'),
        (2, 'val2'),
        (3, 'val3'),
        (4, 'val4'),
    ]
)
This works fine, but what if we need to insert only one value per row?
res = cursor.executemany(
    "INSERT INTO test (value) VALUES (?)", [
        ('val1', ),
        ('val2', ),
        ('val3', ),
        ('val4', ),
    ]
)
Unfortunately, this leads to an unexpected exception:
RuntimeError: Cannot use list/tuple for single values
For some reason, one-element tuples are not allowed, and InterSystems requires passing the values a different way:
res = cursor.executemany(
    "INSERT INTO test (value) VALUES (?)", [
        'val1',
        'val2',
        'val3',
        'val4',
    ]
)
This way it works fine.
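If you need code that runs against both behaviors, one option is to normalize the parameter sequence before calling executemany. This is a workaround sketch, not part of any official API (the function name is mine), and it assumes the driver only rejects one-element tuples:

```python
def adapt_single_column_params(seq_of_params):
    """Unwrap one-element tuples/lists into bare values, which the
    InterSystems driver expects for single-column statements.
    Multi-column parameter rows are passed through unchanged."""
    if all(isinstance(p, (list, tuple)) and len(p) == 1 for p in seq_of_params):
        return [p[0] for p in seq_of_params]
    return list(seq_of_params)

# cursor.executemany(sql, adapt_single_column_params(params))
print(adapt_single_column_params([('val1',), ('val2',)]))   # ['val1', 'val2']
print(adapt_single_column_params([(1, 'a'), (2, 'b')]))     # [(1, 'a'), (2, 'b')]
```

This keeps the calling code standard-compliant and confines the driver quirk to one place.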
fetchone
Fetch the next row of a query result set, returning a single sequence, or None when no more data is available.
For instance, a simple example with sqlite:
import sqlite3
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("SELECT 1 one, 2 two")
onerow = cur.fetchone()
print('onerow', type(onerow), onerow)
cur.execute("SELECT 1 one, 2 two union all select '01' as one, '02' as two")
allrows = cur.fetchall()
print('allrows', type(allrows), allrows)
gives
onerow <class 'tuple'> (1, 2)
allrows <class 'list'> [(1, 2), ('01', '02')]
And with the InterSystems driver:
import iris
con = iris.connect(
    hostname="localhost",
    port=1972,
    namespace="USER",
    username="_SYSTEM",
    password="SYS",
)
cur = con.cursor()
cur.execute("SELECT 1 one, 2 two")
onerow = cur.fetchone()
print("onerow", type(onerow), onerow)
cur.execute("SELECT 1 one, 2 two union all select '01' as one, '02' as two")
allrows = cur.fetchall()
print("allrows", type(allrows), allrows)
for some reason gives:
onerow <class 'iris.dbapi.DataRow'> <iris.dbapi.DataRow object at 0x104ca4e10>
allrows <class 'tuple'> ((1, 2), ('01', '02'))
What is DataRow, and why is it not a tuple, or at least a list?
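A defensive workaround is to coerce whatever fetchone returns into a plain tuple. This is a sketch under the assumption that the row object is at least iterable (not something the driver documents); the `normalize_row` name is mine, not the driver's:

```python
def normalize_row(row):
    """Return a plain tuple for any row object: pass tuples through,
    convert other iterables (e.g. a driver-specific row class),
    and keep None (end of result set) as-is."""
    if row is None or isinstance(row, tuple):
        return row
    return tuple(row)

print(normalize_row((1, 2)))   # (1, 2)
print(normalize_row([1, 2]))   # (1, 2)
print(normalize_row(None))     # None
```

With this, code written for other DB-API drivers can keep unpacking rows as tuples regardless of what the driver hands back.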
Exceptions
The standard describes a variety of exception classes that the driver is supposed to raise when something goes wrong. The InterSystems driver does not use them at all; it just raises RuntimeError for any reason, which is not part of the standard anyway.
An application may rely on the type of the exception and behave accordingly, but the InterSystems driver does not provide any differentiation. Another issue: the SQLCODE would help here, but it has to be parsed out of the error message.
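Until proper exception classes exist, about the best an application can do is fish the SQLCODE out of the message text. A fragile sketch, since the exact message format is undocumented and may change between driver versions; the sample message below is illustrative, and the helper name is mine:

```python
import re

def extract_sqlcode(message):
    """Find 'SQLCODE' followed by a signed integer anywhere in an error
    message and return it as an int, or None if no code is present."""
    match = re.search(r"SQLCODE[^-\d]*(-?\d+)", str(message))
    return int(match.group(1)) if match else None

print(extract_sqlcode("[SQLCODE: <-119>:<Unique or primary key constraint>]"))  # -119
print(extract_sqlcode("some unrelated RuntimeError"))                           # None
```

Matching on message text is exactly the kind of brittleness the standard's exception hierarchy is meant to avoid.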
Conclusion
So, during testing I found multiple bugs:
- Random errors happening at any time: <LIST ERROR> Incorrect list format, unsupported type for IRISList; Details: type detected : 32
  - it works fine if you try again right after the error
- Caught some segmentation faults; I don't even know how they happen
- Unexpected result type from the fetchone function
- Unexpected behavior of the executemany function for one-value rows
- Exceptions not implemented at all; different errors should raise different exceptions, and applications rely on that
- Can break Embedded Python if installed next to IRIS
  - because Embedded Python and this driver use the same module name, the driver overrides what is already installed with IRIS and may break it
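Related to that last point, you can at least check which iris module Python actually resolves, since the driver and Embedded Python share the module name. A small diagnostic sketch:

```python
import importlib.util

# find_spec returns None when no module named "iris" is importable;
# otherwise spec.origin shows which installation wins on sys.path
# (it may be None for namespace packages).
spec = importlib.util.find_spec("iris")
print(spec.origin if spec else "iris module not found")
```

If the printed path points at a pip site-packages directory on a machine where IRIS is installed, the driver has shadowed the Embedded Python module.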
SQLAlchemy-iris now supports the official InterSystems driver, but due to the incompatibility with Embedded Python and the bugs discovered during testing, it is available only as an optional extra. Install it with this command, with the extra defined:
pip install sqlalchemy-iris[intersystems]
And simple usage; the URL should be iris+intersystems://
from sqlalchemy import Column, MetaData, Table
from sqlalchemy.sql.sqltypes import Integer, VARCHAR
from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase

DATABASE_URL = "iris+intersystems://_SYSTEM:SYS@localhost:1972/USER"
engine = create_engine(DATABASE_URL, echo=True)

# Create a table metadata
metadata = MetaData()

class Base(DeclarativeBase):
    pass

def main():
    demo_table = Table(
        "demo_table",
        metadata,
        Column("id", Integer, primary_key=True, autoincrement=True),
        Column("value", VARCHAR(50)),
    )
    demo_table.drop(engine, checkfirst=True)
    demo_table.create(engine, checkfirst=True)
    with engine.connect() as conn:
        conn.execute(
            demo_table.insert(),
            [
                {"id": 1, "value": "Test"},
                {"id": 2, "value": "More"},
            ],
        )
        conn.commit()
        result = conn.execute(demo_table.select()).fetchall()
        print("result", result)

main()
Due to bugs in the InterSystems driver, some features may not work as expected. I hope they will be fixed in the future.
Thank you for getting this out there; it has been super frustrating trying to understand some of these errors while using these drivers with the promise of DB-API. At least the distribution has gone public on Maven, PyPI, etc., which is a step in the right direction and hopefully will help things, especially with the drivers being the "front door" to a lot of the cloud offerings.
The SQLAlchemy workaround for compatibility is especially great.
Thanks for this post. I noticed that doing inserts from SQLAlchemy is slower than when done directly from Python. So I wrote the following code; in one case I get the connection from SQLAlchemy, and in the other I get it from Python directly. The exact same code is 50% slower when the connection is obtained from SQLAlchemy. Any thoughts?
The code is below; the only difference between the two runs is how I set the connection variable.
import pandas as pd
from sqlalchemy import create_engine
import numpy as np
import time
import iris
# Create the SQLAlchemy engine
DATABASE_URL = "iris+intersystems://XXX:XXX@localhost:1972/USER"
engine = create_engine(DATABASE_URL, echo=True)
args = {'hostname':'localhost', 'port':1972, 'namespace':'USER', 'username':'XXX', 'password':'XXX'}
# connection = iris.connect(**args)
connection = engine.raw_connection()
# Generate data for each row (49 fields)
num_records = 100000
# Define SQL insert statement
sql_insert = "INSERT INTO SQLUser.test_table (field_0, field_1, field_2, field_3, field_4, field_5, field_6, field_7, field_8, field_9, field_10, field_11, field_12, field_13, field_14, field_15, field_16, field_17, field_18, field_19, field_20, field_21, field_22, field_23, field_24, field_25, field_26, field_27, field_28, field_29, field_30, field_31, field_32, field_33, field_34, field_35, field_36, field_37, field_38, field_39, field_40, field_41, field_42, field_43, field_44, field_45, field_46, field_47, field_48) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
record_values = []
# Execute SQL insert
try:
    start_time = time.perf_counter()  # Capture start time
    batch = 0
    cursor = connection.cursor()
    for _ in range(num_records):
        record_values = [np.random.rand() for _ in range(49)]
        cursor.execute(sql_insert, record_values)
        batch = batch + 1
        if batch >= 10000:
            connection.commit()
            batch = 0
        # print("Record inserted successfully!")
    connection.commit()
    end_time = time.perf_counter()  # Capture end time
    elapsed_time = end_time - start_time
    print(f"Time taken: {elapsed_time} seconds to insert {num_records} at ", num_records/elapsed_time, " records per second.")
except Exception as e:
    print("Error inserting record:", e)
finally:
    cursor.close()
    connection.close()
    engine.dispose()
I don't see any difference
Time taken: 23.218561416957527 seconds to insert 100000 at 4306.899045302852 records per second.
Time taken: 23.179011167027056 seconds to insert 100000 at 4314.247889152987 records per second.
from sqlalchemy import create_engine
import numpy as np
import time
import iris

hostname = "localhost"
port = 1972
namespace = "USER"
username = "_SYSTEM"
password = "SYS"

# Create the SQLAlchemy engine
DATABASE_URL = (
    f"iris+intersystems://{username}:{password}@{hostname}:{port}/{namespace}"
)
engine = create_engine(DATABASE_URL, echo=True)

args = {
    "hostname": hostname,
    "port": port,
    "namespace": namespace,
    "username": username,
    "password": password,
}
connection = iris.connect(**args)
# connection = engine.raw_connection()

# Generate data for each row (50 fields)
columns_count = 50
drop_table_sql = "DROP TABLE IF EXISTS test_table"
columns = [f"field_{i + 1}" for i in range(columns_count)]
create_table_sql = f"CREATE TABLE test_table ({', '.join([f'{col} DOUBLE' for col in columns])})"
num_records = 100000

# Define SQL insert statement
sql_insert = f"INSERT INTO SQLUser.test_table ({', '.join(columns)}) VALUES ({', '.join(['?'] * columns_count)})"
record_values = []

# Execute SQL insert
try:
    start_time = time.perf_counter()  # Capture start time
    batch = 0
    cursor = connection.cursor()
    cursor.execute(drop_table_sql)
    cursor.execute(create_table_sql)
    connection.commit()
    for _ in range(num_records):
        record_values = [np.random.rand() for _ in range(columns_count)]
        cursor.execute(sql_insert, record_values)
        batch = batch + 1
        if batch >= 10000:
            connection.commit()
            print("Batch inserted successfully!")
            batch = 0
    connection.commit()
    end_time = time.perf_counter()  # Capture end time
    elapsed_time = end_time - start_time
    print(
        f"Time taken: {elapsed_time} seconds to insert {num_records} at ",
        num_records / elapsed_time,
        " records per second.",
    )
except Exception as e:
    print("Error inserting record:", e)
finally:
    cursor.close()
    connection.close()
    engine.dispose()
I still see the difference; maybe it is the version of IRIS and the drivers.
Following are the versions of the various software I am using (IRIS and the Python code are running on the same host).
IRIS version: IRIS for Windows (x86-64) 2024.3 (Build 206U) Fri Oct 11 2024 17:13:28 EDT
Python: Python 3.11.9
SQLAlchemy: 2.0.0
sqlalchemy-iris: 0.17.0
intersystems_irispython: 5.1.0
returns:
Could this be because I'm on an M2 Mac?
You have to install it with quotes:
pip install "sqlalchemy-iris[intersystems]"
By the way, if you want to avoid conflicts between Embedded Python, the community driver, and the official one, you can use the iris embedded python wrapper:
Thank you!