Wednesday, June 17, 2026

Designing a Governed AI Assistant for eLIMS

When I started thinking about this solution, the problem was very simple.

In eLIMS, users have a lot of data, but getting quick answers is not always easy. A user may want to know something like:

Show me studies not completed on time.

To answer this, the system has to check study details, planned completion date, completion data, actual completion date, user access, legal entity, and then decide whether the study is on time, delayed, or missing information.

So the question I had was:

Can we allow users to ask this in simple English, but still keep the backend safe and controlled?

That is the main problem this solution is solving.

Overall Design



The Main Design Decision

The most important decision I made was:

AI should not directly query the database.

AI is only used to understand what the user is asking.

It creates a plan.

That plan may be right or wrong. So I do not execute it directly. First, I send it to the validator.

The backend executes the plan only if the validator says it is safe.

This is the main difference between this solution and a normal chatbot.

A chatbot may answer directly.
Here, AI only prepares the plan.
The application still controls what can run.

Why I Designed It This Way

In a system like eLIMS, we cannot allow AI to freely access data.

There are roles.
There are legal entities.
There are allowed services.
There are allowed fields.
There are audit needs.
There should not be any write operation from this assistant.

So I kept a clear boundary.

AI can understand the question.
The validator decides whether the plan is allowed.
The execution engine decides what data to read.
The audit service records what happened.

This keeps the user experience simple, but the backend is still controlled.

What Happens When a User Asks a Question

Let us take this example:

Show me studies not completed on time.

The Angular UI sends this question to the .NET API.

The API asks the plan generator to convert this question into a structured plan. The plan may say that the study service and the core lab service are needed, and that the output should include delayed or indeterminate studies.

But before doing anything with data, the validator checks the plan.

It checks:


This validator is very important. Even if AI gives a wrong service name or tries to use a field that is not allowed, the request will stop there.

No data will be read.
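
A minimal sketch of such a validator, written in Python for brevity even though the backend in this post is .NET. The plan shape, the service names `study` and `corelab`, and the field names are assumptions made for illustration, not the real eLIMS contracts.

```python
from dataclasses import dataclass, field

# Hypothetical allow-list: service name -> fields the assistant may read.
ALLOWED_FIELDS = {
    "study": {"study_id", "planned_completion_date", "legal_entity"},
    "corelab": {"study_id", "actual_completion_date"},
}


@dataclass
class PlanStep:
    service: str
    operation: str  # only "read" is ever accepted
    fields: list[str] = field(default_factory=list)


def validate_plan(steps: list[PlanStep]) -> list[str]:
    """Return the list of violations; an empty list means the plan may run."""
    errors = []
    for step in steps:
        if step.operation != "read":
            errors.append(f"{step.service}: operation '{step.operation}' is not allowed")
        allowed = ALLOWED_FIELDS.get(step.service)
        if allowed is None:
            errors.append(f"unknown service '{step.service}'")
            continue
        for name in step.fields:
            if name not in allowed:
                errors.append(f"{step.service}: field '{name}' is not allowed")
    return errors
```

The caller executes the plan only when the returned list is empty, so a wrong service name or a disallowed field stops the request before any data is read.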

Execution Logic

Once the plan is valid, the execution engine takes over.

It checks whether the user has the right role and legal entity access.

After that, it reads the approved data and applies the business logic.

For the current use case, it checks study planned completion date and TestP actual completion date.

The classification is simple:

Case | Result
Actual completion date is before or equal to planned date | On Time
Actual completion date is after planned date | Delayed
Planned date or actual completion date is missing | Indeterminate

I kept Indeterminate as a separate result because missing data should not be hidden. In real systems, missing data is also an important signal.
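
The three cases can be sketched as one small function. This is an illustrative Python version; the function name and the use of `None` for a missing date are assumptions.

```python
from datetime import date
from typing import Optional


def classify_study(planned: Optional[date], actual: Optional[date]) -> str:
    """Classify one study using the three cases described above."""
    # Missing data is reported explicitly instead of being hidden.
    if planned is None or actual is None:
        return "Indeterminate"
    return "On Time" if actual <= planned else "Delayed"
```

Checking for missing dates first means a study never ends up as On Time or Delayed by accident when one of the dates is absent.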

Why Service Registry Is Used

I did not want service names and fields to be hardcoded everywhere.

So I introduced a service registry.

The service registry tells the system:


This makes the design easier to extend.

Tomorrow, if we want to add a sample service or a protocol service, we do not have to change the full design. We add the service contract, allowed fields, and purpose. Then the same pattern can continue.
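
A registry of this kind could look roughly like the Python sketch below. The class names, the contract strings such as `GET /studies`, and the field names are hypothetical; only the idea of registering a contract, allowed fields, and a purpose per service comes from the design above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    name: str
    contract: str              # hypothetical route or interface name
    allowed_fields: frozenset  # fields the assistant may read
    purpose: str


class ServiceRegistry:
    def __init__(self):
        self._entries = {}

    def register(self, entry: ServiceEntry) -> None:
        self._entries[entry.name] = entry

    def is_allowed(self, service: str, field_name: str) -> bool:
        """True only for a registered service and one of its allowed fields."""
        entry = self._entries.get(service)
        return entry is not None and field_name in entry.allowed_fields


registry = ServiceRegistry()
registry.register(ServiceEntry(
    name="study",
    contract="GET /studies",
    allowed_fields=frozenset({"study_id", "planned_completion_date"}),
    purpose="Study header data and planned dates",
))
# Adding a new service later is one more register() call, not a redesign.
registry.register(ServiceEntry(
    name="sample",
    contract="GET /samples",
    allowed_fields=frozenset({"sample_id", "status"}),
    purpose="Sample tracking",
))
```

Because the validator and the execution engine both read from the same registry, service names and fields are defined in one place.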

Why Audit Is Important

I also added audit as part of the flow.

For every request, we should know:

What did the user ask?
What plan was generated?
Which validation checks passed?
Which services were called?
What result was returned?

This is useful for support, debugging, and review.

In production systems, we cannot just show an answer and forget how it was created. We need traceability.
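
One possible shape for an audit record that answers the five questions above, as a Python sketch. The field names and the JSON serialization are assumptions, not the actual eLIMS audit schema.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    user_id: str
    question: str           # what the user asked
    plan: dict              # what plan was generated
    checks_passed: list     # which validation checks passed
    services_called: list   # which services were called
    result_summary: str     # what result was returned
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize the record so it can be stored or shipped to a log sink."""
        return json.dumps(asdict(self), sort_keys=True)
```

Storing the generated plan next to the question is what makes it possible to explain later how a given answer was produced.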

Final Architecture Pattern

The full pattern is this:





This is the pattern I wanted to prove with this solution.

Not AI directly reading everything.
Not a chatbot giving random answers.
Not a hardcoded report.

It is a controlled assistant.

The user gets a simple way to ask questions.
The system keeps control of validation, access, execution, and audit.

That is the balance I wanted in this design.

My View

For enterprise applications, AI should be used carefully.

It should help users ask better questions and get faster insights. But it should not bypass the application rules.

In this solution, AI is useful because it understands the user’s question. But the actual responsibility still stays with the application.

That is why I designed the eLIMS Insight Assistant as:

Question → Plan → Validate → Authorize → Execute → Audit → Result

This keeps it simple for the user and safe for the system.
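
The pattern can be sketched as a single orchestration function. In this Python sketch every stage is passed in as a callable, and the record keys (`outcome`, `rows`, `errors`) are hypothetical; it only illustrates the order of the steps and where a request can stop.

```python
from typing import Callable


def handle_question(
    question: str,
    user: dict,
    generate_plan: Callable[[str], dict],
    validate: Callable[[dict], list],
    authorize: Callable[[dict, dict], bool],
    execute: Callable[[dict, dict], list],
    audit: Callable[[dict], None],
) -> dict:
    """Run Question -> Plan -> Validate -> Authorize -> Execute -> Audit -> Result."""
    record = {"question": question, "user": user.get("id")}
    plan = generate_plan(question)            # AI only proposes a plan
    record["plan"] = plan
    errors = validate(plan)                   # the application decides if it may run
    if errors:
        record["outcome"] = "rejected"
        record["errors"] = errors
    elif not authorize(user, plan):           # role and legal entity access
        record["outcome"] = "forbidden"
    else:
        record["rows"] = execute(user, plan)  # read-only execution
        record["outcome"] = "ok"
    audit(record)                             # every request is audited
    return record
```

Rejected and forbidden requests are audited too, and `execute` is never called for them.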



Saturday, June 6, 2026

How I Review an FSD: A Practical Framework from Enterprise Delivery


For me, an FSD review is not just a document review. It is an early architecture checkpoint to identify design gaps, integration risks, data issues, security gaps, performance concerns, and production support challenges.

A good solution architect should not only ask, “Can we build this?”
We should ask, “Can we build this correctly, securely, scalably, testably, and supportably?”


1. Understand the Business Problem First

Before reviewing screens, fields, buttons, or validations, I first check:

  • Why is this requirement needed?

  • Which user or business problem is it solving?

  • What process does it support?

  • What happens if this feature is not delivered?

  • Is the requirement solving the real problem or only automating an unclear process?


2. Validate the Domain Logic

The FSD should be aligned with the business/domain model.

I check:

  • Are the right business entities defined?

  • Are relationships between entities clear?

  • Is the lifecycle/status flow explained?

  • Are exceptions and edge cases covered?

  • Is historical data or versioning required?

In regulated or scientific systems, even a small configuration change can impact calculation, audit, reporting, or approval.


3. Confirm Source of Truth

This is very important in microservices.

The FSD should clearly answer:

  • Which module owns the data?

  • Who can create or update it?

  • Which modules only consume it?

  • Is data fetched live, synced, cached, or stored as a snapshot?

  • What happens when the source data changes?

Lack of source-of-truth clarity can lead to duplicate data, inconsistent behavior, and production issues.


4. Review Data Model and Traceability

Even if the requirement looks simple, the data design should support future needs.

I check:

  • Mandatory and optional fields

  • Status and lifecycle

  • Audit requirements

  • History/versioning

  • Deactivation versus deletion

  • Reporting and downstream usage

A simple question I ask is:

After six months, can we still explain why the system behaved in a particular way?


5. Evaluate Integration Impact

A small change in one module may impact many systems.

I check:

  • APIs impacted

  • Events/messages required

  • Synchronous vs asynchronous flow

  • Retry and failure handling

  • Cache impact

  • Downstream reports or jobs

  • Impact on other microservices

This helps avoid surprises during development, UAT, and production.


6. Review Non-Functional Requirements

Functional requirements are not enough.

I also check:

  • Performance and volume

  • Security and authorization

  • Audit and compliance

  • Error handling

  • Monitoring and logging

  • Data retention

  • Production support needs

Example: If the FSD says export data, I ask:

  • How many records?

  • Should it be async?

  • Who can download?

  • Where will the file be stored?

  • How long should it be retained?


7. Check Security and Access Control

Access should not be defined only at UI level.

I check:

  • Page access

  • API access

  • Record-level access

  • Field-level access

  • Workflow-step access

  • Export/report access

What the UI hides, the backend must also protect.


8. Make Sure It Is Testable and Supportable

The FSD should help QA and support teams.

I check whether it supports:

  • Positive and negative test cases

  • Integration testing

  • Regression testing

  • Security testing

  • Performance testing

  • Audit testing

  • Production troubleshooting

A feature is not complete only because it works in UAT. It should also be monitorable, supportable, and recoverable in production.


Final Thought

A solution architect reviews an FSD to reduce ambiguity early.

The goal is to help BA, FA, developers, QA, DBA, DevOps, security, and support teams move with the same understanding.

A good FSD review prevents rework, reduces production risk, and ensures the solution is practical for real enterprise delivery.

Friday, March 20, 2026

Understanding How Certificates Are Used in Applications


Certificates are used to establish trust in secure communication. In simple terms, they help prove identity when two systems connect.

1. Server certificate

This is the most common use.

The server presents the certificate to the client so the client knows it is talking to the right system.

Example use cases

  • public website

  • internal web portal

  • REST API endpoint

  • CDN custom domain

  • load balancer HTTPS endpoint

Example names

  • portal.company.com

  • api.company.com

  • admin.internal.company.com

If a user opens https://portal.company.com, the server or load balancer presents the certificate. This is normal server-side TLS.

Where it may be installed

  • Application Load Balancer

  • CloudFront

  • API Gateway

  • IIS / nginx / Apache

  • Network Load Balancer with TLS listener


2. Client certificate

Here, the client presents a certificate to the server.

This is used when the server wants to verify who the calling system is.

Example use cases

  • machine-to-machine integration

  • secure partner API access

  • device authentication

  • VPN authentication

  • service account authentication without username/password

Example names

  • integration-client.pfx

  • partner-api-client-cert

  • device-auth-cert

If an order processing service calls a supplier API and sends a client certificate during the HTTPS connection, that is client authentication.


3. Mutual TLS (mTLS)

In mTLS, both sides present certificates.

  • server proves its identity to client

  • client also proves its identity to server

Example use cases

  • B2B integrations

  • secure internal service-to-service calls

  • healthcare or banking APIs

  • zero-trust internal APIs

Example names

  • payments-api.company.com

  • partner-gateway.vendor.com

  • inventory-service.internal.company.com

If inventory-service calls partner-gateway.vendor.com and both sides validate certificates, that is mTLS.
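
As an illustration, the two sides of mTLS can be configured with Python's standard `ssl` module roughly as follows. The function names are hypothetical, and the certificate paths are optional parameters only so that the sketch can be built without real files; a real deployment would always supply them.

```python
import ssl
from typing import Optional


def server_context(cert: Optional[str] = None, key: Optional[str] = None,
                   client_ca: Optional[str] = None) -> ssl.SSLContext:
    """Server side of mTLS: present a certificate and require one from the client."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid certificate
    if client_ca:
        ctx.load_verify_locations(cafile=client_ca)  # CAs trusted to sign client certs
    if cert:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # server certificate
    return ctx


def client_context(cert: Optional[str] = None,
                   key: Optional[str] = None) -> ssl.SSLContext:
    """Client side of mTLS: verify the server and present a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)  # verifies server and hostname
    if cert:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # client certificate
    return ctx
```

Without `verify_mode = ssl.CERT_REQUIRED` on the server, the connection is ordinary server-side TLS even if the client has a certificate available.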


Where certificates can be installed

On a Load Balancer

Used when TLS terminates at the load balancer.

Example use cases

  • one entry point for many websites

  • centralized HTTPS management

  • host-based routing for multiple apps

Example names

  • shop.company.com

  • careers.company.com

  • support.company.com

A load balancer presents the right certificate based on the hostname.


On the Application Server

Used when TLS terminates directly on the server.

Example use cases

  • legacy applications

  • internal admin tools

  • direct server-hosted portals

  • applications not behind a centralized ingress

Example names

  • reports.internal.company.local

  • admin-node-07.corp.local

  • legacy-app.company.local

The certificate is installed directly on the server and bound in IIS or nginx.


In the Windows Certificate Store

Applications or IIS can load certificates from the Windows certificate store.

Example use cases

  • IIS-hosted website

  • .NET Windows service

  • internal scheduler calling external API

Example names

  • client-auth-service-cert

  • portal-web-cert

  • erp-integration-cert


As Files

Certificates may also exist as:

  • .pfx

  • .pem

  • .crt

  • .key

  • .jks

Example use cases

  • Linux web servers

  • Java applications

  • containerized services

  • outbound secure API integrations

Example names

  • server.crt

  • server.key

  • client-auth.pfx

  • service-keystore.jks


In Secrets Manager or Config

Some applications load certificates at runtime from secret or config stores.

Example use cases

  • microservices

  • containerized apps

  • outbound client-auth integrations

  • automated batch jobs

Example names

  • PAYMENT_GATEWAY_CLIENT_CERT

  • MTLS_CERT_PATH

  • PARTNER_API_KEYSTORE


How to know what a certificate is doing

If it is attached to:

  • load balancer HTTPS listener

  • CDN custom domain

  • API custom domain

  • IIS HTTPS binding

then it is usually being used as a server certificate.

If the application is configured with:

  • .pfx

  • keystore

  • thumbprint

  • ClientCertificates

  • X509Certificate2

then it may be used as a client certificate.


Important note

A certificate may contain both:

  • Server Authentication

  • Client Authentication

But that does not mean both are actually used.

The real question is:
Is the certificate being used only for server TLS, or also for client authentication?


Quick examples

Example 1: Public website

  • portal.company.com

  • certificate attached to load balancer

  • users access over HTTPS

Result:
server TLS only

Example 2: Internal portal

  • admin.internal.company.com

  • certificate installed in IIS

  • client certificates not required

Result:
server TLS only

Example 3: Secure partner integration

  • order service calls partner API

  • app loads partner-client.pfx

  • cert attached to outbound HTTPS client

Result:
client certificate usage

Example 4: B2B mutual TLS

  • partner-gateway.vendor.com

  • both systems exchange and validate certificates

Result:
mTLS


Final takeaway

Certificates can be installed on load balancers, CDNs, API gateways, servers, applications, secret stores, or appliances. Their role depends on who presents the certificate and where TLS terminates.

In short:

  • server presents certificate → server TLS

  • client presents certificate → client authentication

  • both present certificates → mTLS

Monday, March 16, 2026

Investigating S3 Cost Growth in AWS – How We Identified the Issue and What We Found

S3 Investigation Summary

We started this analysis because S3 cost was not reducing as expected, even after cleanup activities.

Cost signal

Recent S3 monthly cost stayed around:

  • Oct 2025: ~$5.8k

  • Nov 2025: ~$5.8k

  • Dec 2025: ~$4.7k

  • Jan 2026: ~$5.3k

  • Feb 2026: ~$5.9k

So S3 was still running at roughly $5k–$6k/month.
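
A quick check of the run-rate from the figures above, as a small Python sketch. The values are the approximate monthly costs listed in this post, written in whole dollars.

```python
from statistics import mean

# Approximate monthly S3 cost in USD, taken from the figures above.
monthly_cost = {
    "2025-10": 5800,
    "2025-11": 5800,
    "2025-12": 4700,
    "2026-01": 5300,
    "2026-02": 5900,
}

average = mean(monthly_cost.values())
low, high = min(monthly_cost.values()), max(monthly_cost.values())
print(f"average ~${average:,.0f}/month, range ${low:,}-${high:,}")
```

The five months average about $5.5k, with December as the only month below $5k.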

How we validated it

We did not rely only on folder view in S3 console.
We validated using:

  • bucket size metrics

  • Storage Lens

  • storage class split

  • prefix drill-down

This confirmed:

  • bucket size is around 260–270 TB

  • most of the data is still in Standard

  • storage is concentrated under a non-prod backup prefix

Key finding

This is not mainly a versioning issue or a case of data sitting in another region.
The main problem is:

  • large backup data retained in Standard

  • under a non-prod backup path

  • with likely retention / lifecycle gap

Incomplete multipart upload issue

One additional contributor may be incomplete multipart uploads.

In this case, large SQL backup files are uploaded to S3 in parts.
If a backup/upload job fails in the middle and the upload is not completed or aborted, S3 keeps the uploaded parts.

That means:

  • no final usable backup object

  • but storage is still consumed

  • and cost continues in Standard storage

This usually happens when:

  • backup job fails midway

  • retry starts a new upload

  • old partial upload is not cleaned up
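
One way to find candidates is to filter the in-progress uploads by age. The Python sketch below assumes its input looks like the `Uploads` list returned by the S3 ListMultipartUploads API (dicts with `Key`, `UploadId`, and `Initiated`); the function name and the 7-day threshold are assumptions.

```python
from datetime import datetime, timedelta, timezone


def stale_uploads(uploads: list, now: datetime, max_age_days: int = 7) -> list:
    """Return in-progress multipart uploads started more than max_age_days ago."""
    cutoff = now - timedelta(days=max_age_days)
    return [u for u in uploads if u["Initiated"] < cutoff]


# Example input shaped like the "Uploads" list of a ListMultipartUploads response.
now = datetime(2026, 3, 16, tzinfo=timezone.utc)
uploads = [
    {"Key": "db1.bak", "UploadId": "1", "Initiated": now - timedelta(days=30)},
    {"Key": "db2.bak", "UploadId": "2", "Initiated": now - timedelta(days=1)},
]
old = stale_uploads(uploads, now)
```

This only identifies old uploads that were never completed or aborted; it does not measure how much storage their parts hold.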

Fix direction

Main actions:

  • review retention for non-prod backup data

  • apply / correct lifecycle rules for affected prefixes

  • move older backups from Standard to Glacier / archive

  • enable a lifecycle rule to abort incomplete multipart uploads

  • validate whether old backup copies can be deleted
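
The lifecycle part of these actions could be expressed as one rule. The Python dictionary below follows the shape accepted by boto3's `put_bucket_lifecycle_configuration` (passed as `LifecycleConfiguration`); the prefix, rule ID, and day counts are hypothetical and should be agreed with the backup owners before anything is applied.

```python
# Hypothetical prefix and day counts, shown only to illustrate the rule shape.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "nonprod-backup-retention",
            "Status": "Enabled",
            "Filter": {"Prefix": "nonprod/backups/"},
            # Move older backups out of Standard.
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            # Delete backups once they are past retention.
            "Expiration": {"Days": 180},
            # Clean up parts left behind by failed uploads.
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}
```

Keeping transition, expiration, and multipart cleanup in one prefix-scoped rule makes it easier to review what happens to the non-prod backup path.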

Expected saving

Based on current S3 run-rate, expected saving is roughly:

  •  up to ~$3k/month, if retention can be reduced further

Short root cause statement

Root cause: S3 growth is mainly driven by large non-prod backup data remaining in Standard storage longer than required. In addition, failed multipart backup uploads may be leaving orphaned uploaded parts in S3, adding to storage without creating final usable backup files.

Wednesday, September 24, 2025

Information Classification and Role Management

This post continues from the previous one.

Approach for Implementation

 

Step 1: Assign Role Clearance Level (One-time activity)

  • Assign a single, high-level clearance to every role in the User Management screen. The clearance defines the highest sensitivity of data the role is permitted to access.

Role | Clearance Level
Admin | Secret
Finance Manager | Confidential
Lab Manager | Confidential
Lab Technician | Internal

 

Step 2: Classify Attributes (One-time activity via DACPAC)

  • Classify each attribute once; the classification is stored directly in the database via DACPAC seed scripts.

  • Attributes appear clearly labelled with their classification levels in the Role Management UI.

Entity | Attribute Name | Classification
Account | Account Number | Secret
Account | Financial Data | Confidential
Instrument | Serial Number | Confidential
Contact | Email Address | Internal

 

Step 3: Automatic Permission (Minimal effort)

  • View Permissions:

    • The default "View" permission is granted automatically if the role's assigned Clearance Level >= the attribute's Classification Level.

Role Clearance | Attribute Classification | Automatically Assigned View Permission
Secret | Secret, Confidential, Internal, Public | Yes
Confidential | Confidential, Internal, Public | Yes
Internal | Internal, Public | Yes
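
The default View rule above is a simple ordered comparison. In this Python sketch the numeric values are an assumption chosen so that a higher number means more sensitive data; only the ordering Public < Internal < Confidential < Secret comes from the table.

```python
from enum import IntEnum


class Level(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    SECRET = 4


def default_view(role_clearance: Level, attribute_classification: Level) -> bool:
    """View is granted by default when clearance >= classification."""
    return role_clearance >= attribute_classification
```

For example, a Confidential role gets default View on Internal attributes but not on Secret ones.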

 

  • Edit Permissions:

    • No edit permission is inherited automatically based on clearance level.

    • Edit permissions are assigned explicitly by administrators via the Role Management UI.

Note:

  • If a role has feature-level Edit permission, it implicitly has View permission, and all attributes within that feature (entity) inherit Edit permission by default.

 

Step 4: Role Management (Overrides Only, Exception-based)

  • The Role Management UI can override any permission inherited from the system.

    • Inherited permissions are clearly displayed by default (based on classification and role clearance).

    • Admins explicitly manage deviations only (typically minimal).

  • Planned usability enhancements:

    • "View All" and "Edit All" toggles for bulk actions.

  

Specific Use Case Examples

  Use Case 1: Feature-level Edit Permission
    • Granting Edit permission to a feature (entity) automatically grants View and Edit permissions to all attributes within that entity.
  Use Case 2: Partial Edit Permissions (Attribute-level exceptions)

UI Form Behavior:

    • Fields without Edit permission are read-only.
    • On Save (POST/PUT), only authorized edited attributes are submitted.
    • The backend explicitly validates submissions. Unauthorized edits trigger explicit authorization errors.
    • Transactions succeed if no unauthorized attributes are submitted.
  Use Case 3: Bulk Imports (CSF Import Scenario)
    • A CSF import contains 4 samples; the user lacks edit permissions on 2 attributes.
    • The import succeeds for samples that do not involve restricted attributes.
    • The import explicitly fails or skips the samples that involve attributes the user cannot edit, and gives clear feedback to the user:

"Some attributes weren't imported due to insufficient permissions."

  Use Case 4: Masking Behaviour
    • If a user has sufficient clearance but the View permission is disabled, the attribute appears masked in the UI (e.g., "****").
    • If a user does not have sufficient clearance but the View permission is enabled, the attribute appears masked in the UI (e.g., "****").
    • Both clearance and View permission are needed to view an attribute.

Scenario | Outcome | UX benefit
Partial edit permissions | Non-editable fields disabled | Prevents confusion
No view permission | Attribute visible as masked (****) | Clearly indicates permission/clearance issue
No sufficient clearance | Attribute visible as masked (****) | Clearly indicates permission/clearance issue
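
Use Case 2 relies on the backend rejecting unauthorized edits even when the UI already disables those fields. A minimal Python sketch of that check follows; the exception name, function name, and attribute names are hypothetical.

```python
class UnauthorizedEdit(Exception):
    """Raised when a submission contains attributes the user may not edit."""


def validate_edit(submitted: dict, editable: set) -> dict:
    """Accept the payload only if every submitted attribute is editable."""
    blocked = sorted(set(submitted) - editable)
    if blocked:
        raise UnauthorizedEdit(f"not allowed to edit: {', '.join(blocked)}")
    return submitted
```

The check runs on the submitted attribute names only, so a transaction that contains no restricted attributes passes unchanged.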

 


Field Level Authorization and Data Classification

 Authorization architecture 

  • Implement Field-Level Role management allowing administrators to define View and Edit permissions for individual data fields based on user roles.

  • Incorporate Data Classification (e.g., Secret, Confidential, Internal, Public) for data fields and corresponding Clearance Levels for roles.

  • Implement Data Masking for sensitive fields when a user lacks sufficient permission or clearance to view the actual data.










UI Enhancements

  1. Extend the Hierarchy Tree:

The main tree structure needs to be extended to include Fields as the lowest level under Entities.

  • New Structure: Module -> Entity -> Field

The existing "Action" items (like Edit Account, Export Account) representing feature-level permissions should likely remain, grouped under Module.

  2. Introduce Field-Level View/Edit Controls:

New Checkboxes: Add specific View and Edit checkboxes that appear only next to the Field-level items in the tree.

  • These checkboxes control the permissions stored in the new RoleFieldPermission table.
  • These could be new columns aligned similarly to the existing Read/Write columns, but clearly designated for fields.
  3. Existing Read/Write Controls:

The existing Read and Write checkboxes should remain for the Module, Feature Group, and Action level items.

  4. Add Classification Column:

Introduce a new read-only column in the tree/list view, aligned with the rows.

  • For rows corresponding to Fields, this column should display the field's ClassificationLevel (e.g., 'Public', 'Internal', 'Confidential', 'Secret') fetched from the FieldClassification table.
  5. Display Role Clearance:

Somewhere near the selected Role name (e.g., below the dropdown), display the MaxClearanceLevel assigned to that Role (read-only).

  6. Implement Checkboxes for Fields:

Add aggregate View and Edit checkboxes at the Entity and Module levels within the tree.

These reflect and control the state of all underlying field permissions within that scope. Clicking them allows bulk grant/revoke operations for fields.

  7. Add Search/Filtering:

Include a search input field above the tree to allow administrators to quickly filter the potentially long list of fields by name.

  8. Update Save/Copy Logic:

The Save button's action now needs to update potentially both the RolePermission (for feature-level changes) and RoleFieldPermission (for field-level changes) tables.

The Copy Privileges functionality needs to be updated to intelligently copy both sets of permissions if desired.

 

User Management

Role Clearance Levels: Each role is assigned a maximum sensitivity level it can access.

For instance, a Lab Technician role might have clearance = Internal (can see Class 3 and Class 4 data, but not Confidential or Secret), whereas a Lab Manager might have clearance = Confidential (Class 2) and an Admin clearance = Secret (Class 1).

We will represent this in the Role table (e.g. a column MaxSensitivity or numeric level). The Authorization service will interpret this such that:

  • A user's effective clearance is the highest clearance of any role they hold.
  • If a user has multiple roles, the role that grants the higher clearance prevails, since the user is trusted up to that level.
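
The effective-clearance rule can be sketched in a few lines of Python. The list order encodes sensitivity from lowest to highest; raising an error for a user with no roles is an assumption, since the post does not say what should happen in that case.

```python
LEVELS = ["Public", "Internal", "Confidential", "Secret"]  # least to most sensitive


def effective_clearance(role_clearances: list) -> str:
    """Return the highest clearance among a user's roles."""
    if not role_clearances:
        raise ValueError("user has no roles")  # assumed behaviour, not from the design
    return max(role_clearances, key=LEVELS.index)
```

A user who is both a Lab Technician (Internal) and a Lab Manager (Confidential) is therefore treated as Confidential.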

 Masking

Server-Side Masking Implementation: When a microservice determines that a field’s value should not be revealed, instead of dropping it, it replaces the value with a masked representation:

  • Use a constant string or asterisks of matching length. E.g., Customer Name Peter Lynch might be sent as *********ch (preserving the last 2 characters).
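
A minimal Python sketch of the matching-length variant. The function name and parameters are hypothetical; the default of two preserved characters follows the example above.

```python
def mask(value: str, keep_last: int = 2, mask_char: str = "*") -> str:
    """Mask a value, keeping its length and the last `keep_last` characters."""
    # Short values are masked completely so nothing is revealed.
    if keep_last <= 0 or len(value) <= keep_last:
        return mask_char * len(value)
    return mask_char * (len(value) - keep_last) + value[-keep_last:]
```

With the defaults, `mask("Peter Lynch")` gives `*********ch`, matching the example.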

 Authorization Logic:

When a microservice determines whether to show a field, it must now consider both the user's explicit field permission and the user's clearance level:

  • Role-based check: Does the user’s role normally allow access to this field? (From the field-level permissions discussed above.)
  • Classification check: Is the field’s sensitivity <= the user’s clearance level? If not, the field must be treated as disallowed, even if the role would otherwise permit it.
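
The two checks combine with a logical AND. In this Python sketch the `RANK` values are an assumption in which a higher number means more sensitive data, and the function name is hypothetical.

```python
RANK = {"Public": 1, "Internal": 2, "Confidential": 3, "Secret": 4}


def can_view_field(has_view_permission: bool, classification: str, clearance: str) -> bool:
    """A field is shown only if both the role check and the classification check pass."""
    role_ok = has_view_permission
    classification_ok = RANK[classification] <= RANK[clearance]
    return role_ok and classification_ok
```

When the function returns False, the service sends the masked value instead of the real one.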


