Saturday, November 6, 2021

Slowness in long running load tests

 I would like to share another interesting Database issue

 Performance Testing team had shared the below report where they had observed slowness in long running test after 1.5 hours, which they don’t notice in shorter tests for an hour long


I could figure out that the issue was with  the Database.  If you see the below snapshot from AppDynamics, Resource semaphore wait stats in sql server had increased during the middle of the load test.





I had gathered few additional statistics from Grafana to understand the root cause and found that the applications slowed down and server memory reached its limit (84 GB )and resource semaphore events were getting triggered as shown in the image below





Next step is to determine which query is causing the issue. So I ran DMV in SQL server to determine the same.

 SELECT der.session_id ,

DB_NAME(der.database_id) AS database_name ,

deqp.query_plan ,

SUBSTRING(dest.text, der.statement_start_offset / 2,

( CASE WHEN der.statement_end_offset = -1

THEN DATALENGTH(dest.text)

ELSE der.statement_end_offset

END - der.statement_start_offset ) / 2)

AS [statement executing] ,

der.cpu_time

--der.granted_query_memory

--der.wait_time

--der.total_elapsed_time

--der.reads

FROM sys.dm_exec_requests der

INNER JOIN sys.dm_exec_sessions des

ON des.session_id = der.session_id

CROSS APPLY sys.dm_exec_sql_text(der.sql_handle) dest

CROSS APPLY sys.dm_exec_query_plan(der.plan_handle) deqp

WHERE des.is_user_process = 1

AND der.session_id <> @@spid

ORDER BY der.cpu_time DESC ;

-- ORDER BY der.granted_query_memory DESC ;

-- ORDER BY der.wait_time DESC;

-- ORDER BY der.total_elapsed_time DESC;

-- ORDER BY der.reads DESC;

 

 Output of the DMV gave us the Culprit. it  is a query requesting around 16GB of Server memory (memory grant) for execution. This query needs server memory to store temporary data while sorting and joining rows.

Note:  The amount of the workspace memory for a query is called a memory grant. A memory grant is calculated during the query compilation and then, when the execution should start, this amount is requested and, depending on the available memory, granted to a query.

Solution : Tune the sql query to reduce memory usag. Development team had given the fix in  a day and that solved the problem





Tuesday, October 26, 2021

Antivirus software and performance impact

Sometime back, I received a request  about Appdynamics not fetching data in windows server 2016  but it works fine in other operating systems. To give you a background, Appdynamics fetches data via WMI scripts, which inturn obtain data from perfmon counters in regular time intervals.

I was presented with the following metrics for WMI scritps(written in VB) used by appdynamics and clearly shows windows 2016 is slower. 






Appdynamics vendor provided proof that it works in other companies where same OS was hosted.  

So the challange is to debug in our environment, I took an approach to trace all the system calls using procmon to determine the rootcause.




Root cause of this WMI script slowness in 2016 is due to MacAfee interfering with the process. If you see the snapshot above, McAfee took 352 sec which attributes to 90% of the total time. The same behavior is not found in 2012 version.

Mcafee accepted the issue and gave us a patch which resolved the problem. 

Debugging is always fun :)


Tuesday, October 19, 2021

IIS Application app pool getting crashed intermittently

I was debugging another interesting issue  where after doing a specific business transaction, Application pool getting stopped and service become unavailable.

Application logs doesn’t give much details

2021-10-08 19:48:25.8085 4.10.191 INFO Beginning Startup...

2021-10-08 19:48:25.8505 4.10.191 INFO CORS configuration found, allowing the following Origins: 

2021-10-08 19:48:25.8505 4.10.191 INFO Allowed Origin: https://xxx.hostname..local

2021-10-08 19:48:25.8505 4.10.191 INFO Applying CORS Configurations

2021-10-08 19:48:25.8505 4.10.191 WARN `Auth Configuration not found, defaulting to IWA

2021-10-08 19:48:25.8505 4.10.191 INFO Configuring for Intergrated Windows Authentication

2021-10-08 19:48:28.0246 4.10.191 INFO Authorization.Extension: Start parsing : "D:\inetpub\wwwroot \1.0\bin\Authorization.rules".

2021-10-08 19:48:28.1396 4.10.191 INFO Authorization Role Definitions have been created.

2021-10-08 19:48:28.1396 4.10.191 INFO Authorization.Extension: Successfully finished parsing : "D:\inetpub\wwwroot\1.0\bin\Authorization.rules".

2021-10-08 19:48:31.2079 4.10.191 INFO Startup Completed.

Another Http exception was also observed as below 

Message: Request is not available in this context
Stack:
[HelperMethodFrame]
System.Web.HttpContext.get_Request()
System.Web.HttpContextWrapper.get_Request()
NLog.Web.Internal.HttpContextExtensions.TryGetRequest(System.Web.HttpContextBase)
NLog.Web.LayoutRenderers.AspNetRequestValueLayoutRenderer.DoAppend(System.Text.StringBuilder, NLog.LogEventInfo)
NLog.LayoutRenderers.LayoutRenderer.RenderAppendBuilder(NLog.LogEventInfo, System.Text.StringBuilder)
NLog.Layouts.SimpleLayout.RenderAllRenderers(NLog.LogEventInfo, System.Text.StringBuilder)
NLog.Layouts.Layout.RenderAppendBuilder(NLog.LogEventInfo, System.Text.StringBuilder, Boolean)
NLog.Layouts.SimpleLayout.PrecalculateBuilder(NLog.LogEventInfo, System.Text.StringBuilder)
NLog.Targets.Target.PrecalculateVolatileLayoutsConcurrent(NLog.LogEventInfo)
NLog.Targets.Wrappers.AsyncTargetWrapper.Write(NLog.Common.AsyncLogEventInfo)
NLog.Targets.Wrappers.AsyncTargetWrapper.WriteAsyncThreadSafe(NLog.Common.AsyncLogEventInfo)
NLog.Targets.Target.WriteAsyncLogEvent(NLog.Common.AsyncLogEventInfo)
NLog.LoggerImpl.WriteToTargetWithFilterChain(NLog.Targets.Target, NLog.Filters.FilterResult, NLog.LogEventInfo, NLog.Common.AsyncContinuation)
NLog.LoggerImpl.Write(System.Type, NLog.Internal.TargetWithFilterChain, NLog.LogEventInfo, NLog.LogFactory)
NLog.Logger.Trace(System.String)
System.Web.HttpApplication.InitModulesCommon()
System.Web.HttpApplication.InitInternal(System.Web.HttpContext, System.Web.HttpApplicationState, System.Reflection.MethodInfo[])
System.Web.HttpApplicationFactory.GetNormalApplicationInstance(System.Web.HttpContext)
System.Web.HttpRuntime.ProcessRequestNotificationPrivate(System.Web.Hosting.IIS7WorkerRequest, System.Web.HttpContext)
System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr, IntPtr, IntPtr, Int32)
System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr, IntPtr, IntPtr, Int32)
DomainNeutralILStubClass.IL_STUB_ReversePInvoke(Int64, Int64, Int64, Int32)

 

Events logs

Events logs has below entries and it doesn’t have stack trace to debug

  •         A process serving application pool name.Psh' suffered a fatal communication error with the Windows Process Activation Service. The process id was '53196'. The data field contains the error number.
  •        Application pool name.Psh ' is being automatically disabled due to a series of failures in the process(es) serving that application pool.
  •         Application popup: w3wp.exe - System Error : A new guard page for the stack cannot be created.

 

To debug, I have configured debugdiag to capture memory dumps on crash and found couple of exceptions on the threads

  45   60 1ef18 00000214ce10fff0  1029220 Preemptive  0000000000000000:0000000000000000 00000214af874010 0     Ukn (Threadpool Worker)

  46   62 7884 00000214cdf937a0  1029220 Cooperative 00000214B811CB00:00000214B811D8D0 00000214af95f900 0     Ukn (Threadpool Worker)

  47   47 8b50 00000214ce0f28c0  8029220 Preemptive  0000000000000000:0000000000000000 00000214af874010 0     Ukn (Threadpool Completion Port)

  48   65 17114 00000214ce10a1f0  8029220 Preemptive  0000000000000000:0000000000000000 00000214af874010 0     Ukn (Threadpool Completion Port)

  49   63 7bd4 00000214cdbafd90  1029220 Preemptive  0000000000000000:0000000000000000 00000214af874010 0     Ukn (Threadpool Worker)

  50   61 1598 00000214ce0ab520  1029220 Preemptive  0000000000000000:0000000000000000 00000214af874010 0     Ukn (Threadpool Worker)

  51   58 16c04 00000214ce0f20f0  1029220 Preemptive  0000000000000000:0000000000000000 00000214af874010 0     Ukn (Threadpool Worker)

  52   57 1314c 00000214cdfaf0e0  1029220 Preemptive  0000000000000000:0000000000000000 00000214af874010 0     Ukn (Threadpool Worker)

  53   55 1ecfc 00000214cdfad060  1029220 Preemptive  0000000000000000:0000000000000000 00000214af874010 0     Ukn (Threadpool Worker)

  54   54 12004 00000214ce10c130  1029220 Preemptive  0000000000000000:0000000000000000 00000214af874010 0     Ukn (Threadpool Worker)

  55   53 128f0 00000214ce0f3860  1029220 Preemptive  00000214B811A580:00000214B811B8D0 00000214af95f900 0     Ukn (Threadpool Worker) System.Web.HttpException 00000214b811a498

  56   52 c7a0 00000214ce0fa0d0  1029220 Preemptive  00000214B80E8168:00000214B80E98D0 00000214af95f900 1     Ukn (Threadpool Worker) System.StackOverflowException 00000214afc91158

  57   48 e110 00000214ce0c0950  1029220 Preemptive  0000000000000000:0000000000000000 00000214af874010 0     Ukn (Threadpool Worker)

  58   34 7a4c 00000214ce0f0980  1029220 Preemptive  0000000000000000:0000000000000000 00000214af874010 0     Ukn (Threadpool Worker)

  59   39 3450 00000214ce0f1150  1029220 Preemptive  0000000000000000:0000000000000000 00000214af874010 0     Ukn (Threadpool Worker)

 

from the dump, Its clear that app pool crashed due to System.StackOverflowException on thread 56

 

Review the stack trace of thread 56 gives hint that automapper causing stack overflow


RootCause

Application has circular reference in its classes which are used on the workflow because of this when user attempt to perform type casting via automapper, stack is getting filled up.

Solution to the problem is to Avoid type conversion via auto mapper instead converting types using manual mapping.

 


Tuesday, October 12, 2021

Fiddler improves Service Response times?

One of the development teams approached me with an interesting issue. The issue description states the response time of a service call is faster by 100% when ever Fiddler is running.  I was requested to suggest if the same performance can be achieved without fiddler.

I took some Network traces  at the client with and with out fiddler and found some interesting observations

Without fiddler, Only 2 TCP connections were used between the client and the server



With Fiddler - there were 4 active parallel TCP connections resulting  faster response time




With the data it is clear more the parallel connections the better the response time. With the issue identified solution is even simpler

By default, we can create only two simultaneous connections to an HTTP server.   This can be increased to avoid a backlog of requests during times of very high transaction volume. It can be done in many ways as given in the link below

https://developer.cybersource.com/library/documentation/dev_guides/Simple_Order_API_Clients/html/Topics/Setting_the_Connection_Limit.htm


Monday, September 20, 2021

Dynamic Compression in IIS

 Dynamic compression doesn't work by default in IIS and there are few config changes that are needed to be done for it to working. Please refer the below steps to implement the same

Enable Compression in IIS application

 Prerequisite: Dynamic content compression module should be installed. If it is not installed, please install. It can be done using Server Manager.

Do the followings for Webclient and other sites related to the application.

·       Enable Dynamic Compression at site level.

 


·       Traverse Configuration Editor to check and configure appropriate compression settings.


·       Traverse to “system.webServer/httpCompression” section.

 








·      
Click on dynamic Types and add application/json mimeType if missing. That is responsible to compress api level responses. Add this in Webclient and API site levels.


 





·       Save and restart the app pool.

·       Verify in the client side that dynamic compression is working or not. It should have “Content-Encoding: gzip” in Response Headers.





Sunday, August 22, 2021

Monitor SQL server Installed in Azure VM

 

SQL server Installed in Azure VM

1.    Introduction

When planning for a new SQL database One of the primary decisions that one need to make is where it’s going to reside. In the past, only option was an on premise installation of sql server, but with the capabilities of Azure we now have additional options to be considered. One such option is SQL server installed on Azure VM.  This option is described as infrastructure as Service or IaaS.

SQL Server IaaS helps FAST migration of the existing applications to the cloud with minimal changes. With this method we use cloud platform to provide computing resources in the form of a VM. With IaaS Microsoft provides the hardware and OS and SQL server installation (if selected). We will have full control of OS and SQL Server installations

2.    Sizing of SQL Server VM

Sizing SQL Server Azure IaaS VMs is similar to the traditional way we size on-premises systems. However, there are a couple of concepts we need to understand about storage and how to approach cloud systems to ensure you select the right size to host your workloads.

Azure Storage There are multiple types of Azure storage available from Standard to Premium disks, both managed and un-managed. The type of storage selected for SQL VMs will have a bearing not only on performance but also on whether we will be able to meet the requirements covered by the SLAs

The size and class of VM selected will also dictate the number of data disks you can use and the performance profile for them regarding IOPS and throughput. Monitoring the VM and Database server will help in optimizing the performance and cost

3.    Monitoring of Database in Azure VM

We have to make sure we have great insight into our database, and insight comes from monitoring. If there are undetected database problems, then problems will be exhibited throughout the entire application, and it will be very hard to diagnose.

Database monitoring should cover

·        Performance

·        SLA adherence and Service Health

·        Spikes and unexpected errors

·        Capacity –CPU, Memory, IOPS etc.

4.    Types of Monitoring Data in Azure

These are different types of monitoring data in Azure

4.1       Metrics

o   Virtual Machine Host metrics like CPU percentage, IO, Disk space allocated etc. can be captured. These are available by default

Note: SQL server Metrics are not available for SQL Server IaaS Version

 

 

4.2       Logs

Logs are broadly of two types

4.2.1       Activity logs

It shows information about what's happened at the fabric level to a resource, a resource was created, it was modified, it had a new key generated to it etc.

 

4.2.2       Diagnostic logs

These come from inside the resource, and can send those to various targets.  

 

Ø  Enable Guest Level monitoring as shown below

Ø  Once Enabled, it provides several options to capture diagnostic logs

5.    Database Metrics and Monitoring

 

5.1       Azure SQL databases and Monitoring tools

The below tables has the tools available in Azure for monitoring database performance metrics

Azure SQL Server types

Purpose

Monitoring Tools

Comments

Azure SQL Database

Managed, intelligent SQL in the cloud

Azure Monitor

Azure SQL Database includes database advisors that provide performance tuning recommendations for single and pooled databases.

Azure SQL Managed Instance

Managed, always up-to-date SQL instance in the cloud

Azure Monitor

SQL Server on Virtual Machines

Migrate your SQL workloads to Azure while maintaining complete SQL Server compatibility and operating system-level access

Custom Logging via Log Analytics

Commercial tools  are also available like SolarWinds SQL sentry, Dynatrace, AppDynamics etc which installs the agent in SQL server machine for monitoring

 

5.2       Analytical Tools

Analytical tools are meant to derive meaning insights from the metrics captured using monitoring tools






6.    Custom Logging via Log Analytics

There are no built in solution to monitor performance of SQL Server hosted in Azure VM but it is just a normal SQL Server database, and so many of the metrics and alerts that one need for monitoring SQL performance are the same as those need for any other SQL Server instance.

In case of Azure SQL Database, then we have the option of using the built-in monitoring that is supplied in the Azure Portal to obtain the overview of your entire Azure SQL Server estate, and to see it within the context of our overall Azure resources

In SQL Server VM, Dynamic management views can be used to monitor the health of a server instance, diagnose problems, and tune performance.  These views can be triggered via AZURE functions at period intervals to get performance and diagnostic information

  •  Enable VIEW SERVER STATE permission on the SQL server.
  • Connect to SQL SERVER via AZURE Timer Trigger Function and custom log output of the SQL DMV queries to Azure Log Analytics Workspace












  • Configure sql connection string, Loganalytics workspaceid and sharedkey in the keyvault. Details of workspaceid and  key can be found in the the azure portal. Please refer the below screenshot
  • Make sure all the sensitive information is stored on the keyvault and access is provided to the required function resource through the service principal









  • Custom logs are stored in Log analytics Workspace as shown below














  • Data from logs can be queried and saved to a dashboard as shown below








  • These DMV queries can be configured in json file stored in a blob storage and can be customized by adding/deleting new ones as needed









  • Dashboard with such metrics can be collated and displayed as below

 










7.    Conclusion

There are several commercial tools available in the market which can also be deployed in the Database server to get performance metrics and diagnostic details for debugging purposes. However, the solution approach provided in the document is easy and simple and can be deployed with limited overhead on the servers without extra license cost (Excluding storage cost and Azure function cost)

Storage Cost

Ø  Data injection – 5 GB free per month - $2.76 per GB

Ø  Data Retention – 31 days free - $0.12 per GB per month

Azure Function Cost

Consumption plan pricing includes a monthly free grant of 1 million requests and 4,00,000 GB-s of resource consumption per month per subscription in pay-as-you-go pricing across all function apps in that subscription. Details refer below

https://azure.microsoft.com/en-in/pricing/details/functions/

 

Claim Based Authorization

  1.      Claim Based Authorization ·         Token Validation: As requests come into the Ocelot API Gateway, the first step is to validat...