Wednesday, January 15, 2014

Favorites Browser is live in Windows App store

My first Windows Store app is live in the App Store. Please download it from the link and provide your valuable feedback.
This app is the first step towards creating a service and client apps for storing your favorites or bookmarks in one location, which can then be accessed from any platform.



The storage location is your SkyDrive, so you are in control of your data and it is not accessible to anybody other than you. This is a great app if you are concerned about your privacy.

The current version of the app provides deep integration with your desktop IE. You just need to follow the instructions on each of your desktops and you will have your favorites linked and stored in one place.
You can then access and manage them through this app.



 

Tuesday, July 26, 2011

Out of Memory problem solved :)

Sorry for posting the solution so late. But you know, better late than never :).
 
As I mentioned in my earlier blog, I did not want to change the code, so I was looking for some configuration change which would solve the problem. I knew that the system resources of the server, like CPU and memory utilization, were fairly low, so I was looking for a way to use those spare hardware resources to correct the issue.
 
The Solution:
I first tried splitting the big CSV file into smaller CSV files, but it did not work for long, as the system started thrashing heavily after an hour or so and finally became unresponsive. Now I knew this solution would not work until I controlled/reduced the number of requests coming to each server/process.
There were two ways to control/reduce the number of requests per process:
  1. Either increase the number of servers, or
  2. Have multiple processes running on the same server (Web Garden in IIS 6.0).
Since the hardware resources on each server were underutilized, I decided to have multiple processes running on each server. The requests coming to a server are divided among the different processes running on that server, which reduces the number of requests handled by each process.
This trick worked and the problem got resolved. Thanks to IIS 6.0 for providing the Web Garden feature; it provided exactly what I was looking for.

Please refer to the article below for configuring a Web Garden in IIS 6.0.
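If you prefer to script the change, here is a minimal sketch of how the web garden setting (the MaxProcesses property of the application pool) can be applied through the IIS 6.0 metabase from C#. The pool name "MyAppPool" and the value 4 are illustrative assumptions; the same setting can also be changed in IIS Manager (application pool Properties > Performance > maximum number of worker processes) or with adsutil.vbs.

using System;
using System.DirectoryServices; // add a reference to System.DirectoryServices.dll

class WebGardenConfig
{
    static void Main()
    {
        // "MyAppPool" is a hypothetical pool name; use the pool hosting your service.
        using (var pool = new DirectoryEntry("IIS://localhost/W3SVC/AppPools/MyAppPool"))
        {
            // MaxProcesses > 1 turns the application pool into a web garden.
            pool.Properties["MaxProcesses"].Value = 4;
            pool.CommitChanges();
        }
    }
}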

Be careful when deciding to use a web garden, as it is resource intensive and also carries the risk of contention if the worker processes access a shared resource.

Wednesday, March 30, 2011

Out of Memory Exception: A simple string.Split can become fatal if not used properly

Introduction: Recently I was working on an issue caused by an OOM exception and found out something interesting about the String.Split() function. I thought I should share the finding so that others may also benefit from this analysis when solving OOM exceptions.
It is a step-by-step article; I have tried to cover the background required before getting into the technical details.
Problem: The service causing the issue reads records from a CSV file, keeps them in the ASP.NET cache and queries the records using LINQ (a minimal sketch of this pattern follows the sample below). Everything was working fine until the number of records in the CSV file was increased from 200K to 1350K (exactly 1,355,049). Within an hour of this change the service was flooded with OOM exceptions. The CSV file contains hexadecimal numbers of 16 characters, one per line, each followed by a comma:

00123456789ABCDE,

0123456789ABCDEF,

…..
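To make the setup concrete, here is a minimal sketch of that pattern; the class, cache key and file path are illustrative assumptions, not the actual service code. The whole CSV file is cached as one big string, and each lookup splits it and queries the pieces with LINQ.

using System;
using System.IO;
using System.Linq;
using System.Web;

public static class RecordLookup
{
    const string CacheKey = "CsvRecords"; // hypothetical cache key

    static string GetCsv()
    {
        // The whole CSV file is kept in the ASP.NET cache as a single string.
        var csv = (string)HttpRuntime.Cache[CacheKey];
        if (csv == null)
        {
            csv = File.ReadAllText(@"D:\data\records.csv"); // hypothetical path
            HttpRuntime.Cache.Insert(CacheKey, csv);
        }
        return csv;
    }

    public static bool Exists(string hexKey)
    {
        // Split + LINQ on every lookup: each call re-creates ~1.35 million small
        // strings and, as the analysis below shows, a huge internal Int32[].
        return GetCsv().Split(',').Any(r => r.Trim() == hexKey);
    }
}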

Investigation: I knew it was the increase in records that triggered this issue. So without wasting a second I asked my OPS team to throw more RAM into the servers (they now have 4 GB of RAM each). To my surprise it did not resolve the issue.
Why increasing RAM size of server did not solve the problem?

Before I answer this question, a little background is needed:
Virtual Memory: Each process on 32-bit Microsoft Windows has its own virtual address space that enables addressing up to 4 gigabytes of memory, and this space is divided into two partitions: one for use by the process (user mode) and the other reserved for use by the system. Generally each partition gets 2 GB of address space. A process requests memory from the virtual address space reserved for it. A virtual address does not represent the actual physical location of an object in memory; instead, the system maintains a page table for each process, an internal data structure used to translate virtual addresses into their corresponding physical addresses. Each time a thread references an address, the system translates the virtual address to a physical address (an address in RAM). The virtual address space is divided into pages (a page is a block of contiguous virtual addresses), typically 4 KB in size.

Well, now I know every process can have 4 GB of address space, but my physical memory (aka RAM) is limited, typically to 2 GB. Then how can I accommodate 4 GB per process in my physical memory?

The answer is the page file, a configurable space on the hard disk. The page file is used to store pages of a process which cannot reside in physical memory.


Let's draw the picture now (simplified for clarity):

  1. A process requests memory from the OS.
  2. The OS reserves the requested memory from free pages of the virtual address space. (Reserved)
  3. When the process uses (reads/writes/accesses) part of this memory space, that memory space is mapped to physical memory (aka RAM). (Committed)
  4. If there is not enough space in physical memory to bring in the new pages, the OS removes the least recently used pages from RAM and saves them in the page file (on the hard disk). The space created in RAM is then used for mapping the newly requested memory. (Page fault)

If the OS cannot find the requested memory (contiguous unused pages) in the virtual address space, it raises an out of memory exception. This is exactly what happened in this case: though there was enough free RAM available, there were not enough contiguous unused pages in the w3wp.exe process's virtual address space to fulfil the memory request.
Page State in the Virtual Address Space of a Process

The pages of a process's virtual address space can be in one of the following states (for details refer to http://msdn.microsoft.com/en-us/library/aa366794(v=VS.85).aspx; a small illustrative sketch follows the table):

Free: The page is neither committed nor reserved. The page is not accessible to the process. It is available to be reserved, committed, or simultaneously reserved and committed. Attempting to read from or write to a free page results in an access violation exception.
Reserved: The page has been reserved for future use. The range of addresses cannot be used by other allocation functions. The page is not accessible and has no physical storage associated with it. It is available to be committed.
Committed: Memory charges have been allocated from the overall size of RAM and paging files on disk. The page is accessible and access is controlled by one of the memory protection constants. The system initializes and loads each committed page into physical memory only during the first attempt to read or write to that page. When the process terminates, the system releases the storage for committed pages.
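To see the Reserved and Committed states in action, here is a small illustrative sketch that calls VirtualAlloc directly from C# (for demonstration only; the constants and P/Invoke signatures follow the Win32 documentation):

using System;
using System.Runtime.InteropServices;

class PageStates
{
    const uint MEM_COMMIT = 0x1000, MEM_RESERVE = 0x2000, MEM_RELEASE = 0x8000;
    const uint PAGE_READWRITE = 0x04;

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize, uint flAllocationType, uint flProtect);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualFree(IntPtr lpAddress, UIntPtr dwSize, uint dwFreeType);

    static void Main()
    {
        // Reserve 64 MB of address space: no RAM or page file space is charged yet,
        // and touching these pages would raise an access violation.
        IntPtr reserved = VirtualAlloc(IntPtr.Zero, (UIntPtr)(64u * 1024 * 1024), MEM_RESERVE, PAGE_READWRITE);

        // Commit the first 4 KB page: only now is it backed by RAM/page file and usable.
        IntPtr committed = VirtualAlloc(reserved, (UIntPtr)4096u, MEM_COMMIT, PAGE_READWRITE);
        Marshal.WriteInt32(committed, 42);

        // Release everything, returning the pages to the Free state.
        VirtualFree(reserved, UIntPtr.Zero, MEM_RELEASE);
    }
}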


Now we know that this kind of OOM exception is caused by insufficient contiguous virtual address space, not by insufficient RAM. But to fix the problem I wanted to see which object is taking most of the space and how big each chunk it requests is. Let's move to the next step:


Finding the root cause:

Approach:
  • Start with Perfmon: collect the relevant counters and analyze them.
  • Debug a memory dump to find the culprit. Perfmon is a very good analysis tool, but to find the exact root cause you have to use a memory analysis tool. In this case I used Windbg. Do you know a better tool than Windbg?
What are the causes of an OOM exception? It is usually one of these:
  1. Memory leak in the heap.
  2. Fragmentation of virtual memory.
  3. Constantly requesting big chunks of memory.
Perfmon Analysis:

Collect counters:
  1. If more than one application pool is running on IIS, select the w3wp.exe process with the largest working set for Perfmon monitoring. You may use Task Manager to identify the process consuming the most memory.
  2. Collect the following performance counters to identify the issue (a small sketch for sampling some of them programmatically follows this list):
    1. All under .NET CLR Memory\
    2. Memory\Available MBytes
    3. Memory\Pages/sec
    4. Process\Private Bytes
    5. Process\Working Set
    6. Process\Virtual Bytes
    7. Processor(_Total)\% Processor Time
    8. Web Service\Current Connections
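These counters can also be sampled programmatically with the System.Diagnostics.PerformanceCounter class. A minimal sketch follows; the process instance name "w3wp" is an assumption, and with multiple application pools the instances will be named w3wp#1, w3wp#2, and so on.

using System;
using System.Diagnostics;
using System.Threading;

class MemoryCounters
{
    static void Main()
    {
        var privateBytes = new PerformanceCounter("Process", "Private Bytes", "w3wp");
        var virtualBytes = new PerformanceCounter("Process", "Virtual Bytes", "w3wp");
        var workingSet   = new PerformanceCounter("Process", "Working Set", "w3wp");
        var availableMB  = new PerformanceCounter("Memory", "Available MBytes");

        for (int i = 0; i < 10; i++)
        {
            // Consistently high process values here are the symptoms described in the analysis below.
            Console.WriteLine("Private: {0:N0}  Virtual: {1:N0}  Working set: {2:N0}  Available: {3} MB",
                privateBytes.NextValue(), virtualBytes.NextValue(),
                workingSet.NextValue(), availableMB.NextValue());
            Thread.Sleep(15000);
        }
    }
}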
Analysis:
Check whether we have enough physical RAM:
  1. If Memory\Available MBytes is more than 100 MB then the system has enough physical memory (RAM), so physical RAM is not the issue.
Symptoms of allocating large amounts of data frequently:
  1. Process counters: If private bytes (memory private to the process, not shared with any other process), working set (pages currently loaded in physical RAM) and virtual bytes (virtual address space currently used by the process) are all consistently high, then check Pages/sec. If all these counters show high values, you are demanding a lot of memory from the process, and to fulfil that demand the process does a lot of page-in/page-out activity. If private bytes gets anywhere near 1 GB, expect an OOM exception to happen at any time.
  2. .NET CLR Memory counters: Further, look at heap memory and check 1) committed bytes (memory used by the process) and 2) reserved bytes (memory reserved by the GC for the process). If the value of reserved bytes is much higher than committed bytes, your program is requesting memory in big chunks, because of which the GC cannot effectively fill small memory holes in segments and has to create new segments. The GC is smart enough to fill those holes during the next Gen 1/2 collection, though.
  3. .NET CLR Memory counters – Gen 0/1/2, Large Object Heap: The GC allocates memory in segments and each segment has 3 sections: Gen 0, Gen 1 and Gen 2. Objects smaller than 85,000 bytes (roughly 85 KB) are allocated from Gen 0. If there is not enough space in Gen 0, a Gen 0 collection is initiated, and any object which is not ready for collection during Gen 0 is promoted to Gen 1; the same applies to Gen 1 objects. Any object of 85,000 bytes or more is allocated on a separate Large Object Heap (LOH) segment. So an increase in the size of the LOH is a clear indication of large allocations, and an increase in the size of Gen 2 indicates that a lot of objects are surviving Gen 0 and Gen 1 collections, which may indicate a memory leak. (A quick check of the LOH threshold follows this list.)
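As a quick sanity check of that LOH threshold, here is a tiny sketch; GC.GetGeneration reports large object heap objects as generation 2:

using System;

class LohThreshold
{
    static void Main()
    {
        byte[] small = new byte[84000]; // below the 85,000-byte threshold: small object heap, Gen 0
        byte[] large = new byte[85000]; // at/above the threshold: allocated directly on the LOH

        Console.WriteLine(GC.GetGeneration(small)); // prints 0
        Console.WriteLine(GC.GetGeneration(large)); // prints 2 (the LOH is reported as Gen 2)
    }
}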

In my case there was a big difference between committed bytes and reserved bytes, and the LOH was high, which indicated large allocations. So I decided to analyze a memory dump of the process to find the object allocating the huge amounts of memory.


Memory Dump Analysis using Windbg

Take a memory dump of w3wp.exe by following these steps:
  • Download and install the debugger from: www.microsoft.com/whdc/devtools/debugging/default.mspx
  • Open cmd and cd to "C:\Program Files\Debugging Tools for Windows (x86)".
  • When you see OOM exceptions in the event log, then
  • Run the command: adplus -hang -pn w3wp.exe -o c:\dump
  • The dump files will be stored in the directory c:\dump. You may change the directory.
  • Also take a copy of sos.dll and mscordacwks.dll from the server.




I am going to use a lot of SOS extension commands in this analysis; you may acquaint yourself with them by reading the article on the SOS extension.
Step 1:
Open the memory dump in Windbg.
IMP: Load sos.dll and mscordacwks.dll (the ones you copied from the server). If you find it difficult to load sos.dll or mscordacwks.dll, refer to the article on troubleshooting SOS extension loading issues.

Now let's see how the virtual address space of w3wp.exe is distributed. I have divided the output of !address -summary into 3 parts for a better explanation of the related parts.









There is some very interesting information here. This is the representation of the 2 GB (user mode) virtual address space of the process. The virtual address space is logically divided into memory regions. Please see the table below (from the Windbg help for !address) for a description of each region:

  
VAR or <unclassified>: Busy regions. These regions include all virtual allocation blocks, the SBH heap, memory from custom allocators, and all other regions of the address space that fall into no other classification.
Free: Free memory. This includes all memory that has not been reserved.
Image: Memory that is mapped to a file that is part of an executable image.
Stack: Memory used for thread stacks.
Teb: Memory used for thread environment blocks (TEBs).
Peb: Memory used for the process environment block (PEB).
Heap: Memory used for heaps.
PageHeap: The memory region used for the full-page heap.
CSR: CSR shared memory.
Actx: Memory used for activation context data.
NLS: Memory used for National Language Support (NLS) tables.
FileMap: Memory used for memory-mapped files. This filter is applicable only during live debugging.


The following 3 tables detail the busy part of virtual memory. The busy part is the portion of the virtual address space that the process is using; it is generally represented by the Process\Virtual Bytes counter.


The table below is very informative. It shows the largest contiguous allocated or available memory in each region. Wow!!! The process is requesting 103 MB in one chunk whereas it now has only a 48 MB chunk available. So OOM is expected. Remember, OOM occurs when the virtual address space does not have a large enough contiguous range to fulfil the process's memory request.

Step 2:
Let's find out which object is consuming 103 MB of virtual address space. Since my application is a .NET application, I will be inspecting the Garbage Collector's heap using SOS extension commands.

My server has 4 processors; that's why the output has 4 heap entries (the server GC creates one heap per processor).
See where most of the heap memory is going!

Step 2.1: Analyze System.String Type

Let's analyze the last 3-4 entries (the green highlighted ones) from !dumpheap (the biggest consumers come last). I will start with the System.String type, as I know I am reading a large CSV file into a string and caching it. So I will try to find it first, and I know it contains ~1350K strings, each of 17 characters (16 hexadecimal characters + a comma). So the minimum size of the string should be around 1,350,000*17*2 (2 bytes for each Unicode character) = 45,900,000. I will try to find that string using the !dumpheap command. The first argument of this command is a method table (MT); since I want to analyze strings, I am passing the MT for System.String.


Great, I found a big string (which could be my CSV file) at address 20540038, whose size is 67,010,584, which is bigger than what we calculated for the CSV. Let's confirm whether it is the CSV file or not by verifying the contents, using !do (the dump object command).


Since it's a very large string, !do did not print the data. OK, I will try the good old Windbg print-memory command du (using u for Unicode, as I know .NET uses Unicode for strings by default).

0:000> du 20540038
20540038  "礳䀁ǿ.ƈ00010158279b91d4,..00019b0"
20540078  "74c2f8cb1,..00010b074c2f9e5b,..0"
205400b8  "0014b074c2fc01f,..00017b074c2fc9"
…… (output trimmed) ……

Great, I found my huge string. Look at the character strings of length 17 (00010158279b91d4,) and also at the 2 extra whitespace characters, shown as dots (.), at the end of each string. They are the carriage return and line feed characters; because of them the actual size is 67,010,584 rather than the 45,900,000 I initially calculated.

Well, there is only one instance of this huge string in Heap0, as we saw in the output of the command !dumpheap -mt 79330b24 -min 45900000. So this huge string (the CSV file) is not the issue, as it is only initialized once. Out of the 285,212,628 bytes allocated to the String type we have only accounted for 67,010,584 bytes, so now let's find where the rest of the String memory is. I will find out the size range into which most of the string objects fall. It is a kind of approximation, so I started with a minimum size of 5000.


OK, the majority of allocations are below 5000 bytes. I decreased the minimum several times and found out that the majority of allocations happened between 50 and 100 bytes.

Interesting: ~4 million string objects of between 50 and 100 bytes exist in the GC heap; this is not normal. Let's see what they contain:


We got the address from the above command; let's see what it contains.



It contains 00177ffe81d91fa2, the same 16-character string pattern used in the huge cached string (the CSV file) we have in memory. All 3,882,186 string objects are of the same size, 54 bytes, which suggests that something like String.Split has been used on that huge string (the CSV file) of comma-separated 16-character strings. Let's see where this object is rooted.

The string is rooted in a System.Object[] array at address 148b0048. Ignore the invalid objects.
Let's dump this object and see what it contains.
Great, we found the array resulting from String.Split; the number of elements matches the number of records in the CSV file, 1,355,049. Before we proceed further, let's regroup what we have seen so far by answering the question below:

What happens when you do a split on a string containing 1,355,049 comma-separated strings of 16 characters each?

  1. Memory is allocated for an array to hold the string object references: contiguous virtual address space of 4 (address pointer to a string object) * 1,355,049 = 5,420,196 (array size) + 16 (for bookkeeping) = 5,420,212 bytes.
  2. Memory for 1,355,049 strings: 1,355,049 * 54 = ~69 MB.
  3. <Placeholder for a surprise>, at least to me :).

Is having so many small string objects in memory a problem? Since every string takes 54 bytes, a split on 1,355,049 records will take ~69.5 MB. Is this a problem?
No, because this space is not allocated contiguously on the heap, so it is not directly causing the OOM. Why?
The GC allocates anything of 85,000 bytes or more (contiguous) on the LOH (large object heap). Then why is the data produced by String.Split (a total of 54*1,355,049 = 73,172,646 bytes, ~69 MB) not stored on the LOH? The reason is that String.Split creates many separate string objects which do not need to be allocated contiguously, so the GC can store them wherever it finds space in the heap. Since these small string objects are all allocated at roughly the same time, most of them end up near each other in the Gen 0 (SOH) heap. The example proves it:



Look at the highlighted array elements. The address locations of the string records are not contiguous.

So far so good. Then what is the issue? The process is only demanding ~5 MB of contiguous memory, whereas we have a 48 MB contiguous free block.

Look back at the largest region by usage.


And what we see from the above is that we only need ~5 MB of contiguous virtual address space at any given point in time. I agree that with a ~5 MB allocation per split we might see OOM once the number of current connections reaches into the thousands. But in the present dump there is something else causing the OOM. Let's see what it is. Do you remember !dumpheap -stat?

Step 2.2: Analyze System.Int32[] Type
We have not yet analyzed the largest consumer of the .NET heap: System.Int32[].


We analyzed System.Object and System.String, but not the largest consumer of the GC heap, System.Int32[].

Let's start by finding the size range in which the maximum memory is consumed.


OOPS!!! These are some really big chunks of memory. But who on earth is consuming them? All of them are of equal size; let's dump the object at the first address.

Remember the size of the CSV file when loaded into the memory cache? Iterating through the array using !dumparray reveals that it contains integer data in the sequence 16, 35, 54, 73, ….. 25745909, 0, 0, 0 … 0.

These are the character positions of the separators, i.e., where each successive split happens inside the String.Split function.

The last split happened at position 25,745,909, at index 1,355,047. So (25,745,930 - 1,355,047 =) 24,390,883 elements contain the value zero. I don't understand why String.Split has been designed this way.

So the size of the array is 4*25,745,930 = 102,983,720 + 16 (for bookkeeping) = 102,983,736 bytes.

We found the issue

I guess we found the major cause of the OOM in this case: IT'S THE IMPLEMENTATION OF THE SPLIT FUNCTION (of course, doing multiple splits on an unchanged large in-memory string is equally bad).

Let’s revisit:

What happens when you do a split on a string containing 1,355,049 comma-separated strings of 16 characters each, with a total character length of 25,745,930?

  1. An array of pointers to string objects: contiguous virtual address space of 4 (address pointer) * 1,355,049 = 5,420,196 (array size) + 16 (for bookkeeping) = 5,420,212 bytes.
  2. Non-contiguous virtual address space for 1,355,049 strings, each of 54 bytes. This does not mean all those 1.3 million strings are scattered across the heap, but they will not be allocated on the LOH; the GC will allocate them in bunches on the Gen 0 heap.
  3. String.Split creates an internal System.Int32[] array of length 25,745,930, consuming (102,983,736 bytes) ~98 MB of the LOH, which is very expensive :( (a sketch of a code-level alternative follows this list).
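Just for illustration (this is not the fix I ended up using; the next post covers a configuration-only solution), a code-level alternative would be to tokenize the cached string lazily instead of calling String.Split, so that neither a separator-index array proportional to the whole string nor a giant String[] is ever allocated. A minimal sketch, assuming the same comma-separated format:

using System;
using System.Collections.Generic;

static class CsvTokenizer
{
    // Yields one record at a time instead of materializing 1.35 million strings
    // and Split's internal Int32[] in one go.
    public static IEnumerable<string> Tokens(string csv, char separator)
    {
        int start = 0;
        while (start < csv.Length)
        {
            int end = csv.IndexOf(separator, start);
            if (end < 0) end = csv.Length;
            yield return csv.Substring(start, end - start).Trim(); // Trim drops the \r\n around each record
            start = end + 1;
        }
    }
}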

Well, I know what the issue is. But how can I get away with it without changing the code???
I will cover the solution in my next blog.
For further reading on virtual memory and OOM, I would recommend the great article on OOM by Eric Lippert.