Everhart, Glenn

From: Michael McAllister [mcallister@annapolis.net]
Sent: Thursday, October 15, 1998 1:32 AM
To: 'srinivasarao'
Cc: 'ntdev@atria.com'
Subject: RE: [ntdev] slow data transfer

Oh, my. And I thought it was just me going crazy a few months ago!

I wrote to this list a while back about slow accesses to memory. In my case, I wrote a driver (for a PCI device) that needed to map a buffer from the kernel's virtual space into a process's user virtual space. I, too, used ZwMapViewOfSection, and I, too, saw VERY slow accesses to this memory. For example, a memcpy() in the user process to or from this "ZwMap'd" buffer took 4 TIMES as long as a memcpy() to a second buffer that was simply malloc'd in that user process.

I wrote to this list asking for help and got a lot of responses (thanks to all!!). The one that fixed my problem came from George Blat. It read (in part):

   "Allocate contiguous physical memory in your driver with
   MmAllocateContiguousMemory. Get the system virtual address.
   Call IoAllocateMdl to get an MDL pointer. Do not associate it with any IRP.
   Call MmBuildMdlForNonPagedPool to fill in the page information.
   Call MmMapLockedPages with UserMode to get a user-reachable address
   that you can return from your IOCTL."

As soon as I did the mapping between kernel virtual and user virtual space with MmMapLockedPages instead of ZwMapViewOfSection, the thing took off: accesses to the mapped kernel contiguous buffer were JUST AS FAST as accesses to buffers malloc'd in the user process. It was that simple. I had spent weeks going off on all kinds of tangents (and learning A LOT in the process), and all it took was that one change to bring buffer access up to where it should have been all along. I never did figure out WHY ZwMapViewOfSection was such a dog....

I am now working on a different project. I don't know if you can use the same mapping technique...
I was trying to map contiguous memory (which can only be allocated from the kernel) into user space so that the user program could fill it with data, and then a DMA engine on my hardware could efficiently move it to the device's 32 MB buffer. But it is something to maybe take a look at. It sped up my buffer accesses dramatically!

And to anyone who remembers my question about whether people really see 132 MB/s on the PCI bus (the theoretical peak of a 32-bit, 33 MHz bus): YES! I average 125 MB/s throughput on my 333 MHz Dell Pentium II, but I have hit 130 MB/s on occasion.

Again, I don't know if this will help you... I really don't know as much as the rest of the guys on this list about how NT does things, and their responses are probably much better than mine. But if you are stuck, maybe the above info will help.

Thanks again, Mr. Blat, and all those who helped me!

-Michael McAllister

-----Original Message-----
From: srinivasarao [SMTP:dsrao@ada.ernet.in]
Sent: Wednesday, October 14, 1998 1:55 PM
To: Dave Harvey; srinivasarao; 'Bhaduri, Arnab'; ntdev@atria.com; 'Mark Roddy'
Subject: RE: [ntdev] slow data transfer

Hi friends,

Thank you so much for taking the trouble on my behalf.

At present I am following the DDK example "MAPMEM" exactly. I am not using any caching; in fact, I declared that carefully in the translation commands. My hardware consists of 64 KB of device memory, some I/O ports at 0x140, and an interrupt. I have mapped the ports and the memory separately. Then I used ZwMapViewOfSection in kernel mode to get a virtual address in user space, which my application receives from the driver through a DeviceIoControl call. Using this address as a base, I write to different locations by varying the offset.

I believe my driver is working because I am following the standard method from the examples and I check the return values at each level. Initially I too doubted whether I was really writing to my driver or not. Later I found that it needs some delay to operate properly.
Anyway, this is only my understanding. Please let me know if I am doing anything wrong. I think Mark has given a suggestion, but I don't know how to implement it.

The MAPMEM example uses "ioaccess.h" to write to registers. These macros are as good as using I/O calls. Here is the code:

//
// I/O space read and write macros.
//
// The READ/WRITE_REGISTER_* calls manipulate MEMORY registers.
// (Use x86 move instructions, with LOCK prefix to force correct behavior
// w.r.t. caches and write buffers.)
//
// The READ/WRITE_PORT_* calls manipulate I/O ports.
// (Use x86 in/out instructions.)
//

#if defined(_X86_)

#define READ_REGISTER_UCHAR(Register)          (*(volatile UCHAR *)(Register))
#define READ_REGISTER_USHORT(Register)         (*(volatile USHORT *)(Register))
#define READ_REGISTER_ULONG(Register)          (*(volatile ULONG *)(Register))

#define WRITE_REGISTER_UCHAR(Register, Value)  (*(volatile UCHAR *)(Register) = (Value))
#define WRITE_REGISTER_USHORT(Register, Value) (*(volatile USHORT *)(Register) = (Value))
#define WRITE_REGISTER_ULONG(Register, Value)  (*(volatile ULONG *)(Register) = (Value))

#define READ_PORT_UCHAR(Port)          inp (Port)
#define READ_PORT_USHORT(Port)         inpw (Port)
#define READ_PORT_ULONG(Port)          inpd (Port)

#define WRITE_PORT_UCHAR(Port, Value)  outp ((Port), (Value))
#define WRITE_PORT_USHORT(Port, Value) outpw ((Port), (Value))
#define WRITE_PORT_ULONG(Port, Value)  outpd ((Port), (Value))

#elif defined(_PPC_) || defined(_MIPS_)

#define READ_REGISTER_UCHAR(x)     (*(volatile UCHAR * const)(x))
#define READ_REGISTER_USHORT(x)    (*(volatile USHORT * const)(x))
#define READ_REGISTER_ULONG(x)     (*(volatile ULONG * const)(x))

#define WRITE_REGISTER_UCHAR(x, y)  (*(volatile UCHAR * const)(x) = (y))
#define WRITE_REGISTER_USHORT(x, y) (*(volatile USHORT * const)(x) = (y))
#define WRITE_REGISTER_ULONG(x, y)  (*(volatile ULONG * const)(x) = (y))

#define READ_PORT_UCHAR(x)         READ_REGISTER_UCHAR(x)
#define READ_PORT_USHORT(x)        READ_REGISTER_USHORT(x)
#define READ_PORT_ULONG(x)         READ_REGISTER_ULONG(x)

//
// All these macros take a ULONG as a parameter so that we don't
// force an extra typecast in the code (which will cause the X86 to
// generate bad code).
//

#define WRITE_PORT_UCHAR(x, y)     WRITE_REGISTER_UCHAR(x, (UCHAR) (y))
#define WRITE_PORT_USHORT(x, y)    WRITE_REGISTER_USHORT(x, (USHORT) (y))
#define WRITE_PORT_ULONG(x, y)     WRITE_REGISTER_ULONG(x, (ULONG) (y))

#endif

In this code, I use the WRITE_REGISTER_*() macros to write to registers and READ_REGISTER_*() to read from them, through the virtual address I got from kernel mode. I hope you can suggest a solution.

Thanks in advance,
srinivas

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[ To unsubscribe, send email to ntdev-request@atria.com with body
UNSUBSCRIBE (the subject is ignored). ]
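The READ/WRITE_REGISTER_* macros quoted above are plain volatile pointer dereferences, so the base-plus-offset access pattern srinivas describes can be exercised on its own. What follows is a minimal user-mode simulation under stated assumptions: an ordinary array (device_window, a hypothetical name) stands in for the 64 KB device memory, the <stdint.h> type uint32_t replaces the Windows ULONG type, and the helpers reg_addr, write_reg, read_reg, and fill_and_verify are illustrative names, not part of the DDK or of MAPMEM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the x86 branch of ioaccess.h: every access goes through
   a volatile pointer, so the compiler must emit each read and write
   exactly once and in program order. uint32_t stands in for ULONG. */
#define READ_REGISTER_ULONG(Register)         (*(volatile uint32_t *)(Register))
#define WRITE_REGISTER_ULONG(Register, Value) (*(volatile uint32_t *)(Register) = (Value))

/* HYPOTHETICAL stand-in for the 64 KB device memory window. In a real
   application the base would be the user-space address the driver
   returns through DeviceIoControl, not an ordinary array. */
#define DEVICE_WINDOW_SIZE 0x10000u
static uint32_t device_window[DEVICE_WINDOW_SIZE / sizeof(uint32_t)];

/* Register address = base + byte offset. The asserts reject offsets
   that are not 4-byte aligned or that run past the end of the window. */
static volatile uint32_t *reg_addr(void *base, size_t offset)
{
    assert(offset % sizeof(uint32_t) == 0);
    assert(offset + sizeof(uint32_t) <= DEVICE_WINDOW_SIZE);
    return (volatile uint32_t *)((unsigned char *)base + offset);
}

static void write_reg(void *base, size_t offset, uint32_t value)
{
    WRITE_REGISTER_ULONG(reg_addr(base, offset), value);
}

static uint32_t read_reg(void *base, size_t offset)
{
    return READ_REGISTER_ULONG(reg_addr(base, offset));
}

/* Write a pattern to every 32-bit location in [first, last) by stepping
   the offset, then read each one back. Returns the number of mismatches. */
static size_t fill_and_verify(void *base, size_t first, size_t last)
{
    size_t mismatches = 0;
    for (size_t off = first; off < last; off += sizeof(uint32_t))
        write_reg(base, off, (uint32_t)(0xA5000000u | off));
    for (size_t off = first; off < last; off += sizeof(uint32_t))
        if (read_reg(base, off) != (uint32_t)(0xA5000000u | off))
            mismatches++;
    return mismatches;
}
```

For example, write_reg(device_window, 0x10, 0xDEADBEEF) followed by read_reg(device_window, 0x10) gives back 0xDEADBEEF. Because plain memory stands in for the hardware, this sketch only checks the address arithmetic and the volatile access pattern; it cannot reproduce the mapping speed difference discussed in this thread, nor any delay the real device may need between accesses.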