Discussion:
MySQL's InnoDB immensely slow on ZFS
Nico Sabbi
2008-04-05 08:15:59 UTC
Permalink
Hi,
yesterday I installed Solaris 01/08 on a Sun server with two 2 GHz Opterons and 2 GB of RAM
(I don't remember the model of the server, but it's one of those 1U rack-mountable
servers with -hideous- rocket-loud fans that never slow down; maybe a VX2200?)

I configured the bundled MySQL 5.0.xx to work on a ZFS pool built as a mirror (RAID-1) of the
two bundled disks, but although I followed all the tuning tips found on various Sun blogs,
which can be summarized as:
- limiting ZFS's usage of memory to 256MB (appending to /etc/system
set zfs:zfs_arc_max = 0x8000000
and rebooting; hopefully I did the right thing)

- increasing innodb_buffer_pool_size up to 1.5GB
and innodb_additional_mem_pool_size up to 48M (in steps, of course)

- trying innodb_flush_method = O_DIRECT

- ZFS recordsize=16k on that particular pool, set before creating a single
file on it (the settings are sketched together below)
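
Taken together, the tuning above amounts to roughly the following sketch (the dataset
name tank/mysql and the my.cnf section are assumptions here; note also that 0x8000000
is actually 128 MB, not 256 MB):

  # /etc/system (takes effect after a reboot)
  set zfs:zfs_arc_max = 0x8000000

  # ZFS recordsize, set before any table files exist on the dataset
  zfs set recordsize=16k tank/mysql

  # my.cnf, [mysqld] section
  innodb_buffer_pool_size         = 1500M
  innodb_additional_mem_pool_size = 48M
  innodb_flush_method             = O_DIRECT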

the performance is simply dismal. Just to give you an idea: an import of a 7 MB
SQL dump takes 9 seconds with engine=myisam, and 98 seconds with engine=innodb.
Considering that both the MyISAM and InnoDB tables are stored on the same ZFS pool,
in the same directory, I daresay that there's something extremely wrong in the
interaction of my InnoDB with ZFS.
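
A comparison like that can be reproduced along these lines (database and dump file
names are made up; the engine is whatever ENGINE= clause the dump's CREATE TABLE
statements carry, timings as reported above):

  time mysql testdb < dump_myisam.sql    # ENGINE=MyISAM: ~9 seconds
  time mysql testdb < dump_innodb.sql    # ENGINE=InnoDB: ~98 seconds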

I hope that someone can help me.
Thanks,
Nico
Lurie
2008-04-05 10:23:19 UTC
Permalink
http://dev.mysql.com/tech-resources/articles/mysql-zfs.html


Nico Sabbi
2008-04-05 12:04:13 UTC
Permalink
Post by Lurie
http://dev.mysql.com/tech-resources/articles/mysql-zfs.html
that is the first document I read, but none of the tips helped in any way.
I forgot to mention that I also disabled the doublewrite buffer, but without any improvement.
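
For MySQL 5.0 the doublewrite buffer can only be disabled at startup; in my.cnf this is
usually spelled as below (worth verifying the exact option name against the 5.0 docs):

  [mysqld]
  skip-innodb_doublewrite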
Brandon High
2008-04-10 00:21:36 UTC
Permalink
Post by Nico Sabbi
- limiting zfs's usage of memory to 256MB (appending to /etc/system
set zfs:zfs_arc_max = 0x8000000
and rebooting, hopefully I did the right thing)
My gut reaction is that you've shot yourself in the foot by making the
ARC cache so small. Have you done any tests with the default settings?

The size of the ARC should vary with memory pressure from other apps
running on the server, but should not exceed 1GB. Unless you know that
the MySQL instance and other apps on the system will always consume
1.5GB of memory, you're limiting your performance.

MySQL with InnoDB performs better with a smaller buffer pool size on
ZFS, relying instead on the ZFS cache. Increasing the buffer pool size
and decreasing the ARC cache size is the exact opposite of what you
want to do.
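
Concretely, that advice points toward something like the following (illustrative values
only, not from this thread): remove the zfs_arc_max line from /etc/system, reboot so the
ARC sizes itself again, and keep the InnoDB buffer pool modest, e.g.

  # my.cnf, [mysqld] section
  innodb_buffer_pool_size = 256M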

-B
--
Brandon High bhigh-UUz7JMH5g5LQT0dZR+***@public.gmane.org
"The good is the enemy of the best." - Nietzsche
Nico Sabbi
2008-04-10 07:59:22 UTC
Permalink
Post by Brandon High
Post by Nico Sabbi
- limiting zfs's usage of memory to 256MB (appending to
/etc/system set zfs:zfs_arc_max = 0x8000000
and rebooting, hopefully I did the right thing)
My gut reaction is that you've shot yourself in the foot by making
the ARC cache so small. Have you done any tests with the default
settings?
yes, I limited the ARC only after having verified that the default
setting (unspecified in /etc/system) was too slow.
What would be a good value considering that the server has 2 GB
of RAM?
Post by Brandon High
The size of the ARC should vary on memory pressure from other apps
running on the server, but should not exceed 1GB. Unless you know
that the MySQL instance and other apps on the system will always
consume 1.5GB of memory, you're limiting your performance.
when I set a virtual memory limit of 4GB on mysql, it died
very soon, so I guess it really needs a lot of memory
(it's full of frequently accessed databases)
Post by Brandon High
MySQL with InnoDB performs better with a smaller buffer pool size
on ZFS, relying instead on the ZFS cache. Increasing the buffer
pool size and decreasing the ARC cache size is the exact opposite
of what you want to do.
-B
terrible. I'll run some tests and report. Thanks for your help
Brandon High
2008-04-10 22:59:53 UTC
Permalink
Post by Nico Sabbi
yes, I limited the ARC only after having verified that the default
setting (unspecified in /etc/system) was too slow.
What would be a good value considering that the server has 2 GB
of ram?
Without knowing the specifics of your system, I can't begin to guess.
The ARC's upward limit is about 1/2 of physical memory, and it will
decrease as required by memory pressure from other applications. In
theory, you should never have to adjust it. There are some instances
where this is not the case, but for most scenarios don't touch it.

See http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache
for more info on tuning the ARC cache, and why you should or should
not touch it.
Post by Nico Sabbi
when I set a virtual memory limit of 4GB to mysql it died
very soon, so I guess it really needs a lot of memory
(it's full of frequently accessed databases)
Was this before or after you adjusted the buffer pool size in mysql?
If you have a huge buffer pool defined, of course the database is
going to consume a lot of memory.

Try reducing the buffer pool size in mysql and remove the "tuning"
you've done to the ARC cache. The link that Lurie posted earlier has
some good information - I suggest reviewing it again.

A really easy, and probably the best, solution to your
problem is to install more memory. Bringing the server up to 4GB or
8GB will probably smooth things out. Crucial has a 2GB kit for $105,
4GB for $300, or 8GB for $550.

-B
--
Brandon High bhigh-UUz7JMH5g5LQT0dZR+***@public.gmane.org
"The good is the enemy of the best." - Nietzsche
Jürgen Keil
2008-04-11 09:20:38 UTC
Permalink
Post by Brandon High
Post by Nico Sabbi
yes, I limited the ARC only after having verified that the default
setting (unspecified in /etc/system) was too slow.
What would be a good value considering that the server has 2 GB
of ram?
Without knowing the specifics of your system, I can't begin to guess.
The ARC's upward limit is about 1/2 of physical memory,
According to the source, it's "3/4 of all memory, or all but 1G,
whichever is more" (64-bit kernel only, 32-bit is different).

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#3231

3231 /* set min cache to 1/32 of all memory, or 64MB, whichever is more */
3232 arc_c_min = MAX(arc_c / 4, 64<<20);
3233 /* set max to 3/4 of all memory, or all but 1GB, whichever is more */
3234 if (arc_c * 8 >= 1<<30)
3235 arc_c_max = (arc_c * 8) - (1<<30);
3236 else
3237 arc_c_max = arc_c_min;
3238 arc_c_max = MAX(arc_c * 6, arc_c_max);
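
Plugging the original poster's 2 GB box into that logic (and assuming arc_c starts out
at 1/8 of physical memory, as the arc_init() code of that era does):

  arc_c     = 2 GB / 8                         = 256 MB
  arc_c_min = MAX(arc_c / 4, 64 MB)            = 64 MB
  arc_c_max = MAX(arc_c * 6, arc_c * 8 - 1 GB)
            = MAX(1.5 GB, 1 GB)                = 1.5 GB  (i.e. 3/4 of 2 GB)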
Post by Brandon High
and it will decrease as required by memory pressure from other
applications. In theory, you should never have to adjust it.
There are some instances where this is not the case, but for most
scenarios don't touch it.
Hmm, in theory it should...

A few days ago I tried to install snv_85 into a
VirtualBox guest, running on a 64-bit AMD dual-core box
with 2GB of physical memory which is also running snv_85.
The snv_85 guest's virtual HDD was a file on a ZFS filesystem.
And since the new GUI installer was complaining that
the guest's 768MB of memory wasn't enough for a graphical
installation, I increased the guest's memory allocation to
1GB. The result was heavy thrashing of the HDD during the
attempt to install snv_85 into that 1GB VirtualBox guest!
I think it took over half an hour to stop that 1GB VirtualBox
guest through the GUI before I got back control over the machine.

I guess that was a case where the automatic ARC cache tuning
didn't work too well (didn't work at all).

I completed the VirtualBox GUI install test using a little less
memory; IIRC I used 768+64 = 832 MB.



And another case was an attempt to use "xsltproc(1)" on a
big xml file, this time on an amd64 x2 machine with 4GB of
memory, using zfs, and the xsltproc process had grown to
use > 2GB of memory. Again heavy disk thrashing, and I
didn't have the impression that the arc cache did shrink enough
to prevent that thrashing.


Brandon High
2008-04-11 18:47:05 UTC
Permalink
Post by Jürgen Keil
According to the source, it's "3/4 of all memory, or all but 1G,
whichever is more" (64-bit kernel only, 32-bit is different).
For some reason 1/2 stuck in my head as the max, but I'll believe you. :-)
Post by Jürgen Keil
1GB. The result was heavy thrashing of the hdd during the
attempt to install snv_85 into that 1GB virtualbox guest!
I think it took over half an hour to stop that 1GB virtualbox
guest through the GUI, until I got back control over the machine.
I'm not sure how much overhead VirtualBox has, and you didn't say what
else was running on the system at the time.

Did you check how much memory the ARC and other applications were
using during the install? You can find the size of the ARC with kstat
by looking at the zfs module (kstat -m zfs).
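
For example, arcstats is the relevant kstat group, and -p prints parseable
name/value pairs (sizes in bytes):

  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max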
Post by Jürgen Keil
use > 2GB of memory. Again heavy disk thrashing, and I
didn't have the impression that the arc cache did shrink enough
to prevent that thrashing.
Again, what else was running, and what was the size of the ARC?

-B
--
Brandon High bhigh-UUz7JMH5g5LQT0dZR+***@public.gmane.org
"The good is the enemy of the best." - Nietzsche
Jürgen Keil
2008-04-11 19:21:24 UTC
Permalink
Post by Jürgen Keil
And another case was an attempt to use "xsltproc(1)" on a
big xml file, this time on an amd64 x2 machine with 4GB of
memory, using zfs, and the xsltproc process had grown to
use > 2GB of memory. Again heavy disk thrashing, and I
didn't have the impression that the arc cache did
shrink enough to prevent that thrashing.
It looks like this, in top:

load averages: 0.01, 0.02, 0.05 21:11:47
75 processes: 74 sleeping, 1 on cpu
CPU states: 99.4% idle, 0.1% user, 0.5% kernel, 0.0% iowait, 0.0% swap
Memory: 4031M real, 108M free, 2693M swap in use, 1310M swap free

PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
7587 jk 1 60 0 2430M 2306M sleep 0:25 0.24% xsltproc
7588 jk 1 59 0 3916K 1472K cpu/0 0:00 0.02% top
44 root 1 59 0 0K 0K sleep 0:10 0.01% Xorg
7613 jk 1 59 0 2912K 1792K sleep 0:00 0.01% iostat
15634 postgres 1 59 0 19M 1152K sleep 0:53 0.00% postgres
80 root 1 59 0 13M 1644K sleep 0:07 0.00% dtgreet
4276 daemon 2 60 -20 2756K 676K sleep 3:48 0.00% nfsd
27392 root 1 59 0 6300K 1680K sleep 3:17 0.00% ypserv
3307 root 1 59 0 7080K 2716K sleep 3:13 0.00% intrd
7137 root 1 100 -20 2716K 1364K sleep 1:39 0.00% xntpd
2876 root 5 59 0 3568K 1780K sleep 1:21 0.00% automountd
579 daemon 1 59 0 3588K 1384K sleep 1:20 0.00% rpcbind
9 root 15 59 0 20M 1316K sleep 1:20 0.00% svc.configd
26803 root 35 59 0 5576K 2944K sleep 0:52 0.00% nscd
305 root 7 59 0 3996K 1000K sleep 0:42 0.00% devfsadm


The resident set size of xsltproc is *slowly* increasing.

truss on the xsltproc shows no system calls.

% truss -p 7587
^C


But there are lots of page faults:

% truss -m all -p 7587
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F19A024
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F19B018
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F19D004
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F19E00C
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F1A1008
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F1A3004
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F1A4024
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F1AC00C
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F1AD018
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F1AE014
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F1AF018
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F1B3008
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F1C0020
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F1C101C
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F1CD018
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F1CE010
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F1D400C
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F1DB17C
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F1DE020
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F1E1018
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F1E2010
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F1E3004
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F1E5008
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F1EA010
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F1F1004
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F1F3008
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F1F6010
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F1F800C
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F1F9014
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F1FA01C
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F1FC010
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F1FD004
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F1FE00C
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F201008
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F206010
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F20800C
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F20A010
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F20B004
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F20C00C
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F20D018
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F20F004
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F212014
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F21600C
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F21D004
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F22501C
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F227008
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F229004
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F22A00C
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F22C010
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F22D004
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F236008
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F23B008
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F23D004
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F240010
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F246000
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F249004
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F24B018
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F251018
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F253000
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F256010
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F25800C
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F25A020
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F25B018
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F25F008
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F26200C
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F266020
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F267004
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F269008
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F26B004
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F26F008
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F271008
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F272020
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F27E00C
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F27F008
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F280020
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F283018
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F28C00C
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F28F008
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F290014
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F292010
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F295018
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F296020
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F297004
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F2A0020
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F2A200C
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F2A3054
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F2AB004
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F2AD004
Incurred fault #11, FLTPAGE %pc = 0xFEE68B9A addr = 0x0F2B3018
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F2B5004
Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F2B8000
Incurred fault #11, FLTPAGE %pc = 0xFEE68B92 addr = 0x0F2BF004
^C Incurred fault #11, FLTPAGE %pc = 0xFEE68BA8 addr = 0x0F2E5008



In iostat, we see lots of 4K transfers, probably page-ins from the swap device:

% iostat -xnzc 5
cpu
us sy wt id
1 4 0 95
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
5.3 1.2 67.4 22.2 0.1 0.0 18.6 2.9 0 1 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 9.5 0 0 c6t0d0
0.0 0.0 0.1 0.1 0.0 0.0 0.0 9.3 0 0 c7t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 11.7 0 0 c8t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.7 0 0 leo:/files/jk
cpu
us sy wt id
0 0 0 100
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
144.6 0.2 578.4 0.9 0.0 1.0 0.0 6.9 0 100 c4d0
cpu
us sy wt id
0 0 0 99
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
149.0 0.0 596.0 0.0 0.0 1.0 0.0 6.7 0 100 c4d0
cpu
us sy wt id
0 0 0 100
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
143.6 0.0 574.4 0.0 0.0 1.0 0.0 7.0 0 100 c4d0
cpu
us sy wt id
0 0 0 99
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
149.4 0.2 597.6 1.3 0.0 1.0 0.0 6.7 0 100 c4d0
cpu
us sy wt id
0 1 0 99
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
148.4 0.0 593.6 0.0 0.0 1.0 0.0 6.7 0 100 c4d0
cpu
The ::arc dcmd in "mdb -k" shows:

> ::arc
hits = 25265436
misses = 1907270
demand_data_hits = 19051747
demand_data_misses = 983231
demand_metadata_hits = 5766668
demand_metadata_misses = 99264
prefetch_data_hits = 219895
prefetch_data_misses = 812873
prefetch_metadata_hits = 227126
prefetch_metadata_misses = 11902
mru_hits = 12030281
mru_ghost_hits = 55142
mfu_hits = 12896265
mfu_ghost_hits = 143230
deleted = 1997843
recycle_miss = 115508
mutex_miss = 1716
evict_skip = 8146750
hash_elements = 59206
hash_elements_max = 132131
hash_collisions = 2110982
hash_chains = 15058
hash_chain_max = 11
p = 1155 MB
c = 1356 MB
c_min = 377 MB
c_max = 3017 MB
size = 1356 MB
hdr_size = 9954840
l2_hits = 0
l2_misses = 0
l2_feeds = 0
l2_rw_clash = 0
l2_writes_sent = 0
l2_writes_done = 0
l2_writes_error = 0
l2_writes_hdr_miss = 0
l2_evict_lock_retry = 0
l2_evict_reading = 0
l2_free_on_write = 0
l2_abort_lowmem = 0
l2_cksum_bad = 0
l2_io_error = 0
l2_size = 0
l2_hdr_size = 0
arc_no_grow = 1
arc_tempreserve = 0 MB
arc_meta_used = 448 MB
arc_meta_limit = 754 MB
arc_meta_max = 464 MB


Brandon High
2008-04-11 21:58:20 UTC
Permalink
Post by Jürgen Keil
Post by Jürgen Keil
And another case was an attempt to use "xsltproc(1)" on a
big xml file, this time on an amd64 x2 machine with 4GB of
memory, using zfs, and the xsltproc process had grown to
use > 2GB of memory. Again heavy disk thrashing, and I
didn't have the impression that the arc cache did
shrink enough to prevent that thrashing.
prstat can give more accurate info on Solaris, fyi.
Post by Jürgen Keil
load averages: 0.01, 0.02, 0.05 21:11:47
75 processes: 74 sleeping, 1 on cpu
CPU states: 99.4% idle, 0.1% user, 0.5% kernel, 0.0% iowait, 0.0% swap
Memory: 4031M real, 108M free, 2693M swap in use, 1310M swap free
PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
7587 jk 1 60 0 2430M 2306M sleep 0:25 0.24% xsltproc
Only 124M of the xsltproc process is paged out - the rest is resident -
yet there is 2693M of swap in use. It appears that something other than
your xsltproc is paged out. AFAIK, the ARC should not page, so what
else is using that memory? Your ARC + xsltproc only add up to
2754M. There's something else on your system consuming 3862M.

Before you ran the process, what did things look like, memory-wise?
Also keep in mind that /tmp is an in-memory (swap-backed) filesystem. Things written
there will use physical memory, or page out if required.

Also, could you sort the output by memory usage? "prstat -s rss" and
"prstat -s size" will sort by resident size or total image size.

-B
--
Brandon High bhigh-UUz7JMH5g5LQT0dZR+***@public.gmane.org
"The good is the enemy of the best." - Nietzsche
Jürgen Keil
2008-04-14 14:25:44 UTC
Permalink
Post by Brandon High
Post by Jürgen Keil
And another case was an attempt to use "xsltproc(1)" on a
big xml file, this time on an amd64 x2 machine with 4GB of
memory, using zfs, and the xsltproc process had grown to
use > 2GB of memory. Again heavy disk thrashing, and I
didn't have the impression that the arc cache did
shrink enough to prevent that thrashing.
prstat can give more accurate info on Solaris, fyi.
Yep, top doesn't work well with 64-bit processes.

But top's summary information includes quite a few
things that are missing in prstat, like swap space
usage, memory totals, total cpu usage, ...
Post by Brandon High
load averages: 0.01, 0.02, 0.05 21:11:47
75 processes: 74 sleeping, 1 on cpu
CPU states: 99.4% idle, 0.1% user, 0.5% kernel, 0.0% iowait, 0.0% swap
Memory: 4031M real, 108M free, 2693M swap in use, 1310M swap free
PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
7587 jk 1 60 0 2430M 2306M sleep 0:25 0.24% xsltproc
Only 124M of the xsltproc process is paged - The rest is resident, but
there is 2693M of swap in use. It appears that something other than
your xsltproc is paged out.
Nope, most of the 2693M of "swap in use" is reserved swap
space for the heap data that is allocated by the big xsltproc
process.

I also had a few old / big files in /tmp, but not more than 50 MB.
Post by Brandon High
AFAIK, the ARC should not page,
Correct, that's in the kernel's heap.
Post by Brandon High
so what else is using that memory?
The big xsltproc process, and a few files in /tmp

Plus the X11 server & dtlogin, but no user
logged in into JDS.
Post by Brandon High
Your ARC + xsltproc are only equal to
2754M. There's something else on your system consuming 3862M.
That's not what I see...

I'd say the ARC consumes arc.c = 1356 MB
plus arc.arc_meta_used = 448 MB (?),
and there is a 2306 MB RSS for the xsltproc process.

That's a total of 4110 MB.

And in case the ARC metadata is already included
in arc.c:
1356 MB + 2306 MB = 3662 MB used.
Post by Brandon High
Before you ran the process, what did things look like, memory wise?
Before running xsltproc:

% swap -s
total: 105256k bytes allocated + 16804k reserved = 122060k used, 4133532k available

And while it is running:

% swap -s
total: 2590132k bytes allocated + 16604k reserved = 2606736k used, 1668740k available
Post by Brandon High
Also keep in mind that /tmp is an in memory filesystem. Things written
there will use physical memory, or page out if required.
Also, could you sort the output by memory usage?
"prstat -s rss" and
"prstat -s size" will sort by resident size or total
image size.
% prstat -s size
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP

22642 jk 2430M 2029M sleep 60 0 0:00:22 19% xsltproc/1
22498 root 40M 15M sleep 59 0 0:00:01 0.0% Xorg/1
9 root 20M 1784K sleep 59 0 0:01:23 0.0% svc.configd/15
15634 postgres 19M 1116K sleep 59 0 0:01:06 0.0% postgres/1
15632 postgres 19M 1380K sleep 59 0 0:00:00 0.0% postgres/1
3036 root 13M 748K sleep 59 0 0:00:27 0.0% fmd/17
22535 root 13M 4516K sleep 59 0 0:00:00 0.0% dtgreet/1
7 root 11M 540K sleep 59 0 0:00:29 0.0% svc.startd/12
3108 root 10M 0K sleep 59 0 0:00:00 0.0% smbd/1
4192 root 10M 0K sleep 59 0 0:00:00 0.0% smbd/1
15635 postgres 9636K 952K sleep 59 0 0:00:01 0.0% postgres/1
15636 postgres 8900K 0K sleep 59 0 0:00:00 0.0% postgres/1
22500 root 8504K 1652K sleep 59 0 0:00:00 0.0% dtlogin/1
4207 smmsp 8036K 552K sleep 59 0 0:00:03 0.0% sendmail/1
4238 root 8028K 1140K sleep 59 0 0:00:48 0.0% sendmail/1
3699 root 7776K 1004K sleep 59 0 0:00:00 0.0% dtlogin/1
3307 root 7080K 1572K sleep 59 0 0:03:22 0.0% intrd/1
365 root 6784K 1976K sleep 59 0 0:00:06 0.0% hald/4
146 daemon 6568K 900K sleep 59 0 0:00:00 0.0% kcfd/3
27392 root 6300K 1796K sleep 59 0 0:03:38 0.0% ypserv/1
26803 root 5576K 2608K sleep 59 0 0:00:56 0.0% nscd/35
137 root 5572K 1052K sleep 59 0 0:00:00 0.0% syseventd/15
2859 root 5068K 1544K sleep 59 0 0:00:27 0.0% inetd/4
430 root 4712K 0K sleep 59 0 0:00:00 0.0% hald-addon-netw/1
22645 root 4508K 2856K cpu1 59 0 0:00:00 0.0% prstat/1
3038 root 4420K 0K sleep 59 0 0:00:00 0.0% sshd/1
6764 root 4108K 1176K sleep 59 0 0:00:11 0.0% syslogd/13
541 root 4032K 732K sleep 59 0 0:00:00 0.0% rmvolmgr/1

% prstat -s rss
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP

22642 jk 2430M 2088M sleep 60 0 0:00:22 0.6% xsltproc/1
22498 root 40M 15M sleep 59 0 0:00:01 0.0% Xorg/1
22535 root 13M 4520K sleep 59 0 0:00:00 0.0% dtgreet/1
26803 root 5576K 2964K sleep 59 0 0:00:56 0.0% nscd/35
22651 root 4508K 2784K cpu1 59 0 0:00:00 0.0% prstat/1
3307 root 7080K 2624K sleep 59 0 0:03:22 0.0% intrd/1
9 root 20M 2216K sleep 59 0 0:01:23 0.0% svc.configd/15
22638 root 3336K 2096K sleep 49 0 0:00:00 0.0% tcsh/1
27393 root 3112K 1980K sleep 59 0 0:00:20 0.0% rpc.nisd_resolv/1
365 root 6784K 1976K sleep 59 0 0:00:06 0.0% hald/4
22600 jk 3524K 1920K sleep 59 0 0:00:00 0.0% tcsh/1
22608 jk 3512K 1900K sleep 59 0 0:00:00 0.0% tcsh/1
22634 jk 3400K 1864K sleep 59 0 0:00:00 0.0% tcsh/1
27392 root 6300K 1820K sleep 59 0 0:03:38 0.0% ypserv/1
22500 root 8504K 1652K sleep 59 0 0:00:00 0.0% dtlogin/1
2876 root 3568K 1612K sleep 59 0 0:01:23 0.0% automountd/5
2859 root 5068K 1544K sleep 59 0 0:00:27 0.0% inetd/4
305 root 3996K 1524K sleep 59 0 0:00:44 0.0% devfsadm/7
22607 root 2632K 1476K sleep 59 0 0:00:00 0.0% in.rlogind/1
22599 root 2632K 1464K sleep 59 0 0:00:00 0.0% in.rlogind/1
15632 postgres 19M 1412K sleep 59 0 0:00:00 0.0% postgres/1
7137 root 2716K 1364K sleep 100 - 0:01:44 0.0% xntpd/1
4274 root 3844K 1324K sleep 59 0 0:00:00 0.0% mountd/5
18588 root 2536K 1312K sleep 59 0 0:00:00 0.0% hald-addon-stor/1
579 daemon 3588K 1292K sleep 59 0 0:01:27 0.0% rpcbind/1
6764 root 4108K 1176K sleep 59 0 0:00:11 0.0% syslogd/13
4238 root 8028K 1156K sleep 59 0 0:00:48 0.0% sendmail/1



Btw, when the xsltproc process starts while memory usage on the
machine looks like this, it completes in less than a minute
(this is after several xsltproc runs; the ZFS ARC has released enough
memory so that the big process fits into free memory):

load averages: 0.00, 0.05, 0.06 16:14:03
71 processes: 70 sleeping, 1 on cpu
CPU states: 99.8% idle, 0.0% user, 0.2% kernel, 0.0% iowait, 0.0% swap
Memory: 4031M real, 2589M free, 123M swap in use, 4057M swap free

PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
22993 jk 1 59 0 3788K 1556K cpu/0 0:00 0.02% top
27392 root 1 59 0 6300K 1860K sleep 3:38 0.00% ypserv
22498 root 1 59 0 0K 0K sleep 0:01 0.00% Xorg
22535 root 1 59 0 13M 4520K sleep 0:00 0.00% dtgreet
4276 daemon 2 60 -20 2756K 720K sleep 5:35 0.00% nfsd
3307 root 1 59 0 7080K 2732K sleep 3:22 0.00% intrd
7137 root 1 100 -20 2716K 1364K sleep 1:44 0.00% xntpd
579 daemon 1 59 0 3588K 1296K sleep 1:27 0.00% rpcbind
9 root 15 59 0 20M 2340K sleep 1:23 0.00% svc.configd
2876 root 5 59 0 3568K 1836K sleep 1:23 0.00% automountd
15634 postgres 1 59 0 19M 1152K sleep 1:06 0.00% postgres
26803 root 35 59 0 5576K 3072K sleep 0:56 0.00% nscd
4238 root 1 59 0 8028K 1192K sleep 0:48 0.00% sendmail
305 root 7 59 0 3996K 1524K sleep 0:44 0.00% devfsadm
7 root 12 59 0 11M 592K sleep 0:29 0.00% svc.startd



Now all I have to do is fill the ARC cache with
"find /path/to/a/zfs/filesystem -type f -exec grep does_not_exist {} +";
after that find, memory usage changes like this:

load averages: 0.12, 0.20, 0.14 16:19:44
71 processes: 70 sleeping, 1 on cpu
CPU states: 99.7% idle, 0.1% user, 0.2% kernel, 0.0% iowait, 0.0% swap
Memory: 4031M real, 669M free, 123M swap in use, 2137M swap free

PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
22993 jk 1 59 0 3788K 1556K cpu/1 0:00 0.02% top
22498 root 1 59 0 0K 0K sleep 0:01 0.00% Xorg
22535 root 1 59 0 13M 4520K sleep 0:00 0.00% dtgreet
4276 daemon 2 60 -20 2756K 720K sleep 5:35 0.00% nfsd
27392 root 1 59 0 6300K 1860K sleep 3:38 0.00% ypserv
3307 root 1 59 0 7080K 2732K sleep 3:22 0.00% intrd
7137 root 1 100 -20 2716K 1364K sleep 1:44 0.00% xntpd
579 daemon 1 59 0 3588K 1296K sleep 1:27 0.00% rpcbind
9 root 15 59 0 20M 2340K sleep 1:23 0.00% svc.configd
2876 root 5 59 0 3568K 1836K sleep 1:23 0.00% automountd
15634 postgres 1 59 0 19M 1152K sleep 1:06 0.00% postgres
26803 root 35 59 0 5576K 3072K sleep 0:56 0.00% nscd
4238 root 1 59 0 8028K 1192K sleep 0:48 0.00% sendmail
305 root 7 59 0 3996K 1524K sleep 0:44 0.00% devfsadm
7 root 12 59 0 11M 592K sleep 0:29 0.00% svc.startd


... and in this environment xsltproc needs > 15 minutes,
because it doesn't get enough RSS:

load averages: 0.30, 0.23, 0.15 16:20:31
72 processes: 71 sleeping, 1 on cpu
CPU states: % idle, % user, % kernel, % iowait, % swap
Memory: 4031M real, 32M free, 2013M swap in use, 1151M swap free

PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
23076 jk 1 60 0 1894M 1556M sleep 0:17 19.38% xsltproc
23078 jk 1 58 0 3724K 1456K cpu/1 0:00 0.09% top
22498 root 1 60 0 0K 0K sleep 0:01 0.01% Xorg
27392 root 1 60 0 6300K 1600K sleep 3:38 0.01% ypserv
15634 postgres 1 59 0 19M 1116K sleep 1:06 0.00% postgres
4238 root 1 59 0 8028K 1044K sleep 0:48 0.00% sendmail
22535 root 1 59 0 13M 1504K sleep 0:00 0.00% dtgreet
4276 daemon 2 60 -20 2756K 564K sleep 5:35 0.00% nfsd
3307 root 1 59 0 7080K 1512K sleep 3:22 0.00% intrd
7137 root 1 100 -20 2716K 1364K sleep 1:44 0.00% xntpd
579 daemon 1 59 0 3588K 1048K sleep 1:27 0.00% rpcbind
2876 root 5 59 0 3568K 1592K sleep 1:23 0.00% automountd
9 root 15 59 0 20M 624K sleep 1:23 0.00% svc.configd
26803 root 35 59 0 5576K 2096K sleep 0:56 0.00% nscd
305 root 7 59 0 3996K 1000K sleep 0:44 0.00% devfsadm


....

load averages: 0.32, 0.24, 0.16 16:21:00
72 processes: 71 sleeping, 1 on cpu
CPU states: 99.3% idle, 0.1% user, 0.6% kernel, 0.0% iowait, 0.0% swap
Memory: 4031M real, 118M free, 2549M swap in use, 952M swap free

PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
23076 jk 1 60 0 2430M 1821M sleep 0:22 8.97% xsltproc
23078 jk 1 59 0 3788K 1408K cpu/1 0:00 0.03% top
22498 root 1 59 0 0K 0K sleep 0:01 0.01% Xorg
27392 root 1 59 0 6300K 1656K sleep 3:38 0.00% ypserv
3307 root 1 59 0 7080K 2356K sleep 3:22 0.00% intrd
9 root 15 59 0 20M 1056K sleep 1:23 0.00% svc.configd
15634 postgres 1 59 0 19M 1152K sleep 1:06 0.00% postgres
22535 root 1 59 0 13M 1504K sleep 0:00 0.00% dtgreet
4276 daemon 2 60 -20 2756K 564K sleep 5:35 0.00% nfsd
7137 root 1 100 -20 2716K 1364K sleep 1:44 0.00% xntpd
579 daemon 1 59 0 3588K 1104K sleep 1:27 0.00% rpcbind
2876 root 5 59 0 3568K 1512K sleep 1:23 0.00% automountd
26803 root 35 59 0 5576K 2404K sleep 0:56 0.00% nscd
4238 root 1 59 0 8028K 1144K sleep 0:48 0.00% sendmail
305 root 7 59 0 3996K 1000K sleep 0:44 0.00% devfsadm


PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
23076 jk 2430M 1831M sleep 60 0 0:00:22 4.6% xsltproc/1
22498 root 40M 8948K sleep 59 0 0:00:01 0.0% Xorg/1
23080 jk 4556K 2956K cpu0 59 0 0:00:00 0.0% prstat/1
26803 root 5576K 2404K sleep 59 0 0:00:56 0.0% nscd/35
3307 root 7080K 2356K sleep 59 0 0:03:22 0.0% intrd/1
22634 jk 3400K 2064K sleep 49 0 0:00:00 0.0% tcsh/1
27393 root 3112K 1732K sleep 59 0 0:00:20 0.0% rpc.nisd_resolv/1
27392 root 6300K 1656K sleep 59 0 0:03:38 0.0% ypserv/1
2876 root 3568K 1512K sleep 59 0 0:01:23 0.0% automountd/5
22535 root 13M 1508K sleep 59 0 0:00:00 0.0% dtgreet/1
15632 postgres 19M 1416K sleep 59 0 0:00:00 0.0% postgres/1
7137 root 2716K 1364K sleep 100 - 0:01:44 0.0% xntpd/1
22500 root 8504K 1332K sleep 59 0 0:00:00 0.0% dtlogin/1
365 root 6784K 1332K sleep 59 0 0:00:06 0.0% hald/4
2859 root 5068K 1308K sleep 59 0 0:00:27 0.0% inetd/4
4238 root 8028K 1156K sleep 59 0 0:00:48 0.0% sendmail/1
15634 postgres 19M 1152K sleep 59 0 0:01:06 0.0% postgres/1
18588 root 2536K 1140K sleep 59 0 0:00:00 0.0% hald-addon-stor/1
3449 root 2536K 1124K sleep 59 0 0:00:00 0.0% hald-addon-stor/1
579 daemon 3588K 1104K sleep 59 0 0:01:27 0.0% rpcbind/1
4274 root 3844K 1100K sleep 59 0 0:00:00 0.0% mountd/5
6764 root 4108K 1072K sleep 59 0 0:00:11 0.0% syslogd/13
9 root 20M 1056K sleep 59 0 0:01:23 0.0% svc.configd/15
305 root 3996K 1000K sleep 59 0 0:00:44 0.0% devfsadm/7
22632 jk 1512K 980K sleep 59 0 0:00:00 0.0% script/1
15635 postgres 9636K 960K sleep 59 0 0:00:01 0.0% postgres/1
146 daemon 6568K 900K sleep 59 0 0:00:00 0.0% kcfd/3
4531 daemon 2664K 896K sleep 60 -20 0:00:00 0.0% nfs4cbd/2
22633 jk 1528K 892K sleep 59 0 0:00:00 0.0% script/1
601 root 2840K 748K sleep 59 0 0:00:00 0.0% keyserv/3
137 root 5572K 748K sleep 59 0 0:00:00 0.0% syseventd/15
2832 root 3040K 612K sleep 59 0 0:00:00 0.0% cron/1
7 root 11M 588K sleep 59 0 0:00:29 0.0% svc.startd/12
Total: 72 processes, 212 lwps, load averages: 0.25, 0.23, 0.15


Jason J. W. Williams
2008-04-13 01:12:44 UTC
Permalink
Hi Jurgen,

On our Thumpers running MySQL, we limit the ARC to 4GB. On systems
with less RAM we limit the ARC to 1GB. What kind of disk is backing
the pools? There might be some kind of array cache flushing issue going on.
Also, you might try putting your InnoDB log files on a UFS partition
and see if the performance issues go away.
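
Relocating the InnoDB log files as suggested would look roughly like this in my.cnf
(the UFS mount point is hypothetical, and the existing ib_logfile* files have to be
moved there while mysqld is stopped):

  [mysqld]
  innodb_log_group_home_dir = /ufs/mysql-logs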

Best Regards,
Jason
Post by Jürgen Keil
Post by Jürgen Keil
And another case was an attempt to use "xsltproc(1)" on a
big xml file, this time on an amd64 x2 machine with 4GB of
memory, using zfs, and the xsltproc process had grown to
use > 2GB of memory. Again heavy disk thrashing, and I
didn't have the impression that the arc cache did
shrink enough to prevent that thrashing.
Jürgen Keil
2008-04-14 14:42:19 UTC
Permalink
Jason,
Post by Jason J. W. Williams
Hi Jurgen,
On our Thumpers running MySQL, we limit the ARC to 4GB.
On systems with less RAM we limit the ARC to 1GB.
Ok, on a machine running a database server (and nothing else),
that might make sense.

On a general-purpose box it doesn't make much sense;
I'd have to reboot the box into a different ARC configuration
depending on the workload / memory requirements.
Post by Jason J. W. Williams
What kind of disk is backing the pools?
In my case (running a single big memory-hungry
process with the ARC cache filled with lots
of cached data) it's a simple zpool on one
100GB ATA HDD.
Post by Jason J. W. Williams
Might be some kind of array cache flushing
issues going on.
Not in my case.


Brandon High
2008-04-14 21:12:29 UTC
Permalink
Post by Jürgen Keil
And another case was an attempt to use "xsltproc(1)" on a
big xml file, this time on an amd64 x2 machine with 4GB of
memory, using zfs, and the xsltproc process had grown to
use > 2GB of memory. Again heavy disk thrashing, and I
didn't have the impression that the arc cache did shrink enough
to prevent that thrashing.
I did a quick check, and found some RFEs and associated bugs that look
related to the problem that you're seeing:

* ZFS should avoiding growing the ARC into trouble
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6488341

* ZFS/ARC should cleanup more after itself
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424665

It sounds like the ARC is shrinking (since after multiple runs it's
faster) but it's not reacting quickly enough to the increase in memory
use.

-B
--
Brandon High bhigh-UUz7JMH5g5LQT0dZR+***@public.gmane.org
"The good is the enemy of the best." - Nietzsche