commit 0d0eef55e03a76885b5d665b1f5572e1f4975886 Author: Greg Kroah-Hartman Date: Sun Jul 29 08:04:57 2012 -0700 Linux 3.4.7 commit 5d0c7b47b27d51c45700b1153ac86752ff743aa6 Author: Jeff Layton Date: Wed Jul 11 09:09:36 2012 -0400 cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps commit 3cf003c08be785af4bee9ac05891a15bcbff856a upstream. [The async read code was broadened to include uncached reads in 3.5, so the mainline patch did not apply directly. This patch is just a backport to account for that change.] Jian found that when he ran fsx on a 32 bit arch with a large wsize the process and one of the bdi writeback kthreads would sometimes deadlock with a stack trace like this: crash> bt PID: 2789 TASK: f02edaa0 CPU: 3 COMMAND: "fsx" #0 [eed63cbc] schedule at c083c5b3 #1 [eed63d80] kmap_high at c0500ec8 #2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs] #3 [eed63df0] cifs_writepages at f7fb7f5c [cifs] #4 [eed63e50] do_writepages at c04f3e32 #5 [eed63e54] __filemap_fdatawrite_range at c04e152a #6 [eed63ea4] filemap_fdatawrite at c04e1b3e #7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs] #8 [eed63ecc] do_sync_write at c052d202 #9 [eed63f74] vfs_write at c052d4ee #10 [eed63f94] sys_write at c052df4c #11 [eed63fb0] ia32_sysenter_target at c0409a98 EAX: 00000004 EBX: 00000003 ECX: abd73b73 EDX: 012a65c6 DS: 007b ESI: 012a65c6 ES: 007b EDI: 00000000 SS: 007b ESP: bf8db178 EBP: bf8db1f8 GS: 0033 CS: 0073 EIP: 40000424 ERR: 00000004 EFLAGS: 00000246 Each task would kmap part of its address array before getting stuck, but not enough to actually issue the write. This patch fixes this by serializing the marshal_iov operations for async reads and writes. The idea here is to ensure that cifs aggressively tries to populate a request before attempting to fulfill another one. As soon as all of the pages are kmapped for a request, then we can unlock and allow another one to proceed. There's no need to do this serialization on non-CONFIG_HIGHMEM arches however, so optimize all of this out when CONFIG_HIGHMEM isn't set. Reported-by: Jian Li Signed-off-by: Jeff Layton Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit a22ad130fc91983ac717a24e29838bf2969a67d4 Author: Tushar Behera Date: Thu Jul 12 18:06:28 2012 +0900 ARM: SAMSUNG: Update default rate for xusbxti clock commit bdd3cc26ba651e33780ade33f1410320cf2d0cf4 upstream. The rate of xusbxti clock is set in individual machine files. The default value should be defined at the clock definition and individual machine files should modify it if required. Division by zero in kernel. [] (unwind_backtrace+0x1/0x9c) from [] (Ldiv0+0x9/0x12) [] (Ldiv0+0x9/0x12) from [] (s3c_setrate_clksrc+0x33/0x78) [] (s3c_setrate_clksrc+0x33/0x78) from [] (clk_set_rate+0x2f/0x78) Signed-off-by: Tushar Behera Signed-off-by: Kukjin Kim Signed-off-by: Greg Kroah-Hartman commit 5a4db9ee4f44658077a11c71d78a17573016fc0e Author: Mikulas Patocka Date: Fri Jul 20 14:25:07 2012 +0100 dm raid1: set discard_zeroes_data_unsupported commit 7c8d3a42fe1c58a7e8fd3f6a013e7d7b474ff931 upstream. We can't guarantee that REQ_DISCARD on dm-mirror zeroes the data even if the underlying disks support zero on discard. So this patch sets ti->discard_zeroes_data_unsupported. For example, if the mirror is in the process of resynchronizing, it may happen that kcopyd reads a piece of data, then discard is sent on the same area and then kcopyd writes the piece of data to another leg. Consequently, the data is not zeroed. The flag was made available by commit 983c7db347db8ce2d8453fd1d89b7a4bb6920d56 (dm crypt: always disable discard_zeroes_data). Signed-off-by: Mikulas Patocka Signed-off-by: Alasdair G Kergon Signed-off-by: Greg Kroah-Hartman commit 91aafba4414743c24c6d06ccca75113e4abd13a8 Author: Mikulas Patocka Date: Fri Jul 20 14:25:03 2012 +0100 dm raid1: fix crash with mirror recovery and discard commit 751f188dd5ab95b3f2b5f2f467c38aae5a2877eb upstream. This patch fixes a crash when a discard request is sent during mirror recovery. Firstly, some background. Generally, the following sequence happens during mirror synchronization: - function do_recovery is called - do_recovery calls dm_rh_recovery_prepare - dm_rh_recovery_prepare uses a semaphore to limit the number simultaneously recovered regions (by default the semaphore value is 1, so only one region at a time is recovered) - dm_rh_recovery_prepare calls __rh_recovery_prepare, __rh_recovery_prepare asks the log driver for the next region to recover. Then, it sets the region state to DM_RH_RECOVERING. If there are no pending I/Os on this region, the region is added to quiesced_regions list. If there are pending I/Os, the region is not added to any list. It is added to the quiesced_regions list later (by dm_rh_dec function) when all I/Os finish. - when the region is on quiesced_regions list, there are no I/Os in flight on this region. The region is popped from the list in dm_rh_recovery_start function. Then, a kcopyd job is started in the recover function. - when the kcopyd job finishes, recovery_complete is called. It calls dm_rh_recovery_end. dm_rh_recovery_end adds the region to recovered_regions or failed_recovered_regions list (depending on whether the copy operation was successful or not). The above mechanism assumes that if the region is in DM_RH_RECOVERING state, no new I/Os are started on this region. When I/O is started, dm_rh_inc_pending is called, which increases reg->pending count. When I/O is finished, dm_rh_dec is called. It decreases reg->pending count. If the count is zero and the region was in DM_RH_RECOVERING state, dm_rh_dec adds it to the quiesced_regions list. Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is in DM_RH_RECOVERING state, it could be added to quiesced_regions list multiple times or it could be added to this list when kcopyd is copying data (it is assumed that the region is not on any list while kcopyd does its jobs). This results in memory corruption and crash. There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests do not belong to any region, so they are always added to the sync list in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH requests. These bypasses avoid the crash possibility described above. These bypasses were improperly implemented for REQ_DISCARD when the mirror target gained discard support in commit 5fc2ffeabb9ee0fc0e71ff16b49f34f0ed3d05b4 (dm raid1: support discard). In do_writes, REQ_DISCARD requests is always added to the sync queue and immediately dispatched (even if the region is in DM_RH_RECOVERING). However, dm_rh_inc and dm_rh_dec is called for REQ_DISCARD resusts. So it violates the rule that no I/Os are started on DM_RH_RECOVERING regions, and causes the list corruption described above. This patch changes it so that REQ_DISCARD requests follow the same path as REQ_FLUSH. This avoids the crash. Reference: https://bugzilla.redhat.com/837607 Signed-off-by: Mikulas Patocka Signed-off-by: Alasdair G Kergon Signed-off-by: Greg Kroah-Hartman commit 5b8bbc39d5678179f2fd4ee2e09005d8f277834c Author: Mikulas Patocka Date: Fri Jul 20 14:25:05 2012 +0100 dm thin: do not send discards to shared blocks commit 650d2a06b4fe1cc1d218c20e256650f68bf0ca31 upstream. When process_discard receives a partial discard that doesn't cover a full block, it sends this discard down to that block. Unfortunately, the block can be shared and the discard would corrupt the other snapshots sharing this block. This patch detects block sharing and ends the discard with success when sending it to the shared block. The above change means that if the device supports discard it can't be guaranteed that a discard request zeroes data. Therefore, we set ti->discard_zeroes_data_unsupported. Thin target discard support with this bug arrived in commit 104655fd4dcebd50068ef30253a001da72e3a081 (dm thin: support discards). Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer Signed-off-by: Alasdair G Kergon Signed-off-by: Greg Kroah-Hartman commit 08603bdd6b0b65248921c8be05febe574dd78905 Author: Boaz Harrosh Date: Fri Jun 8 05:29:40 2012 +0300 pnfs-obj: don't leak objio_state if ore_write/read fails commit 9909d45a8557455ca5f8ee7af0f253debc851f1a upstream. [Bug since 3.2 Kernel] Signed-off-by: Boaz Harrosh Signed-off-by: Greg Kroah-Hartman commit f6ecbea43e774dfc0b3678d2cdda9d4b43cfecf8 Author: Boaz Harrosh Date: Fri Jun 8 04:30:40 2012 +0300 ore: Remove support of partial IO request (NFS crash) commit 62b62ad873f2accad9222a4d7ffbe1e93f6714c1 upstream. Do to OOM situations the ore might fail to allocate all resources needed for IO of the full request. If some progress was possible it would proceed with a partial/short request, for the sake of forward progress. Since this crashes NFS-core and exofs is just fine without it just remove this contraption, and fail. TODO: Support real forward progress with some reserved allocations of resources, such as mem pools and/or bio_sets [Bug since 3.2 Kernel] CC: Benny Halevy Signed-off-by: Boaz Harrosh Signed-off-by: Greg Kroah-Hartman commit 0234af60fb13cbb7caa6f757e4d8e29cd87aaba6 Author: Boaz Harrosh Date: Fri Jun 8 01:19:07 2012 +0300 ore: Fix NFS crash by supporting any unaligned RAID IO commit 9ff19309a9623f2963ac5a136782ea4d8b5d67fb upstream. In RAID_5/6 We used to not permit an IO that it's end byte is not stripe_size aligned and spans more than one stripe. .i.e the caller must check if after submission the actual transferred bytes is shorter, and would need to resubmit a new IO with the remainder. Exofs supports this, and NFS was supposed to support this as well with it's short write mechanism. But late testing has exposed a CRASH when this is used with none-RPC layout-drivers. The change at NFS is deep and risky, in it's place the fix at ORE to lift the limitation is actually clean and simple. So here it is below. The principal here is that in the case of unaligned IO on both ends, beginning and end, we will send two read requests one like old code, before the calculation of the first stripe, and also a new site, before the calculation of the last stripe. If any "boundary" is aligned or the complete IO is within a single stripe. we do a single read like before. The code is clean and simple by splitting the old _read_4_write into 3 even parts: 1._read_4_write_first_stripe 2. _read_4_write_last_stripe 3. _read_4_write_execute And calling 1+3 at the same place as before. 2+3 before last stripe, and in the case of all in a single stripe then 1+2+3 is preformed additively. Why did I not think of it before. Well I had a strike of genius because I have stared at this code for 2 years, and did not find this simple solution, til today. Not that I did not try. This solution is much better for NFS than the previous supposedly solution because the short write was dealt with out-of-band after IO_done, which would cause for a seeky IO pattern where as in here we execute in order. At both solutions we do 2 separate reads, only here we do it within a single IO request. (And actually combine two writes into a single submission) NFS/exofs code need not change since the ORE API communicates the new shorter length on return, what will happen is that this case would not occur anymore. hurray!! [Stable this is an NFS bug since 3.2 Kernel should apply cleanly] Signed-off-by: Boaz Harrosh Signed-off-by: Greg Kroah-Hartman commit 863c806617059e412ade0b1bbb6d215106be14c1 Author: Artem Bityutskiy Date: Sat Jul 14 14:33:09 2012 +0300 UBIFS: fix a bug in empty space fix-up commit c6727932cfdb13501108b16c38463c09d5ec7a74 upstream. UBIFS has a feature called "empty space fix-up" which is a quirk to work-around limitations of dumb flasher programs. Namely, of those flashers that are unable to skip NAND pages full of 0xFFs while flashing, resulting in empty space at the end of half-filled eraseblocks to be unusable for UBIFS. This feature is relatively new (introduced in v3.0). The fix-up routine (fixup_free_space()) is executed only once at the very first mount if the superblock has the 'space_fixup' flag set (can be done with -F option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and writes it back to the same LEB. The routine assumes the image is pristine and does not have anything in the journal. There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly. All but one LEB of the log of a pristine file-system are empty. And one contains just a commit start node. And 'fixup_free_space()' just unmapped this LEB, which resulted in wiping the commit start node. As a result, some users were unable to mount the file-system next time with the following symptom: UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0 The root-cause of this bug was that 'fixup_free_space()' wrongly assumed that the beginning of empty space in the log head (c->lhead_offs) was known on mount. However, it is not the case - it was always 0. UBIFS does not store in it the master node and finds out by scanning the log on every mount. The fix is simple - just pass commit start node size instead of 0 to 'fixup_leb()'. Signed-off-by: Artem Bityutskiy Reported-by: Iwo Mergler Tested-by: Iwo Mergler Reported-by: James Nute Signed-off-by: Greg Kroah-Hartman commit ec8348902ff82ae44f25b3127f8f3ec47e1e6a07 Author: David Daney Date: Thu Jul 19 09:11:14 2012 +0200 MIPS: Properly align the .data..init_task section. commit 7b1c0d26a8e272787f0f9fcc5f3e8531df3b3409 upstream. Improper alignment can lead to unbootable systems and/or random crashes. [ralf@linux-mips.org: This is a lond standing bug since 6eb10bc9e2deab06630261cd05c4cb1e9a60e980 (kernel.org) rsp. c422a10917f75fd19fa7fe070aaaa23e384dae6f (lmo) [MIPS: Clean up linker script using new linker script macros.] so dates back to 2.6.32.] Signed-off-by: David Daney Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/3881/ Signed-off-by: Ralf Baechle Signed-off-by: Greg Kroah-Hartman commit 60d091aeb66c0b201689862bcb803c32db712d05 Author: Jiri Kosina Date: Fri Apr 20 12:15:44 2012 +0200 HID: multitouch: Add support for Baanto touchscreen commit 9ed326951806c424b42dcf2e1125e25a98fb13d1 upstream. Reported-by: Tvrtko Ursulin Tested-by: Tvrtko Ursulin Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit eb6cf92d839f43dbc2009a2440479f46f7adfd46 Author: Frank Kunz Date: Thu Jul 5 22:32:49 2012 +0200 HID: add Sennheiser BTD500USB device support commit 0e050923a797c1fc46ccc1e5182fd3090f33a75d upstream. The Sennheiser BTD500USB composit device requires the HID_QUIRK_NOGET flag to be set for working proper. Without the flag the device crashes during hid intialization. Signed-off-by: Frank Kunz Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit e58cd46e372812120c547f1b311f7a1a00c22af4 Author: Daniel Nicoletti Date: Wed Jul 4 10:20:31 2012 -0300 HID: add battery quirk for Apple Wireless ANSI commit 0c47935c5b5cd4916cf1c1ed4a2894807f7bcc3e upstream. Add USB_DEVICE_ID_APPLE_ALU_WIRELESS_ANSI, to the quirk list since it report wrong feature type and wrong percentage range. Signed-off-by: Daniel Nicoletti Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman commit 07aa70120e4be526a468b9c3ea93b45cfbe95013 Author: Aaditya Kumar Date: Tue Jul 17 15:48:07 2012 -0700 mm: fix lost kswapd wakeup in kswapd_stop() commit 1c7e7f6c0703d03af6bcd5ccc11fc15d23e5ecbe upstream. Offlining memory may block forever, waiting for kswapd() to wake up because kswapd() does not check the event kthread->should_stop before sleeping. The proper pattern, from Documentation/memory-barriers.txt, is: --- waker --- event_indicated = 1; wake_up_process(event_daemon); --- sleeper --- for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); if (event_indicated) break; schedule(); } set_current_state() may be wrapped by: prepare_to_wait(); In the kswapd() case, event_indicated is kthread->should_stop. === offlining memory (waker) === kswapd_stop() kthread_stop() kthread->should_stop = 1 wake_up_process() wait_for_completion() === kswapd_try_to_sleep (sleeper) === kswapd_try_to_sleep() prepare_to_wait() . . schedule() . . finish_wait() The schedule() needs to be protected by a test of kthread->should_stop, which is wrapped by kthread_should_stop(). Reproducer: Do heavy file I/O in background. Do a memory offline/online in a tight loop Signed-off-by: Aaditya Kumar Acked-by: KOSAKI Motohiro Reviewed-by: Minchan Kim Acked-by: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 2dbbb550c56cbaf9d8353f4546aab9a88786d279 Author: Al Viro Date: Wed Jul 18 09:31:36 2012 +0100 ext4: fix duplicated mnt_drop_write call in EXT4_IOC_MOVE_EXT commit 331ae4962b975246944ea039697a8f1cadce42bb upstream. Caused, AFAICS, by mismerge in commit ff9cb1c4eead ("Merge branch 'for_linus' into for_linus_merged") Signed-off-by: Al Viro Cc: Theodore Ts'o Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 8db5153f9dca89afbeebe0b077e7cc1e5827e07a Author: Mark Rustad Date: Fri Jul 13 18:18:04 2012 -0700 tcm_fc: Fix crash seen with aborts and large reads commit 3cc5d2a6b9a2fd1bf024aa5e52dd22961eecaf13 upstream. This patch fixes a crash seen when large reads have their exchange aborted by either timing out or being reset. Because the exchange abort results in the seq pointer being set to NULL, because the sequence is no longer valid, it must not be dereferenced. This patch changes the function ft_get_task_tag to return ~0 if it is unable to get the tag for this reason. Because the get_task_tag interface provides no means of returning an error, this seems like the best way to fix this issue at the moment. Signed-off-by: Mark Rustad Signed-off-by: Nicholas Bellinger Signed-off-by: Greg Kroah-Hartman commit fd25080998d00a94a87bf7fc9f843291db7250a6 Author: John Stultz Date: Fri Jul 13 01:21:50 2012 -0400 ntp: Fix STA_INS/DEL clearing bug commit 6b1859dba01c7d512b72d77e3fd7da8354235189 upstream. In commit 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d, I introduced a bug that kept the STA_INS or STA_DEL bit from being cleared from time_status via adjtimex() without forcing STA_PLL first. Usually once the STA_INS is set, it isn't cleared until the leap second is applied, so its unlikely this affected anyone. However during testing I noticed it took some effort to cancel a leap second once STA_INS was set. Signed-off-by: John Stultz Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Richard Cochran Cc: Prarit Bhargava Link: http://lkml.kernel.org/r/1342156917-25092-2-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 5c8c9e57270fc5f54aa02144f7280f91ee2e3334 Author: Roland Dreier Date: Mon Jul 16 17:10:17 2012 -0700 target: Fix range calculation in WRITE SAME emulation when num blocks == 0 commit 1765fe5edcb83f53fc67edeb559fcf4bc82c6460 upstream. When NUMBER OF LOGICAL BLOCKS is 0, WRITE SAME is supposed to write all the blocks from the specified LBA through the end of the device. However, dev->transport->get_blocks(dev) (perhaps confusingly) returns the last valid LBA rather than the number of blocks, so the correct number of blocks to write starting with lba is dev->transport->get_blocks(dev) - lba + 1 (nab: Backport roland's for-3.6 patch to for-3.5) Signed-off-by: Roland Dreier Signed-off-by: Nicholas Bellinger Signed-off-by: Greg Kroah-Hartman commit 815faab2c4b7deb2cc6495186307ceb0983214a9 Author: Roland Dreier Date: Mon Jul 16 15:17:10 2012 -0700 target: Clean up returning errors in PR handling code commit d35212f3ca3bf4fb49d15e37f530c9931e2d2183 upstream. - instead of (PTR_ERR(file) < 0) just use IS_ERR(file) - return -EINVAL instead of EINVAL - all other error returns in target_scsi3_emulate_pr_out() use "goto out" -- get rid of the one remaining straight "return." Signed-off-by: Roland Dreier Signed-off-by: Nicholas Bellinger Signed-off-by: Greg Kroah-Hartman commit c82cd13737cfab6cb9d1b31e624896e627f9c9e4 Author: Jeff Layton Date: Wed Jul 11 09:09:35 2012 -0400 cifs: on CONFIG_HIGHMEM machines, limit the rsize/wsize to the kmap space commit 3ae629d98bd5ed77585a878566f04f310adbc591 upstream. We currently rely on being able to kmap all of the pages in an async read or write request. If you're on a machine that has CONFIG_HIGHMEM set then that kmap space is limited, sometimes to as low as 512 slots. With 512 slots, we can only support up to a 2M r/wsize, and that's assuming that we can get our greedy little hands on all of them. There are other users however, so it's possible we'll end up stuck with a size that large. Since we can't handle a rsize or wsize larger than that currently, cap those options at the number of kmap slots we have. We could consider capping it even lower, but we currently default to a max of 1M. Might as well allow those luddites on 32 bit arches enough rope to hang themselves. A more robust fix would be to teach the send and receive routines how to contend with an array of pages so we don't need to marshal up a kvec array at all. That's a fairly significant overhaul though, so we'll need this limit in place until that's ready. Reported-by: Jian Li Signed-off-by: Jeff Layton Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit f952e137849253c0584a60cc2d73a220d9a091f8 Author: Jeff Layton Date: Fri Jul 6 07:09:42 2012 -0400 cifs: always update the inode cache with the results from a FIND_* commit cd60042cc1392e79410dc8de9e9c1abb38a29e57 upstream. When we get back a FIND_FIRST/NEXT result, we have some info about the dentry that we use to instantiate a new inode. We were ignoring and discarding that info when we had an existing dentry in the cache. Fix this by updating the inode in place when we find an existing dentry and the uniqueid is the same. Reported-and-Tested-by: Andrew Bartlett Reported-by: Bill Robertson Reported-by: Dion Edwards Signed-off-by: Jeff Layton Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit d8ae4bb4a1c12f9cfb373f215e624a9f1fca0767 Author: NeilBrown Date: Thu Jul 19 15:59:18 2012 +1000 md/raid1: close some possible races on write errors during resync commit 58e94ae18478c08229626daece2fc108a4a23261 upstream. commit 4367af556133723d0f443e14ca8170d9447317cb md/raid1: clear bad-block record when write succeeds. Added a 'reschedule_retry' call possibility at the end of end_sync_write, but didn't add matching code at the end of sync_request_write. So if the writes complete very quickly, or scheduling makes it seem that way, then we can miss rescheduling the request and the resync could hang. Also commit 73d5c38a9536142e062c35997b044e89166e063b md: avoid races when stopping resync. Fix a race condition in this same code in end_sync_write but didn't make the change in sync_request_write. This patch updates sync_request_write to fix both of those. Patch is suitable for 3.1 and later kernels. Reported-by: Alexander Lyakas Original-version-by: Alexander Lyakas Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman commit 2ac8a0f58a8782ff8024404a579eee260e8b2010 Author: NeilBrown Date: Thu Jul 19 15:59:18 2012 +1000 md: avoid crash when stopping md array races with closing other open fds. commit a05b7ea03d72f36edb0cec05e8893803335c61a0 upstream. md will refuse to stop an array if any other fd (or mounted fs) is using it. When any fs is unmounted of when the last open fd is closed all pending IO will be flushed (e.g. sync_blockdev call in __blkdev_put) so there will be no pending IO to worry about when the array is stopped. However in order to send the STOP_ARRAY ioctl to stop the array one must first get and open fd on the block device. If some fd is being used to write to the block device and it is closed after mdadm open the block device, but before mdadm issues the STOP_ARRAY ioctl, then there will be no last-close on the md device so __blkdev_put will not call sync_blockdev. If this happens, then IO can still be in-flight while md tears down the array and bad things can happen (use-after-free and subsequent havoc). So in the case where do_md_stop is being called from an open file descriptor, call sync_block after taking the mutex to ensure there will be no new openers. This is needed when setting a read-write device to read-only too. Reported-by: majianpeng Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman