+++ b/.mailmap @@ -74,6 +74,7 @@ Domen Puncer Douglas Gilbert Ed L. Cashin +Erik Kaneda Evgeniy Polyakov Felipe W Damasio Felix Kuhling @@ -138,6 +139,7 @@ Juha Yrjola Juha Yrjola Julien Thierry +Kamil Konieczny Kay Sievers Kenneth W Chen Konstantin Khlebnikov @@ -209,6 +211,10 @@ Patrick Mochel Paul Burton Paul Burton +Paul E. McKenney +Paul E. McKenney +Paul E. McKenney +Paul E. McKenney Peter A Jonsson Peter Oruba Peter Oruba @@ -217,6 +223,7 @@ Punit Agrawal Qais Yousef Quentin Perret +Rafael J. Wysocki Rajesh Shah Ralf Baechle Ralf Wildenhues --- a/CREDITS +++ b/CREDITS @@ -3302,7 +3302,9 @@ N: Aleksa Sarai E: cyphar@cyphar.com W: https://www.cyphar.com/ -D: `pids` cgroup subsystem +D: /sys/fs/cgroup/pids +D: openat2(2) +S: Sydney, Australia N: Dipankar Sarma E: dipankar@in.ibm.com --- a/Documentation/ABI/obsolete/sysfs-selinux-disable +++ b/Documentation/ABI/obsolete/sysfs-selinux-disable @@ -0,0 +1,26 @@ +What: /sys/fs/selinux/disable +Date: April 2005 (predates git) +KernelVersion: 2.6.12-rc2 (predates git) +Contact: selinux@vger.kernel.org +Description: + + The selinuxfs "disable" node allows SELinux to be disabled at runtime + prior to a policy being loaded into the kernel. If disabled via this + mechanism, SELinux will remain disabled until the system is rebooted. + + The preferred method of disabling SELinux is via the "selinux=0" boot + parameter, but the selinuxfs "disable" node was created to make it + easier for systems with primitive bootloaders that did not allow for + easy modification of the kernel command line. Unfortunately, allowing + for SELinux to be disabled at runtime makes it difficult to secure the + kernel's LSM hooks using the "__ro_after_init" feature. + + Thankfully, the need for the SELinux runtime disable appears to be + gone, the default Kconfig configuration disables this selinuxfs node, + and only one of the major distributions, Fedora, supports disabling + SELinux at runtime. Fedora is in the process of removing the + selinuxfs "disable" node and once that is complete we will start the + slow process of removing this code from the kernel. + + More information on /sys/fs/selinux/disable can be found under the + CONFIG_SECURITY_SELINUX_DISABLE Kconfig option. --- a/Documentation/ABI/stable/sysfs-class-tpm +++ b/Documentation/ABI/stable/sysfs-class-tpm @@ -1,7 +1,7 @@ What: /sys/class/tpm/tpmX/device/ Date: April 2005 KernelVersion: 2.6.12 -Contact: tpmdd-devel@lists.sf.net +Contact: linux-integrity@vger.kernel.org Description: The device/ directory under a specific TPM instance exposes the properties of that TPM chip @@ -9,7 +9,7 @@ What: /sys/class/tpm/tpmX/device/active Date: April 2006 KernelVersion: 2.6.17 -Contact: tpmdd-devel@lists.sf.net +Contact: linux-integrity@vger.kernel.org Description: The "active" property prints a '1' if the TPM chip is accepting commands. An inactive TPM chip still contains all the state of an active chip (Storage Root Key, NVRAM, etc), and can be @@ -21,7 +21,7 @@ What: /sys/class/tpm/tpmX/device/cancel Date: June 2005 KernelVersion: 2.6.13 -Contact: tpmdd-devel@lists.sf.net +Contact: linux-integrity@vger.kernel.org Description: The "cancel" property allows you to cancel the currently pending TPM command. Writing any value to cancel will call the TPM vendor specific cancel operation. @@ -29,7 +29,7 @@ What: /sys/class/tpm/tpmX/device/caps Date: April 2005 KernelVersion: 2.6.12 -Contact: tpmdd-devel@lists.sf.net +Contact: linux-integrity@vger.kernel.org Description: The "caps" property contains TPM manufacturer and version info. Example output: @@ -46,7 +46,7 @@ What: /sys/class/tpm/tpmX/device/durations Date: March 2011 KernelVersion: 3.1 -Contact: tpmdd-devel@lists.sf.net +Contact: linux-integrity@vger.kernel.org Description: The "durations" property shows the 3 vendor-specific values used to wait for a short, medium and long TPM command. All TPM commands are categorized as short, medium or long in @@ -69,7 +69,7 @@ What: /sys/class/tpm/tpmX/device/enabled Date: April 2006 KernelVersion: 2.6.17 -Contact: tpmdd-devel@lists.sf.net +Contact: linux-integrity@vger.kernel.org Description: The "enabled" property prints a '1' if the TPM chip is enabled, meaning that it should be visible to the OS. This property may be visible but produce a '0' after some operation that @@ -78,7 +78,7 @@ What: /sys/class/tpm/tpmX/device/owned Date: April 2006 KernelVersion: 2.6.17 -Contact: tpmdd-devel@lists.sf.net +Contact: linux-integrity@vger.kernel.org Description: The "owned" property produces a '1' if the TPM_TakeOwnership ordinal has been executed successfully in the chip. A '0' indicates that ownership hasn't been taken. @@ -86,7 +86,7 @@ What: /sys/class/tpm/tpmX/device/pcrs Date: April 2005 KernelVersion: 2.6.12 -Contact: tpmdd-devel@lists.sf.net +Contact: linux-integrity@vger.kernel.org Description: The "pcrs" property will dump the current value of all Platform Configuration Registers in the TPM. Note that since these values may be constantly changing, the output is only valid @@ -109,7 +109,7 @@ What: /sys/class/tpm/tpmX/device/pubek Date: April 2005 KernelVersion: 2.6.12 -Contact: tpmdd-devel@lists.sf.net +Contact: linux-integrity@vger.kernel.org Description: The "pubek" property will return the TPM's public endorsement key if possible. If the TPM has had ownership established and is version 1.2, the pubek will not be available without the @@ -161,7 +161,7 @@ What: /sys/class/tpm/tpmX/device/temp_deactivated Date: April 2006 KernelVersion: 2.6.17 -Contact: tpmdd-devel@lists.sf.net +Contact: linux-integrity@vger.kernel.org Description: The "temp_deactivated" property returns a '1' if the chip has been temporarily deactivated, usually until the next power cycle. Whether a warm boot (reboot) will clear a TPM chip @@ -170,7 +170,7 @@ What: /sys/class/tpm/tpmX/device/timeouts Date: March 2011 KernelVersion: 3.1 -Contact: tpmdd-devel@lists.sf.net +Contact: linux-integrity@vger.kernel.org Description: The "timeouts" property shows the 4 vendor-specific values for the TPM's interface spec timeouts. The use of these timeouts is defined by the TPM interface spec that the chip @@ -183,3 +183,14 @@ The four timeout values are shown in usecs, with a trailing "[original]" or "[adjusted]" depending on whether the values were scaled by the driver to be reported in usec from msecs. + +What: /sys/class/tpm/tpmX/tpm_version_major +Date: October 2019 +KernelVersion: 5.5 +Contact: linux-integrity@vger.kernel.org +Description: The "tpm_version_major" property shows the TCG spec major version + implemented by the TPM device. + + Example output: + + 2 --- a/Documentation/ABI/stable/sysfs-driver-dma-idxd +++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd @@ -0,0 +1,171 @@ +What: sys/bus/dsa/devices/dsa/cdev_major +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The major number that the character device driver assigned to + this device. + +What: sys/bus/dsa/devices/dsa/errors +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The error information for this device. + +What: sys/bus/dsa/devices/dsa/max_batch_size +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The largest number of work descriptors in a batch. + +What: sys/bus/dsa/devices/dsa/max_work_queues_size +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The maximum work queue size supported by this device. + +What: sys/bus/dsa/devices/dsa/max_engines +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The maximum number of engines supported by this device. + +What: sys/bus/dsa/devices/dsa/max_groups +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The maximum number of groups can be created under this device. + +What: sys/bus/dsa/devices/dsa/max_tokens +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The total number of bandwidth tokens supported by this device. + The bandwidth tokens represent resources within the DSA + implementation, and these resources are allocated by engines to + support operations. + +What: sys/bus/dsa/devices/dsa/max_transfer_size +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The number of bytes to be read from the source address to + perform the operation. The maximum transfer size is dependent on + the workqueue the descriptor was submitted to. + +What: sys/bus/dsa/devices/dsa/max_work_queues +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The maximum work queue number that this device supports. + +What: sys/bus/dsa/devices/dsa/numa_node +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The numa node number for this device. + +What: sys/bus/dsa/devices/dsa/op_cap +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The operation capability bit mask specify the operation types + supported by the this device. + +What: sys/bus/dsa/devices/dsa/state +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The state information of this device. It can be either enabled + or disabled. + +What: sys/bus/dsa/devices/dsa/group. +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The assigned group under this device. + +What: sys/bus/dsa/devices/dsa/engine. +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The assigned engine under this device. + +What: sys/bus/dsa/devices/dsa/wq. +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The assigned work queue under this device. + +What: sys/bus/dsa/devices/dsa/configurable +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: To indicate if this device is configurable or not. + +What: sys/bus/dsa/devices/dsa/token_limit +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The maximum number of bandwidth tokens that may be in use at + one time by operations that access low bandwidth memory in the + device. + +What: sys/bus/dsa/devices/wq./group_id +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The group id that this work queue belongs to. + +What: sys/bus/dsa/devices/wq./size +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The work queue size for this work queue. + +What: sys/bus/dsa/devices/wq./type +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The type of this work queue, it can be "kernel" type for work + queue usages in the kernel space or "user" type for work queue + usages by applications in user space. + +What: sys/bus/dsa/devices/wq./cdev_minor +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The minor number assigned to this work queue by the character + device driver. + +What: sys/bus/dsa/devices/wq./mode +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The work queue mode type for this work queue. + +What: sys/bus/dsa/devices/wq./priority +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The priority value of this work queue, it is a vlue relative to + other work queue in the same group to control quality of service + for dispatching work from multiple workqueues in the same group. + +What: sys/bus/dsa/devices/wq./state +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The current state of the work queue. + +What: sys/bus/dsa/devices/wq./threshold +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The number of entries in this work queue that may be filled + via a limited portal. + +What: sys/bus/dsa/devices/engine./group_id +Date: Oct 25, 2019 +KernelVersion: 5.6.0 +Contact: dmaengine@vger.kernel.org +Description: The group that this engine belongs to. --- a/Documentation/ABI/stable/sysfs-driver-mlxreg-io +++ b/Documentation/ABI/stable/sysfs-driver-mlxreg-io @@ -1,5 +1,4 @@ What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/asic_health - Date: June 2018 KernelVersion: 4.19 Contact: Vadim Pasternak @@ -19,7 +18,6 @@ The files are read only. What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/fan_dir - Date: December 2018 KernelVersion: 5.0 Contact: Vadim Pasternak @@ -30,7 +28,6 @@ The files are read only. What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld3_version - Date: November 2018 KernelVersion: 5.0 Contact: Vadim Pasternak @@ -40,7 +37,6 @@ The files are read only. What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/jtag_enable - Date: November 2018 KernelVersion: 5.0 Contact: Vadim Pasternak @@ -108,7 +104,6 @@ What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_from_comex What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_system What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_voltmon_upgrade_fail - Date: November 2018 KernelVersion: 5.0 Contact: Vadim Pasternak @@ -130,6 +125,12 @@ The files are read only. +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_thermal +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_wd +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_from_asic +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_reload_bios +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_sff_wd +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_swb_wd Date: June 2019 KernelVersion: 5.3 Contact: Vadim Pasternak @@ -143,9 +144,65 @@ The files are read only. -What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_thermal -What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_wd -What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_from_asic -What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_reload_bios -What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_sff_wd -What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_swb_wd +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/config1 +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/config2 +Date: January 2020 +KernelVersion: 5.6 +Contact: Vadim Pasternak +Description: These files show system static topology identification + like system's static I2C topology, number and type of FPGA + devices within the system and so on. + + The files are read only. + +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_ac_pwr_fail +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_platform +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_soc +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_sw_pwr_off +Date: January 2020 +KernelVersion: 5.6 +Contact: Vadim Pasternak +Description: These files show the system reset causes, as following: reset + due to AC power failure, reset invoked from software by + assertion reset signal through CPLD. reset caused by signal + asserted by SOC through ACPI register, reset invoked from + software by assertion power off signal through CPLD. + + The files are read only. + +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/pcie_asic_reset_dis +Date: January 2020 +KernelVersion: 5.6 +Contact: Vadim Pasternak +Description: This file allows to retain ASIC up during PCIe root complex + reset, when attribute is set 1. + + The file is read/write. + +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/vpd_wp +Date: January 2020 +KernelVersion: 5.6 +Contact: Vadim Pasternak +Description: This file allows to overwrite system VPD hardware wrtie + protection when attribute is set 1. + + The file is read/write. + +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/voltreg_update_status +Date: January 2020 +KernelVersion: 5.6 +Contact: Vadim Pasternak +Description: This file exposes the configuration update status of burnable + voltage regulator devices. The status values are as following: + 0 - OK; 1 - CRC failure; 2 = I2C failure; 3 - in progress. + + The file is read only. + +What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/ufm_version +Date: January 2020 +KernelVersion: 5.6 +Contact: Vadim Pasternak +Description: This file exposes the firmware version of burnable voltage + regulator devices. + + The file is read only. --- a/Documentation/ABI/testing/configfs-usb-gadget +++ b/Documentation/ABI/testing/configfs-usb-gadget @@ -16,6 +16,10 @@ write UDC's name found in /sys/class/udc/* to bind a gadget, empty string "" to unbind. + max_speed - maximum speed the driver supports. Valid + names are super-speed-plus, super-speed, + high-speed, full-speed, and low-speed. + bDeviceClass - USB device class code bDeviceSubClass - USB device subclass code bDeviceProtocol - USB device protocol code --- a/Documentation/ABI/testing/ima_policy +++ b/Documentation/ABI/testing/ima_policy @@ -25,11 +25,11 @@ lsm: [[subj_user=] [subj_role=] [subj_type=] [obj_user=] [obj_role=] [obj_type=]] option: [[appraise_type=]] [template=] [permit_directio] - [appraise_flag=] + [appraise_flag=] [keyrings=] base: func:= [BPRM_CHECK][MMAP_CHECK][CREDS_CHECK][FILE_CHECK][MODULE_CHECK] [FIRMWARE_CHECK] [KEXEC_KERNEL_CHECK] [KEXEC_INITRAMFS_CHECK] - [KEXEC_CMDLINE] + [KEXEC_CMDLINE] [KEY_CHECK] mask:= [[^]MAY_READ] [[^]MAY_WRITE] [[^]MAY_APPEND] [[^]MAY_EXEC] fsmagic:= hex value @@ -42,6 +42,9 @@ appraise_flag:= [check_blacklist] Currently, blacklist check is only for files signed with appended signature. + keyrings:= list of keyrings + (eg, .builtin_trusted_keys|.ima). Only valid + when action is "measure" and func is KEY_CHECK. template:= name of a defined IMA template type (eg, ima-ng). Only valid when action is "measure". pcr:= decimal value @@ -113,3 +116,12 @@ Example of appraise rule allowing modsig appended signatures: appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig + + Example of measure rule using KEY_CHECK to measure all keys: + + measure func=KEY_CHECK + + Example of measure rule using KEY_CHECK to only measure + keys added to .builtin_trusted_keys or .ima keyring: + + measure func=KEY_CHECK keyrings=.builtin_trusted_keys|.ima --- a/Documentation/ABI/testing/sysfs-bus-iio +++ b/Documentation/ABI/testing/sysfs-bus-iio @@ -1726,3 +1726,16 @@ Description: List of valid periods (in seconds) for which the light intensity must be above the threshold level before interrupt is asserted. + +What: /sys/bus/iio/devices/iio:deviceX/in_filter_notch_center_frequency +KernelVersion: 5.5 +Contact: linux-iio@vger.kernel.org +Description: + Center frequency in Hz for a notch filter. Used i.e. for line + noise suppression. + +What: /sys/bus/iio/devices/iio:deviceX/in_temp_thermocouple_type +KernelVersion: 5.5 +Contact: linux-iio@vger.kernel.org +Description: + One of the following thermocouple types: B, E, J, K, N, R, S, T. --- a/Documentation/ABI/testing/sysfs-bus-iio-dma-buffer +++ b/Documentation/ABI/testing/sysfs-bus-iio-dma-buffer @@ -0,0 +1,19 @@ +What: /sys/bus/iio/devices/iio:deviceX/buffer/length_align_bytes +KernelVersion: 5.4 +Contact: linux-iio@vger.kernel.org +Description: + DMA buffers tend to have a alignment requirement for the + buffers. If this alignment requirement is not met samples might + be dropped from the buffer. + + This property reports the alignment requirements in bytes. + This means that the buffer size in bytes needs to be a integer + multiple of the number reported by this file. + + The alignment requirements in number of sample sets will depend + on the enabled channels and the bytes per channel. This means + that the alignment requirement in samples sets might change + depending on which and how many channels are enabled. Whereas + the alignment requirement reported in bytes by this property + will remain static and does not depend on which channels are + enabled. --- a/Documentation/ABI/testing/sysfs-bus-mdio +++ b/Documentation/ABI/testing/sysfs-bus-mdio @@ -0,0 +1,63 @@ +What: /sys/bus/mdio_bus/devices/.../statistics/ +Date: January 2020 +KernelVersion: 5.6 +Contact: netdev@vger.kernel.org +Description: + This folder contains statistics about global and per + MDIO bus address statistics. + +What: /sys/bus/mdio_bus/devices/.../statistics/transfers +Date: January 2020 +KernelVersion: 5.6 +Contact: netdev@vger.kernel.org +Description: + Total number of transfers for this MDIO bus. + +What: /sys/bus/mdio_bus/devices/.../statistics/errors +Date: January 2020 +KernelVersion: 5.6 +Contact: netdev@vger.kernel.org +Description: + Total number of transfer errors for this MDIO bus. + +What: /sys/bus/mdio_bus/devices/.../statistics/writes +Date: January 2020 +KernelVersion: 5.6 +Contact: netdev@vger.kernel.org +Description: + Total number of write transactions for this MDIO bus. + +What: /sys/bus/mdio_bus/devices/.../statistics/reads +Date: January 2020 +KernelVersion: 5.6 +Contact: netdev@vger.kernel.org +Description: + Total number of read transactions for this MDIO bus. + +What: /sys/bus/mdio_bus/devices/.../statistics/transfers_ +Date: January 2020 +KernelVersion: 5.6 +Contact: netdev@vger.kernel.org +Description: + Total number of transfers for this MDIO bus address. + +What: /sys/bus/mdio_bus/devices/.../statistics/errors_ +Date: January 2020 +KernelVersion: 5.6 +Contact: netdev@vger.kernel.org +Description: + Total number of transfer errors for this MDIO bus address. + +What: /sys/bus/mdio_bus/devices/.../statistics/writes_ +Date: January 2020 +KernelVersion: 5.6 +Contact: netdev@vger.kernel.org +Description: + Total number of write transactions for this MDIO bus address. + +What: /sys/bus/mdio_bus/devices/.../statistics/reads_ +Date: January 2020 +KernelVersion: 5.6 +Contact: netdev@vger.kernel.org +Description: + Total number of read transactions for this MDIO bus address. --- a/Documentation/ABI/testing/sysfs-class-devfreq +++ b/Documentation/ABI/testing/sysfs-class-devfreq @@ -7,6 +7,13 @@ The name of devfreq object denoted as ... is same as the name of device using devfreq. +What: /sys/class/devfreq/.../name +Date: November 2019 +Contact: Chanwoo Choi +Description: + The /sys/class/devfreq/.../name shows the name of device + of the corresponding devfreq object. + What: /sys/class/devfreq/.../governor Date: September 2011 Contact: MyungJoo Ham @@ -48,12 +55,15 @@ Date: October 2012 Contact: MyungJoo Ham Description: - This ABI shows the statistics of devfreq behavior on a - specific device. It shows the time spent in each state and - the number of transitions between states. + This ABI shows or clears the statistics of devfreq behavior + on a specific device. It shows the time spent in each state + and the number of transitions between states. In order to activate this ABI, the devfreq target device driver should provide the list of available frequencies - with its profile. + with its profile. If need to reset the statistics of devfreq + behavior on a specific device, enter 0(zero) to 'trans_stat' + as following: + echo 0 > /sys/class/devfreq/.../trans_stat What: /sys/class/devfreq/.../userspace/set_freq Date: September 2011 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -196,6 +196,12 @@ does not reflect it. Likewise, if one enables a deep state but a lighter state still is disabled, then this has no effect. +What: /sys/devices/system/cpu/cpuX/cpuidle/stateN/default_status +Date: December 2019 +KernelVersion: v5.6 +Contact: Linux power management list +Description: + (RO) The default status of this state, "enabled" or "disabled". What: /sys/devices/system/cpu/cpuX/cpuidle/stateN/residency Date: March 2014 --- a/Documentation/ABI/testing/sysfs-platform-asus-wmi +++ b/Documentation/ABI/testing/sysfs-platform-asus-wmi @@ -46,3 +46,13 @@ * 0 - normal, * 1 - overboost, * 2 - silent + +What: /sys/devices/platform//throttle_thermal_policy +Date: Dec 2019 +KernelVersion: 5.6 +Contact: "Leonid Maksymchuk" +Description: + Throttle thermal policy mode: + * 0 - default, + * 1 - overboost, + * 2 - silent --- a/Documentation/ABI/testing/sysfs-power +++ b/Documentation/ABI/testing/sysfs-power @@ -407,3 +407,16 @@ Description: The /sys/power/suspend_stats/last_failed_step file contains the last failed step in the suspend/resume path. + +What: /sys/power/sync_on_suspend +Date: October 2019 +Contact: Jonas Meurer +Description: + This file controls whether or not the kernel will sync() + filesystems during system suspend (after freezing user space + and before suspending devices). + + Writing a "1" to this file enables the sync() and writing a "0" + disables it. Reads from the file return the current value. + The default is "1" if the build-time "SUSPEND_SKIP_SYNC" config + flag is unset, or "0" otherwise. --- a/Documentation/ABI/testing/usb-charger-uevent +++ b/Documentation/ABI/testing/usb-charger-uevent @@ -0,0 +1,46 @@ +What: Raise a uevent when a USB charger is inserted or removed +Date: 2020-01-14 +KernelVersion: 5.6 +Contact: linux-usb@vger.kernel.org +Description: There are two USB charger states: + USB_CHARGER_ABSENT + USB_CHARGER_PRESENT + There are five USB charger types: + USB_CHARGER_UNKNOWN_TYPE: Charger type is unknown + USB_CHARGER_SDP_TYPE: Standard Downstream Port + USB_CHARGER_CDP_TYPE: Charging Downstream Port + USB_CHARGER_DCP_TYPE: Dedicated Charging Port + USB_CHARGER_ACA_TYPE: Accessory Charging Adapter + https://www.usb.org/document-library/battery-charging-v12-spec-and-adopters-agreement + + Here are two examples taken using udevadm monitor -p when + USB charger is online: + UDEV change /devices/soc0/usbphynop1 (platform) + ACTION=change + DEVPATH=/devices/soc0/usbphynop1 + DRIVER=usb_phy_generic + MODALIAS=of:Nusbphynop1T(null)Cusb-nop-xceiv + OF_COMPATIBLE_0=usb-nop-xceiv + OF_COMPATIBLE_N=1 + OF_FULLNAME=/usbphynop1 + OF_NAME=usbphynop1 + SEQNUM=2493 + SUBSYSTEM=platform + USB_CHARGER_STATE=USB_CHARGER_PRESENT + USB_CHARGER_TYPE=USB_CHARGER_SDP_TYPE + USEC_INITIALIZED=227422826 + + USB charger is offline: + KERNEL change /devices/soc0/usbphynop1 (platform) + ACTION=change + DEVPATH=/devices/soc0/usbphynop1 + DRIVER=usb_phy_generic + MODALIAS=of:Nusbphynop1T(null)Cusb-nop-xceiv + OF_COMPATIBLE_0=usb-nop-xceiv + OF_COMPATIBLE_N=1 + OF_FULLNAME=/usbphynop1 + OF_NAME=usbphynop1 + SEQNUM=2494 + SUBSYSTEM=platform + USB_CHARGER_STATE=USB_CHARGER_ABSENT + USB_CHARGER_TYPE=USB_CHARGER_UNKNOWN_TYPE --- a/Documentation/RCU/NMI-RCU.rst +++ b/Documentation/RCU/NMI-RCU.rst @@ -0,0 +1,124 @@ +.. _NMI_rcu_doc: + +Using RCU to Protect Dynamic NMI Handlers +========================================= + + +Although RCU is usually used to protect read-mostly data structures, +it is possible to use RCU to provide dynamic non-maskable interrupt +handlers, as well as dynamic irq handlers. This document describes +how to do this, drawing loosely from Zwane Mwaikambo's NMI-timer +work in "arch/x86/oprofile/nmi_timer_int.c" and in +"arch/x86/kernel/traps.c". + +The relevant pieces of code are listed below, each followed by a +brief explanation:: + + static int dummy_nmi_callback(struct pt_regs *regs, int cpu) + { + return 0; + } + +The dummy_nmi_callback() function is a "dummy" NMI handler that does +nothing, but returns zero, thus saying that it did nothing, allowing +the NMI handler to take the default machine-specific action:: + + static nmi_callback_t nmi_callback = dummy_nmi_callback; + +This nmi_callback variable is a global function pointer to the current +NMI handler:: + + void do_nmi(struct pt_regs * regs, long error_code) + { + int cpu; + + nmi_enter(); + + cpu = smp_processor_id(); + ++nmi_count(cpu); + + if (!rcu_dereference_sched(nmi_callback)(regs, cpu)) + default_do_nmi(regs); + + nmi_exit(); + } + +The do_nmi() function processes each NMI. It first disables preemption +in the same way that a hardware irq would, then increments the per-CPU +count of NMIs. It then invokes the NMI handler stored in the nmi_callback +function pointer. If this handler returns zero, do_nmi() invokes the +default_do_nmi() function to handle a machine-specific NMI. Finally, +preemption is restored. + +In theory, rcu_dereference_sched() is not needed, since this code runs +only on i386, which in theory does not need rcu_dereference_sched() +anyway. However, in practice it is a good documentation aid, particularly +for anyone attempting to do something similar on Alpha or on systems +with aggressive optimizing compilers. + +Quick Quiz: + Why might the rcu_dereference_sched() be necessary on Alpha, given that the code referenced by the pointer is read-only? + +:ref:`Answer to Quick Quiz ` + +Back to the discussion of NMI and RCU:: + + void set_nmi_callback(nmi_callback_t callback) + { + rcu_assign_pointer(nmi_callback, callback); + } + +The set_nmi_callback() function registers an NMI handler. Note that any +data that is to be used by the callback must be initialized up -before- +the call to set_nmi_callback(). On architectures that do not order +writes, the rcu_assign_pointer() ensures that the NMI handler sees the +initialized values:: + + void unset_nmi_callback(void) + { + rcu_assign_pointer(nmi_callback, dummy_nmi_callback); + } + +This function unregisters an NMI handler, restoring the original +dummy_nmi_handler(). However, there may well be an NMI handler +currently executing on some other CPU. We therefore cannot free +up any data structures used by the old NMI handler until execution +of it completes on all other CPUs. + +One way to accomplish this is via synchronize_rcu(), perhaps as +follows:: + + unset_nmi_callback(); + synchronize_rcu(); + kfree(my_nmi_data); + +This works because (as of v4.20) synchronize_rcu() blocks until all +CPUs complete any preemption-disabled segments of code that they were +executing. +Since NMI handlers disable preemption, synchronize_rcu() is guaranteed +not to return until all ongoing NMI handlers exit. It is therefore safe +to free up the handler's data as soon as synchronize_rcu() returns. + +Important note: for this to work, the architecture in question must +invoke nmi_enter() and nmi_exit() on NMI entry and exit, respectively. + +.. _answer_quick_quiz_NMI: + +Answer to Quick Quiz: + Why might the rcu_dereference_sched() be necessary on Alpha, given that the code referenced by the pointer is read-only? + + The caller to set_nmi_callback() might well have + initialized some data that is to be used by the new NMI + handler. In this case, the rcu_dereference_sched() would + be needed, because otherwise a CPU that received an NMI + just after the new handler was set might see the pointer + to the new NMI handler, but the old pre-initialized + version of the handler's data. + + This same sad story can happen on other CPUs when using + a compiler with aggressive pointer-value speculation + optimizations. + + More important, the rcu_dereference_sched() makes it + clear to someone reading the code that the pointer is + being protected by RCU-sched. --- a/Documentation/RCU/NMI-RCU.txt +++ b/Documentation/RCU/NMI-RCU.txt @@ -1,121 +0,0 @@ -Using RCU to Protect Dynamic NMI Handlers - - -Although RCU is usually used to protect read-mostly data structures, -it is possible to use RCU to provide dynamic non-maskable interrupt -handlers, as well as dynamic irq handlers. This document describes -how to do this, drawing loosely from Zwane Mwaikambo's NMI-timer -work in "arch/x86/oprofile/nmi_timer_int.c" and in -"arch/x86/kernel/traps.c". - -The relevant pieces of code are listed below, each followed by a -brief explanation. - - static int dummy_nmi_callback(struct pt_regs *regs, int cpu) - { - return 0; - } - -The dummy_nmi_callback() function is a "dummy" NMI handler that does -nothing, but returns zero, thus saying that it did nothing, allowing -the NMI handler to take the default machine-specific action. - - static nmi_callback_t nmi_callback = dummy_nmi_callback; - -This nmi_callback variable is a global function pointer to the current -NMI handler. - - void do_nmi(struct pt_regs * regs, long error_code) - { - int cpu; - - nmi_enter(); - - cpu = smp_processor_id(); - ++nmi_count(cpu); - - if (!rcu_dereference_sched(nmi_callback)(regs, cpu)) - default_do_nmi(regs); - - nmi_exit(); - } - -The do_nmi() function processes each NMI. It first disables preemption -in the same way that a hardware irq would, then increments the per-CPU -count of NMIs. It then invokes the NMI handler stored in the nmi_callback -function pointer. If this handler returns zero, do_nmi() invokes the -default_do_nmi() function to handle a machine-specific NMI. Finally, -preemption is restored. - -In theory, rcu_dereference_sched() is not needed, since this code runs -only on i386, which in theory does not need rcu_dereference_sched() -anyway. However, in practice it is a good documentation aid, particularly -for anyone attempting to do something similar on Alpha or on systems -with aggressive optimizing compilers. - -Quick Quiz: Why might the rcu_dereference_sched() be necessary on Alpha, - given that the code referenced by the pointer is read-only? - - -Back to the discussion of NMI and RCU... - - void set_nmi_callback(nmi_callback_t callback) - { - rcu_assign_pointer(nmi_callback, callback); - } - -The set_nmi_callback() function registers an NMI handler. Note that any -data that is to be used by the callback must be initialized up -before- -the call to set_nmi_callback(). On architectures that do not order -writes, the rcu_assign_pointer() ensures that the NMI handler sees the -initialized values. - - void unset_nmi_callback(void) - { - rcu_assign_pointer(nmi_callback, dummy_nmi_callback); - } - -This function unregisters an NMI handler, restoring the original -dummy_nmi_handler(). However, there may well be an NMI handler -currently executing on some other CPU. We therefore cannot free -up any data structures used by the old NMI handler until execution -of it completes on all other CPUs. - -One way to accomplish this is via synchronize_rcu(), perhaps as -follows: - - unset_nmi_callback(); - synchronize_rcu(); - kfree(my_nmi_data); - -This works because (as of v4.20) synchronize_rcu() blocks until all -CPUs complete any preemption-disabled segments of code that they were -executing. -Since NMI handlers disable preemption, synchronize_rcu() is guaranteed -not to return until all ongoing NMI handlers exit. It is therefore safe -to free up the handler's data as soon as synchronize_rcu() returns. - -Important note: for this to work, the architecture in question must -invoke nmi_enter() and nmi_exit() on NMI entry and exit, respectively. - - -Answer to Quick Quiz - - Why might the rcu_dereference_sched() be necessary on Alpha, given - that the code referenced by the pointer is read-only? - - Answer: The caller to set_nmi_callback() might well have - initialized some data that is to be used by the new NMI - handler. In this case, the rcu_dereference_sched() would - be needed, because otherwise a CPU that received an NMI - just after the new handler was set might see the pointer - to the new NMI handler, but the old pre-initialized - version of the handler's data. - - This same sad story can happen on other CPUs when using - a compiler with aggressive pointer-value speculation - optimizations. - - More important, the rcu_dereference_sched() makes it - clear to someone reading the code that the pointer is - being protected by RCU-sched. --- a/Documentation/RCU/arrayRCU.rst +++ b/Documentation/RCU/arrayRCU.rst @@ -0,0 +1,165 @@ +.. _array_rcu_doc: + +Using RCU to Protect Read-Mostly Arrays +======================================= + +Although RCU is more commonly used to protect linked lists, it can +also be used to protect arrays. Three situations are as follows: + +1. :ref:`Hash Tables ` + +2. :ref:`Static Arrays ` + +3. :ref:`Resizable Arrays ` + +Each of these three situations involves an RCU-protected pointer to an +array that is separately indexed. It might be tempting to consider use +of RCU to instead protect the index into an array, however, this use +case is **not** supported. The problem with RCU-protected indexes into +arrays is that compilers can play way too many optimization games with +integers, which means that the rules governing handling of these indexes +are far more trouble than they are worth. If RCU-protected indexes into +arrays prove to be particularly valuable (which they have not thus far), +explicit cooperation from the compiler will be required to permit them +to be safely used. + +That aside, each of the three RCU-protected pointer situations are +described in the following sections. + +.. _hash_tables: + +Situation 1: Hash Tables +------------------------ + +Hash tables are often implemented as an array, where each array entry +has a linked-list hash chain. Each hash chain can be protected by RCU +as described in the listRCU.txt document. This approach also applies +to other array-of-list situations, such as radix trees. + +.. _static_arrays: + +Situation 2: Static Arrays +-------------------------- + +Static arrays, where the data (rather than a pointer to the data) is +located in each array element, and where the array is never resized, +have not been used with RCU. Rik van Riel recommends using seqlock in +this situation, which would also have minimal read-side overhead as long +as updates are rare. + +Quick Quiz: + Why is it so important that updates be rare when using seqlock? + +:ref:`Answer to Quick Quiz ` + +.. _resizable_arrays: + +Situation 3: Resizable Arrays +------------------------------ + +Use of RCU for resizable arrays is demonstrated by the grow_ary() +function formerly used by the System V IPC code. The array is used +to map from semaphore, message-queue, and shared-memory IDs to the data +structure that represents the corresponding IPC construct. The grow_ary() +function does not acquire any locks; instead its caller must hold the +ids->sem semaphore. + +The grow_ary() function, shown below, does some limit checks, allocates a +new ipc_id_ary, copies the old to the new portion of the new, initializes +the remainder of the new, updates the ids->entries pointer to point to +the new array, and invokes ipc_rcu_putref() to free up the old array. +Note that rcu_assign_pointer() is used to update the ids->entries pointer, +which includes any memory barriers required on whatever architecture +you are running on:: + + static int grow_ary(struct ipc_ids* ids, int newsize) + { + struct ipc_id_ary* new; + struct ipc_id_ary* old; + int i; + int size = ids->entries->size; + + if(newsize > IPCMNI) + newsize = IPCMNI; + if(newsize <= size) + return newsize; + + new = ipc_rcu_alloc(sizeof(struct kern_ipc_perm *)*newsize + + sizeof(struct ipc_id_ary)); + if(new == NULL) + return size; + new->size = newsize; + memcpy(new->p, ids->entries->p, + sizeof(struct kern_ipc_perm *)*size + + sizeof(struct ipc_id_ary)); + for(i=size;ip[i] = NULL; + } + old = ids->entries; + + /* + * Use rcu_assign_pointer() to make sure the memcpyed + * contents of the new array are visible before the new + * array becomes visible. + */ + rcu_assign_pointer(ids->entries, new); + + ipc_rcu_putref(old); + return newsize; + } + +The ipc_rcu_putref() function decrements the array's reference count +and then, if the reference count has dropped to zero, uses call_rcu() +to free the array after a grace period has elapsed. + +The array is traversed by the ipc_lock() function. This function +indexes into the array under the protection of rcu_read_lock(), +using rcu_dereference() to pick up the pointer to the array so +that it may later safely be dereferenced -- memory barriers are +required on the Alpha CPU. Since the size of the array is stored +with the array itself, there can be no array-size mismatches, so +a simple check suffices. The pointer to the structure corresponding +to the desired IPC object is placed in "out", with NULL indicating +a non-existent entry. After acquiring "out->lock", the "out->deleted" +flag indicates whether the IPC object is in the process of being +deleted, and, if not, the pointer is returned:: + + struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id) + { + struct kern_ipc_perm* out; + int lid = id % SEQ_MULTIPLIER; + struct ipc_id_ary* entries; + + rcu_read_lock(); + entries = rcu_dereference(ids->entries); + if(lid >= entries->size) { + rcu_read_unlock(); + return NULL; + } + out = entries->p[lid]; + if(out == NULL) { + rcu_read_unlock(); + return NULL; + } + spin_lock(&out->lock); + + /* ipc_rmid() may have already freed the ID while ipc_lock + * was spinning: here verify that the structure is still valid + */ + if (out->deleted) { + spin_unlock(&out->lock); + rcu_read_unlock(); + return NULL; + } + return out; + } + +.. _answer_quick_quiz_seqlock: + +Answer to Quick Quiz: + Why is it so important that updates be rare when using seqlock? + + The reason that it is important that updates be rare when + using seqlock is that frequent updates can livelock readers. + One way to avoid this problem is to assign a seqlock for + each array entry rather than to the entire array. --- a/Documentation/RCU/arrayRCU.txt +++ b/Documentation/RCU/arrayRCU.txt @@ -1,153 +0,0 @@ -Using RCU to Protect Read-Mostly Arrays - - -Although RCU is more commonly used to protect linked lists, it can -also be used to protect arrays. Three situations are as follows: - -1. Hash Tables - -2. Static Arrays - -3. Resizeable Arrays - -Each of these three situations involves an RCU-protected pointer to an -array that is separately indexed. It might be tempting to consider use -of RCU to instead protect the index into an array, however, this use -case is -not- supported. The problem with RCU-protected indexes into -arrays is that compilers can play way too many optimization games with -integers, which means that the rules governing handling of these indexes -are far more trouble than they are worth. If RCU-protected indexes into -arrays prove to be particularly valuable (which they have not thus far), -explicit cooperation from the compiler will be required to permit them -to be safely used. - -That aside, each of the three RCU-protected pointer situations are -described in the following sections. - - -Situation 1: Hash Tables - -Hash tables are often implemented as an array, where each array entry -has a linked-list hash chain. Each hash chain can be protected by RCU -as described in the listRCU.txt document. This approach also applies -to other array-of-list situations, such as radix trees. - - -Situation 2: Static Arrays - -Static arrays, where the data (rather than a pointer to the data) is -located in each array element, and where the array is never resized, -have not been used with RCU. Rik van Riel recommends using seqlock in -this situation, which would also have minimal read-side overhead as long -as updates are rare. - -Quick Quiz: Why is it so important that updates be rare when - using seqlock? - - -Situation 3: Resizeable Arrays - -Use of RCU for resizeable arrays is demonstrated by the grow_ary() -function formerly used by the System V IPC code. The array is used -to map from semaphore, message-queue, and shared-memory IDs to the data -structure that represents the corresponding IPC construct. The grow_ary() -function does not acquire any locks; instead its caller must hold the -ids->sem semaphore. - -The grow_ary() function, shown below, does some limit checks, allocates a -new ipc_id_ary, copies the old to the new portion of the new, initializes -the remainder of the new, updates the ids->entries pointer to point to -the new array, and invokes ipc_rcu_putref() to free up the old array. -Note that rcu_assign_pointer() is used to update the ids->entries pointer, -which includes any memory barriers required on whatever architecture -you are running on. - - static int grow_ary(struct ipc_ids* ids, int newsize) - { - struct ipc_id_ary* new; - struct ipc_id_ary* old; - int i; - int size = ids->entries->size; - - if(newsize > IPCMNI) - newsize = IPCMNI; - if(newsize <= size) - return newsize; - - new = ipc_rcu_alloc(sizeof(struct kern_ipc_perm *)*newsize + - sizeof(struct ipc_id_ary)); - if(new == NULL) - return size; - new->size = newsize; - memcpy(new->p, ids->entries->p, - sizeof(struct kern_ipc_perm *)*size + - sizeof(struct ipc_id_ary)); - for(i=size;ip[i] = NULL; - } - old = ids->entries; - - /* - * Use rcu_assign_pointer() to make sure the memcpyed - * contents of the new array are visible before the new - * array becomes visible. - */ - rcu_assign_pointer(ids->entries, new); - - ipc_rcu_putref(old); - return newsize; - } - -The ipc_rcu_putref() function decrements the array's reference count -and then, if the reference count has dropped to zero, uses call_rcu() -to free the array after a grace period has elapsed. - -The array is traversed by the ipc_lock() function. This function -indexes into the array under the protection of rcu_read_lock(), -using rcu_dereference() to pick up the pointer to the array so -that it may later safely be dereferenced -- memory barriers are -required on the Alpha CPU. Since the size of the array is stored -with the array itself, there can be no array-size mismatches, so -a simple check suffices. The pointer to the structure corresponding -to the desired IPC object is placed in "out", with NULL indicating -a non-existent entry. After acquiring "out->lock", the "out->deleted" -flag indicates whether the IPC object is in the process of being -deleted, and, if not, the pointer is returned. - - struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id) - { - struct kern_ipc_perm* out; - int lid = id % SEQ_MULTIPLIER; - struct ipc_id_ary* entries; - - rcu_read_lock(); - entries = rcu_dereference(ids->entries); - if(lid >= entries->size) { - rcu_read_unlock(); - return NULL; - } - out = entries->p[lid]; - if(out == NULL) { - rcu_read_unlock(); - return NULL; - } - spin_lock(&out->lock); - - /* ipc_rmid() may have already freed the ID while ipc_lock - * was spinning: here verify that the structure is still valid - */ - if (out->deleted) { - spin_unlock(&out->lock); - rcu_read_unlock(); - return NULL; - } - return out; - } - - -Answer to Quick Quiz: - - The reason that it is important that updates be rare when - using seqlock is that frequent updates can livelock readers. - One way to avoid this problem is to assign a seqlock for - each array entry rather than to the entire array. --- a/Documentation/RCU/index.rst +++ b/Documentation/RCU/index.rst @@ -7,8 +7,13 @@ .. toctree:: :maxdepth: 3 + arrayRCU + rcubarrier + rcu_dereference + whatisRCU rcu listRCU + NMI-RCU UP Design/Memory-Ordering/Tree-RCU-Memory-Ordering --- a/Documentation/RCU/lockdep-splat.txt +++ b/Documentation/RCU/lockdep-splat.txt @@ -99,7 +99,7 @@ read-side critical section, which again would have suppressed the above lockdep-RCU splat. -But in this particular case, we don't actually deference the pointer +But in this particular case, we don't actually dereference the pointer returned from rcu_dereference(). Instead, that pointer is just compared to the cic pointer, which means that the rcu_dereference() can be replaced by rcu_access_pointer() as follows: --- a/Documentation/RCU/rcu_dereference.rst +++ b/Documentation/RCU/rcu_dereference.rst @@ -0,0 +1,463 @@ +.. _rcu_dereference_doc: + +PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference() +=============================================================== + +Most of the time, you can use values from rcu_dereference() or one of +the similar primitives without worries. Dereferencing (prefix "*"), +field selection ("->"), assignment ("="), address-of ("&"), addition and +subtraction of constants, and casts all work quite naturally and safely. + +It is nevertheless possible to get into trouble with other operations. +Follow these rules to keep your RCU code working properly: + +- You must use one of the rcu_dereference() family of primitives + to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU + will complain. Worse yet, your code can see random memory-corruption + bugs due to games that compilers and DEC Alpha can play. + Without one of the rcu_dereference() primitives, compilers + can reload the value, and won't your code have fun with two + different values for a single pointer! Without rcu_dereference(), + DEC Alpha can load a pointer, dereference that pointer, and + return data preceding initialization that preceded the store of + the pointer. + + In addition, the volatile cast in rcu_dereference() prevents the + compiler from deducing the resulting pointer value. Please see + the section entitled "EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH" + for an example where the compiler can in fact deduce the exact + value of the pointer, and thus cause misordering. + +- You are only permitted to use rcu_dereference on pointer values. + The compiler simply knows too much about integral values to + trust it to carry dependencies through integer operations. + There are a very few exceptions, namely that you can temporarily + cast the pointer to uintptr_t in order to: + + - Set bits and clear bits down in the must-be-zero low-order + bits of that pointer. This clearly means that the pointer + must have alignment constraints, for example, this does + -not- work in general for char* pointers. + + - XOR bits to translate pointers, as is done in some + classic buddy-allocator algorithms. + + It is important to cast the value back to pointer before + doing much of anything else with it. + +- Avoid cancellation when using the "+" and "-" infix arithmetic + operators. For example, for a given variable "x", avoid + "(x-(uintptr_t)x)" for char* pointers. The compiler is within its + rights to substitute zero for this sort of expression, so that + subsequent accesses no longer depend on the rcu_dereference(), + again possibly resulting in bugs due to misordering. + + Of course, if "p" is a pointer from rcu_dereference(), and "a" + and "b" are integers that happen to be equal, the expression + "p+a-b" is safe because its value still necessarily depends on + the rcu_dereference(), thus maintaining proper ordering. + +- If you are using RCU to protect JITed functions, so that the + "()" function-invocation operator is applied to a value obtained + (directly or indirectly) from rcu_dereference(), you may need to + interact directly with the hardware to flush instruction caches. + This issue arises on some systems when a newly JITed function is + using the same memory that was used by an earlier JITed function. + +- Do not use the results from relational operators ("==", "!=", + ">", ">=", "<", or "<=") when dereferencing. For example, + the following (quite strange) code is buggy:: + + int *p; + int *q; + + ... + + p = rcu_dereference(gp) + q = &global_q; + q += p > &oom_p; + r1 = *q; /* BUGGY!!! */ + + As before, the reason this is buggy is that relational operators + are often compiled using branches. And as before, although + weak-memory machines such as ARM or PowerPC do order stores + after such branches, but can speculate loads, which can again + result in misordering bugs. + +- Be very careful about comparing pointers obtained from + rcu_dereference() against non-NULL values. As Linus Torvalds + explained, if the two pointers are equal, the compiler could + substitute the pointer you are comparing against for the pointer + obtained from rcu_dereference(). For example:: + + p = rcu_dereference(gp); + if (p == &default_struct) + do_default(p->a); + + Because the compiler now knows that the value of "p" is exactly + the address of the variable "default_struct", it is free to + transform this code into the following:: + + p = rcu_dereference(gp); + if (p == &default_struct) + do_default(default_struct.a); + + On ARM and Power hardware, the load from "default_struct.a" + can now be speculated, such that it might happen before the + rcu_dereference(). This could result in bugs due to misordering. + + However, comparisons are OK in the following cases: + + - The comparison was against the NULL pointer. If the + compiler knows that the pointer is NULL, you had better + not be dereferencing it anyway. If the comparison is + non-equal, the compiler is none the wiser. Therefore, + it is safe to compare pointers from rcu_dereference() + against NULL pointers. + + - The pointer is never dereferenced after being compared. + Since there are no subsequent dereferences, the compiler + cannot use anything it learned from the comparison + to reorder the non-existent subsequent dereferences. + This sort of comparison occurs frequently when scanning + RCU-protected circular linked lists. + + Note that if checks for being within an RCU read-side + critical section are not required and the pointer is never + dereferenced, rcu_access_pointer() should be used in place + of rcu_dereference(). + + - The comparison is against a pointer that references memory + that was initialized "a long time ago." The reason + this is safe is that even if misordering occurs, the + misordering will not affect the accesses that follow + the comparison. So exactly how long ago is "a long + time ago"? Here are some possibilities: + + - Compile time. + + - Boot time. + + - Module-init time for module code. + + - Prior to kthread creation for kthread code. + + - During some prior acquisition of the lock that + we now hold. + + - Before mod_timer() time for a timer handler. + + There are many other possibilities involving the Linux + kernel's wide array of primitives that cause code to + be invoked at a later time. + + - The pointer being compared against also came from + rcu_dereference(). In this case, both pointers depend + on one rcu_dereference() or another, so you get proper + ordering either way. + + That said, this situation can make certain RCU usage + bugs more likely to happen. Which can be a good thing, + at least if they happen during testing. An example + of such an RCU usage bug is shown in the section titled + "EXAMPLE OF AMPLIFIED RCU-USAGE BUG". + + - All of the accesses following the comparison are stores, + so that a control dependency preserves the needed ordering. + That said, it is easy to get control dependencies wrong. + Please see the "CONTROL DEPENDENCIES" section of + Documentation/memory-barriers.txt for more details. + + - The pointers are not equal -and- the compiler does + not have enough information to deduce the value of the + pointer. Note that the volatile cast in rcu_dereference() + will normally prevent the compiler from knowing too much. + + However, please note that if the compiler knows that the + pointer takes on only one of two values, a not-equal + comparison will provide exactly the information that the + compiler needs to deduce the value of the pointer. + +- Disable any value-speculation optimizations that your compiler + might provide, especially if you are making use of feedback-based + optimizations that take data collected from prior runs. Such + value-speculation optimizations reorder operations by design. + + There is one exception to this rule: Value-speculation + optimizations that leverage the branch-prediction hardware are + safe on strongly ordered systems (such as x86), but not on weakly + ordered systems (such as ARM or Power). Choose your compiler + command-line options wisely! + + +EXAMPLE OF AMPLIFIED RCU-USAGE BUG +---------------------------------- + +Because updaters can run concurrently with RCU readers, RCU readers can +see stale and/or inconsistent values. If RCU readers need fresh or +consistent values, which they sometimes do, they need to take proper +precautions. To see this, consider the following code fragment:: + + struct foo { + int a; + int b; + int c; + }; + struct foo *gp1; + struct foo *gp2; + + void updater(void) + { + struct foo *p; + + p = kmalloc(...); + if (p == NULL) + deal_with_it(); + p->a = 42; /* Each field in its own cache line. */ + p->b = 43; + p->c = 44; + rcu_assign_pointer(gp1, p); + p->b = 143; + p->c = 144; + rcu_assign_pointer(gp2, p); + } + + void reader(void) + { + struct foo *p; + struct foo *q; + int r1, r2; + + p = rcu_dereference(gp2); + if (p == NULL) + return; + r1 = p->b; /* Guaranteed to get 143. */ + q = rcu_dereference(gp1); /* Guaranteed non-NULL. */ + if (p == q) { + /* The compiler decides that q->c is same as p->c. */ + r2 = p->c; /* Could get 44 on weakly order system. */ + } + do_something_with(r1, r2); + } + +You might be surprised that the outcome (r1 == 143 && r2 == 44) is possible, +but you should not be. After all, the updater might have been invoked +a second time between the time reader() loaded into "r1" and the time +that it loaded into "r2". The fact that this same result can occur due +to some reordering from the compiler and CPUs is beside the point. + +But suppose that the reader needs a consistent view? + +Then one approach is to use locking, for example, as follows:: + + struct foo { + int a; + int b; + int c; + spinlock_t lock; + }; + struct foo *gp1; + struct foo *gp2; + + void updater(void) + { + struct foo *p; + + p = kmalloc(...); + if (p == NULL) + deal_with_it(); + spin_lock(&p->lock); + p->a = 42; /* Each field in its own cache line. */ + p->b = 43; + p->c = 44; + spin_unlock(&p->lock); + rcu_assign_pointer(gp1, p); + spin_lock(&p->lock); + p->b = 143; + p->c = 144; + spin_unlock(&p->lock); + rcu_assign_pointer(gp2, p); + } + + void reader(void) + { + struct foo *p; + struct foo *q; + int r1, r2; + + p = rcu_dereference(gp2); + if (p == NULL) + return; + spin_lock(&p->lock); + r1 = p->b; /* Guaranteed to get 143. */ + q = rcu_dereference(gp1); /* Guaranteed non-NULL. */ + if (p == q) { + /* The compiler decides that q->c is same as p->c. */ + r2 = p->c; /* Locking guarantees r2 == 144. */ + } + spin_unlock(&p->lock); + do_something_with(r1, r2); + } + +As always, use the right tool for the job! + + +EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH +----------------------------------------- + +If a pointer obtained from rcu_dereference() compares not-equal to some +other pointer, the compiler normally has no clue what the value of the +first pointer might be. This lack of knowledge prevents the compiler +from carrying out optimizations that otherwise might destroy the ordering +guarantees that RCU depends on. And the volatile cast in rcu_dereference() +should prevent the compiler from guessing the value. + +But without rcu_dereference(), the compiler knows more than you might +expect. Consider the following code fragment:: + + struct foo { + int a; + int b; + }; + static struct foo variable1; + static struct foo variable2; + static struct foo *gp = &variable1; + + void updater(void) + { + initialize_foo(&variable2); + rcu_assign_pointer(gp, &variable2); + /* + * The above is the only store to gp in this translation unit, + * and the address of gp is not exported in any way. + */ + } + + int reader(void) + { + struct foo *p; + + p = gp; + barrier(); + if (p == &variable1) + return p->a; /* Must be variable1.a. */ + else + return p->b; /* Must be variable2.b. */ + } + +Because the compiler can see all stores to "gp", it knows that the only +possible values of "gp" are "variable1" on the one hand and "variable2" +on the other. The comparison in reader() therefore tells the compiler +the exact value of "p" even in the not-equals case. This allows the +compiler to make the return values independent of the load from "gp", +in turn destroying the ordering between this load and the loads of the +return values. This can result in "p->b" returning pre-initialization +garbage values. + +In short, rcu_dereference() is -not- optional when you are going to +dereference the resulting pointer. + + +WHICH MEMBER OF THE rcu_dereference() FAMILY SHOULD YOU USE? +------------------------------------------------------------ + +First, please avoid using rcu_dereference_raw() and also please avoid +using rcu_dereference_check() and rcu_dereference_protected() with a +second argument with a constant value of 1 (or true, for that matter). +With that caution out of the way, here is some guidance for which +member of the rcu_dereference() to use in various situations: + +1. If the access needs to be within an RCU read-side critical + section, use rcu_dereference(). With the new consolidated + RCU flavors, an RCU read-side critical section is entered + using rcu_read_lock(), anything that disables bottom halves, + anything that disables interrupts, or anything that disables + preemption. + +2. If the access might be within an RCU read-side critical section + on the one hand, or protected by (say) my_lock on the other, + use rcu_dereference_check(), for example:: + + p1 = rcu_dereference_check(p->rcu_protected_pointer, + lockdep_is_held(&my_lock)); + + +3. If the access might be within an RCU read-side critical section + on the one hand, or protected by either my_lock or your_lock on + the other, again use rcu_dereference_check(), for example:: + + p1 = rcu_dereference_check(p->rcu_protected_pointer, + lockdep_is_held(&my_lock) || + lockdep_is_held(&your_lock)); + +4. If the access is on the update side, so that it is always protected + by my_lock, use rcu_dereference_protected():: + + p1 = rcu_dereference_protected(p->rcu_protected_pointer, + lockdep_is_held(&my_lock)); + + This can be extended to handle multiple locks as in #3 above, + and both can be extended to check other conditions as well. + +5. If the protection is supplied by the caller, and is thus unknown + to this code, that is the rare case when rcu_dereference_raw() + is appropriate. In addition, rcu_dereference_raw() might be + appropriate when the lockdep expression would be excessively + complex, except that a better approach in that case might be to + take a long hard look at your synchronization design. Still, + there are data-locking cases where any one of a very large number + of locks or reference counters suffices to protect the pointer, + so rcu_dereference_raw() does have its place. + + However, its place is probably quite a bit smaller than one + might expect given the number of uses in the current kernel. + Ditto for its synonym, rcu_dereference_check( ... , 1), and + its close relative, rcu_dereference_protected(... , 1). + + +SPARSE CHECKING OF RCU-PROTECTED POINTERS +----------------------------------------- + +The sparse static-analysis tool checks for direct access to RCU-protected +pointers, which can result in "interesting" bugs due to compiler +optimizations involving invented loads and perhaps also load tearing. +For example, suppose someone mistakenly does something like this:: + + p = q->rcu_protected_pointer; + do_something_with(p->a); + do_something_else_with(p->b); + +If register pressure is high, the compiler might optimize "p" out +of existence, transforming the code to something like this:: + + do_something_with(q->rcu_protected_pointer->a); + do_something_else_with(q->rcu_protected_pointer->b); + +This could fatally disappoint your code if q->rcu_protected_pointer +changed in the meantime. Nor is this a theoretical problem: Exactly +this sort of bug cost Paul E. McKenney (and several of his innocent +colleagues) a three-day weekend back in the early 1990s. + +Load tearing could of course result in dereferencing a mashup of a pair +of pointers, which also might fatally disappoint your code. + +These problems could have been avoided simply by making the code instead +read as follows:: + + p = rcu_dereference(q->rcu_protected_pointer); + do_something_with(p->a); + do_something_else_with(p->b); + +Unfortunately, these sorts of bugs can be extremely hard to spot during +review. This is where the sparse tool comes into play, along with the +"__rcu" marker. If you mark a pointer declaration, whether in a structure +or as a formal parameter, with "__rcu", which tells sparse to complain if +this pointer is accessed directly. It will also cause sparse to complain +if a pointer not marked with "__rcu" is accessed using rcu_dereference() +and friends. For example, ->rcu_protected_pointer might be declared as +follows:: + + struct foo __rcu *rcu_protected_pointer; + +Use of "__rcu" is opt-in. If you choose not to use it, then you should +ignore the sparse warnings. --- a/Documentation/RCU/rcu_dereference.txt +++ b/Documentation/RCU/rcu_dereference.txt @@ -1,456 +0,0 @@ -PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference() - -Most of the time, you can use values from rcu_dereference() or one of -the similar primitives without worries. Dereferencing (prefix "*"), -field selection ("->"), assignment ("="), address-of ("&"), addition and -subtraction of constants, and casts all work quite naturally and safely. - -It is nevertheless possible to get into trouble with other operations. -Follow these rules to keep your RCU code working properly: - -o You must use one of the rcu_dereference() family of primitives - to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU - will complain. Worse yet, your code can see random memory-corruption - bugs due to games that compilers and DEC Alpha can play. - Without one of the rcu_dereference() primitives, compilers - can reload the value, and won't your code have fun with two - different values for a single pointer! Without rcu_dereference(), - DEC Alpha can load a pointer, dereference that pointer, and - return data preceding initialization that preceded the store of - the pointer. - - In addition, the volatile cast in rcu_dereference() prevents the - compiler from deducing the resulting pointer value. Please see - the section entitled "EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH" - for an example where the compiler can in fact deduce the exact - value of the pointer, and thus cause misordering. - -o You are only permitted to use rcu_dereference on pointer values. - The compiler simply knows too much about integral values to - trust it to carry dependencies through integer operations. - There are a very few exceptions, namely that you can temporarily - cast the pointer to uintptr_t in order to: - - o Set bits and clear bits down in the must-be-zero low-order - bits of that pointer. This clearly means that the pointer - must have alignment constraints, for example, this does - -not- work in general for char* pointers. - - o XOR bits to translate pointers, as is done in some - classic buddy-allocator algorithms. - - It is important to cast the value back to pointer before - doing much of anything else with it. - -o Avoid cancellation when using the "+" and "-" infix arithmetic - operators. For example, for a given variable "x", avoid - "(x-(uintptr_t)x)" for char* pointers. The compiler is within its - rights to substitute zero for this sort of expression, so that - subsequent accesses no longer depend on the rcu_dereference(), - again possibly resulting in bugs due to misordering. - - Of course, if "p" is a pointer from rcu_dereference(), and "a" - and "b" are integers that happen to be equal, the expression - "p+a-b" is safe because its value still necessarily depends on - the rcu_dereference(), thus maintaining proper ordering. - -o If you are using RCU to protect JITed functions, so that the - "()" function-invocation operator is applied to a value obtained - (directly or indirectly) from rcu_dereference(), you may need to - interact directly with the hardware to flush instruction caches. - This issue arises on some systems when a newly JITed function is - using the same memory that was used by an earlier JITed function. - -o Do not use the results from relational operators ("==", "!=", - ">", ">=", "<", or "<=") when dereferencing. For example, - the following (quite strange) code is buggy: - - int *p; - int *q; - - ... - - p = rcu_dereference(gp) - q = &global_q; - q += p > &oom_p; - r1 = *q; /* BUGGY!!! */ - - As before, the reason this is buggy is that relational operators - are often compiled using branches. And as before, although - weak-memory machines such as ARM or PowerPC do order stores - after such branches, but can speculate loads, which can again - result in misordering bugs. - -o Be very careful about comparing pointers obtained from - rcu_dereference() against non-NULL values. As Linus Torvalds - explained, if the two pointers are equal, the compiler could - substitute the pointer you are comparing against for the pointer - obtained from rcu_dereference(). For example: - - p = rcu_dereference(gp); - if (p == &default_struct) - do_default(p->a); - - Because the compiler now knows that the value of "p" is exactly - the address of the variable "default_struct", it is free to - transform this code into the following: - - p = rcu_dereference(gp); - if (p == &default_struct) - do_default(default_struct.a); - - On ARM and Power hardware, the load from "default_struct.a" - can now be speculated, such that it might happen before the - rcu_dereference(). This could result in bugs due to misordering. - - However, comparisons are OK in the following cases: - - o The comparison was against the NULL pointer. If the - compiler knows that the pointer is NULL, you had better - not be dereferencing it anyway. If the comparison is - non-equal, the compiler is none the wiser. Therefore, - it is safe to compare pointers from rcu_dereference() - against NULL pointers. - - o The pointer is never dereferenced after being compared. - Since there are no subsequent dereferences, the compiler - cannot use anything it learned from the comparison - to reorder the non-existent subsequent dereferences. - This sort of comparison occurs frequently when scanning - RCU-protected circular linked lists. - - Note that if checks for being within an RCU read-side - critical section are not required and the pointer is never - dereferenced, rcu_access_pointer() should be used in place - of rcu_dereference(). - - o The comparison is against a pointer that references memory - that was initialized "a long time ago." The reason - this is safe is that even if misordering occurs, the - misordering will not affect the accesses that follow - the comparison. So exactly how long ago is "a long - time ago"? Here are some possibilities: - - o Compile time. - - o Boot time. - - o Module-init time for module code. - - o Prior to kthread creation for kthread code. - - o During some prior acquisition of the lock that - we now hold. - - o Before mod_timer() time for a timer handler. - - There are many other possibilities involving the Linux - kernel's wide array of primitives that cause code to - be invoked at a later time. - - o The pointer being compared against also came from - rcu_dereference(). In this case, both pointers depend - on one rcu_dereference() or another, so you get proper - ordering either way. - - That said, this situation can make certain RCU usage - bugs more likely to happen. Which can be a good thing, - at least if they happen during testing. An example - of such an RCU usage bug is shown in the section titled - "EXAMPLE OF AMPLIFIED RCU-USAGE BUG". - - o All of the accesses following the comparison are stores, - so that a control dependency preserves the needed ordering. - That said, it is easy to get control dependencies wrong. - Please see the "CONTROL DEPENDENCIES" section of - Documentation/memory-barriers.txt for more details. - - o The pointers are not equal -and- the compiler does - not have enough information to deduce the value of the - pointer. Note that the volatile cast in rcu_dereference() - will normally prevent the compiler from knowing too much. - - However, please note that if the compiler knows that the - pointer takes on only one of two values, a not-equal - comparison will provide exactly the information that the - compiler needs to deduce the value of the pointer. - -o Disable any value-speculation optimizations that your compiler - might provide, especially if you are making use of feedback-based - optimizations that take data collected from prior runs. Such - value-speculation optimizations reorder operations by design. - - There is one exception to this rule: Value-speculation - optimizations that leverage the branch-prediction hardware are - safe on strongly ordered systems (such as x86), but not on weakly - ordered systems (such as ARM or Power). Choose your compiler - command-line options wisely! - - -EXAMPLE OF AMPLIFIED RCU-USAGE BUG - -Because updaters can run concurrently with RCU readers, RCU readers can -see stale and/or inconsistent values. If RCU readers need fresh or -consistent values, which they sometimes do, they need to take proper -precautions. To see this, consider the following code fragment: - - struct foo { - int a; - int b; - int c; - }; - struct foo *gp1; - struct foo *gp2; - - void updater(void) - { - struct foo *p; - - p = kmalloc(...); - if (p == NULL) - deal_with_it(); - p->a = 42; /* Each field in its own cache line. */ - p->b = 43; - p->c = 44; - rcu_assign_pointer(gp1, p); - p->b = 143; - p->c = 144; - rcu_assign_pointer(gp2, p); - } - - void reader(void) - { - struct foo *p; - struct foo *q; - int r1, r2; - - p = rcu_dereference(gp2); - if (p == NULL) - return; - r1 = p->b; /* Guaranteed to get 143. */ - q = rcu_dereference(gp1); /* Guaranteed non-NULL. */ - if (p == q) { - /* The compiler decides that q->c is same as p->c. */ - r2 = p->c; /* Could get 44 on weakly order system. */ - } - do_something_with(r1, r2); - } - -You might be surprised that the outcome (r1 == 143 && r2 == 44) is possible, -but you should not be. After all, the updater might have been invoked -a second time between the time reader() loaded into "r1" and the time -that it loaded into "r2". The fact that this same result can occur due -to some reordering from the compiler and CPUs is beside the point. - -But suppose that the reader needs a consistent view? - -Then one approach is to use locking, for example, as follows: - - struct foo { - int a; - int b; - int c; - spinlock_t lock; - }; - struct foo *gp1; - struct foo *gp2; - - void updater(void) - { - struct foo *p; - - p = kmalloc(...); - if (p == NULL) - deal_with_it(); - spin_lock(&p->lock); - p->a = 42; /* Each field in its own cache line. */ - p->b = 43; - p->c = 44; - spin_unlock(&p->lock); - rcu_assign_pointer(gp1, p); - spin_lock(&p->lock); - p->b = 143; - p->c = 144; - spin_unlock(&p->lock); - rcu_assign_pointer(gp2, p); - } - - void reader(void) - { - struct foo *p; - struct foo *q; - int r1, r2; - - p = rcu_dereference(gp2); - if (p == NULL) - return; - spin_lock(&p->lock); - r1 = p->b; /* Guaranteed to get 143. */ - q = rcu_dereference(gp1); /* Guaranteed non-NULL. */ - if (p == q) { - /* The compiler decides that q->c is same as p->c. */ - r2 = p->c; /* Locking guarantees r2 == 144. */ - } - spin_unlock(&p->lock); - do_something_with(r1, r2); - } - -As always, use the right tool for the job! - - -EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH - -If a pointer obtained from rcu_dereference() compares not-equal to some -other pointer, the compiler normally has no clue what the value of the -first pointer might be. This lack of knowledge prevents the compiler -from carrying out optimizations that otherwise might destroy the ordering -guarantees that RCU depends on. And the volatile cast in rcu_dereference() -should prevent the compiler from guessing the value. - -But without rcu_dereference(), the compiler knows more than you might -expect. Consider the following code fragment: - - struct foo { - int a; - int b; - }; - static struct foo variable1; - static struct foo variable2; - static struct foo *gp = &variable1; - - void updater(void) - { - initialize_foo(&variable2); - rcu_assign_pointer(gp, &variable2); - /* - * The above is the only store to gp in this translation unit, - * and the address of gp is not exported in any way. - */ - } - - int reader(void) - { - struct foo *p; - - p = gp; - barrier(); - if (p == &variable1) - return p->a; /* Must be variable1.a. */ - else - return p->b; /* Must be variable2.b. */ - } - -Because the compiler can see all stores to "gp", it knows that the only -possible values of "gp" are "variable1" on the one hand and "variable2" -on the other. The comparison in reader() therefore tells the compiler -the exact value of "p" even in the not-equals case. This allows the -compiler to make the return values independent of the load from "gp", -in turn destroying the ordering between this load and the loads of the -return values. This can result in "p->b" returning pre-initialization -garbage values. - -In short, rcu_dereference() is -not- optional when you are going to -dereference the resulting pointer. - - -WHICH MEMBER OF THE rcu_dereference() FAMILY SHOULD YOU USE? - -First, please avoid using rcu_dereference_raw() and also please avoid -using rcu_dereference_check() and rcu_dereference_protected() with a -second argument with a constant value of 1 (or true, for that matter). -With that caution out of the way, here is some guidance for which -member of the rcu_dereference() to use in various situations: - -1. If the access needs to be within an RCU read-side critical - section, use rcu_dereference(). With the new consolidated - RCU flavors, an RCU read-side critical section is entered - using rcu_read_lock(), anything that disables bottom halves, - anything that disables interrupts, or anything that disables - preemption. - -2. If the access might be within an RCU read-side critical section - on the one hand, or protected by (say) my_lock on the other, - use rcu_dereference_check(), for example: - - p1 = rcu_dereference_check(p->rcu_protected_pointer, - lockdep_is_held(&my_lock)); - - -3. If the access might be within an RCU read-side critical section - on the one hand, or protected by either my_lock or your_lock on - the other, again use rcu_dereference_check(), for example: - - p1 = rcu_dereference_check(p->rcu_protected_pointer, - lockdep_is_held(&my_lock) || - lockdep_is_held(&your_lock)); - -4. If the access is on the update side, so that it is always protected - by my_lock, use rcu_dereference_protected(): - - p1 = rcu_dereference_protected(p->rcu_protected_pointer, - lockdep_is_held(&my_lock)); - - This can be extended to handle multiple locks as in #3 above, - and both can be extended to check other conditions as well. - -5. If the protection is supplied by the caller, and is thus unknown - to this code, that is the rare case when rcu_dereference_raw() - is appropriate. In addition, rcu_dereference_raw() might be - appropriate when the lockdep expression would be excessively - complex, except that a better approach in that case might be to - take a long hard look at your synchronization design. Still, - there are data-locking cases where any one of a very large number - of locks or reference counters suffices to protect the pointer, - so rcu_dereference_raw() does have its place. - - However, its place is probably quite a bit smaller than one - might expect given the number of uses in the current kernel. - Ditto for its synonym, rcu_dereference_check( ... , 1), and - its close relative, rcu_dereference_protected(... , 1). - - -SPARSE CHECKING OF RCU-PROTECTED POINTERS - -The sparse static-analysis tool checks for direct access to RCU-protected -pointers, which can result in "interesting" bugs due to compiler -optimizations involving invented loads and perhaps also load tearing. -For example, suppose someone mistakenly does something like this: - - p = q->rcu_protected_pointer; - do_something_with(p->a); - do_something_else_with(p->b); - -If register pressure is high, the compiler might optimize "p" out -of existence, transforming the code to something like this: - - do_something_with(q->rcu_protected_pointer->a); - do_something_else_with(q->rcu_protected_pointer->b); - -This could fatally disappoint your code if q->rcu_protected_pointer -changed in the meantime. Nor is this a theoretical problem: Exactly -this sort of bug cost Paul E. McKenney (and several of his innocent -colleagues) a three-day weekend back in the early 1990s. - -Load tearing could of course result in dereferencing a mashup of a pair -of pointers, which also might fatally disappoint your code. - -These problems could have been avoided simply by making the code instead -read as follows: - - p = rcu_dereference(q->rcu_protected_pointer); - do_something_with(p->a); - do_something_else_with(p->b); - -Unfortunately, these sorts of bugs can be extremely hard to spot during -review. This is where the sparse tool comes into play, along with the -"__rcu" marker. If you mark a pointer declaration, whether in a structure -or as a formal parameter, with "__rcu", which tells sparse to complain if -this pointer is accessed directly. It will also cause sparse to complain -if a pointer not marked with "__rcu" is accessed using rcu_dereference() -and friends. For example, ->rcu_protected_pointer might be declared as -follows: - - struct foo __rcu *rcu_protected_pointer; - -Use of "__rcu" is opt-in. If you choose not to use it, then you should -ignore the sparse warnings. --- a/Documentation/RCU/rcubarrier.rst +++ b/Documentation/RCU/rcubarrier.rst @@ -0,0 +1,353 @@ +.. _rcu_barrier: + +RCU and Unloadable Modules +========================== + +[Originally published in LWN Jan. 14, 2007: http://lwn.net/Articles/217484/] + +RCU (read-copy update) is a synchronization mechanism that can be thought +of as a replacement for read-writer locking (among other things), but with +very low-overhead readers that are immune to deadlock, priority inversion, +and unbounded latency. RCU read-side critical sections are delimited +by rcu_read_lock() and rcu_read_unlock(), which, in non-CONFIG_PREEMPT +kernels, generate no code whatsoever. + +This means that RCU writers are unaware of the presence of concurrent +readers, so that RCU updates to shared data must be undertaken quite +carefully, leaving an old version of the data structure in place until all +pre-existing readers have finished. These old versions are needed because +such readers might hold a reference to them. RCU updates can therefore be +rather expensive, and RCU is thus best suited for read-mostly situations. + +How can an RCU writer possibly determine when all readers are finished, +given that readers might well leave absolutely no trace of their +presence? There is a synchronize_rcu() primitive that blocks until all +pre-existing readers have completed. An updater wishing to delete an +element p from a linked list might do the following, while holding an +appropriate lock, of course:: + + list_del_rcu(p); + synchronize_rcu(); + kfree(p); + +But the above code cannot be used in IRQ context -- the call_rcu() +primitive must be used instead. This primitive takes a pointer to an +rcu_head struct placed within the RCU-protected data structure and +another pointer to a function that may be invoked later to free that +structure. Code to delete an element p from the linked list from IRQ +context might then be as follows:: + + list_del_rcu(p); + call_rcu(&p->rcu, p_callback); + +Since call_rcu() never blocks, this code can safely be used from within +IRQ context. The function p_callback() might be defined as follows:: + + static void p_callback(struct rcu_head *rp) + { + struct pstruct *p = container_of(rp, struct pstruct, rcu); + + kfree(p); + } + + +Unloading Modules That Use call_rcu() +------------------------------------- + +But what if p_callback is defined in an unloadable module? + +If we unload the module while some RCU callbacks are pending, +the CPUs executing these callbacks are going to be severely +disappointed when they are later invoked, as fancifully depicted at +http://lwn.net/images/ns/kernel/rcu-drop.jpg. + +We could try placing a synchronize_rcu() in the module-exit code path, +but this is not sufficient. Although synchronize_rcu() does wait for a +grace period to elapse, it does not wait for the callbacks to complete. + +One might be tempted to try several back-to-back synchronize_rcu() +calls, but this is still not guaranteed to work. If there is a very +heavy RCU-callback load, then some of the callbacks might be deferred +in order to allow other processing to proceed. Such deferral is required +in realtime kernels in order to avoid excessive scheduling latencies. + + +rcu_barrier() +------------- + +We instead need the rcu_barrier() primitive. Rather than waiting for +a grace period to elapse, rcu_barrier() waits for all outstanding RCU +callbacks to complete. Please note that rcu_barrier() does **not** imply +synchronize_rcu(), in particular, if there are no RCU callbacks queued +anywhere, rcu_barrier() is within its rights to return immediately, +without waiting for a grace period to elapse. + +Pseudo-code using rcu_barrier() is as follows: + + 1. Prevent any new RCU callbacks from being posted. + 2. Execute rcu_barrier(). + 3. Allow the module to be unloaded. + +There is also an srcu_barrier() function for SRCU, and you of course +must match the flavor of rcu_barrier() with that of call_rcu(). If your +module uses multiple flavors of call_rcu(), then it must also use multiple +flavors of rcu_barrier() when unloading that module. For example, if +it uses call_rcu(), call_srcu() on srcu_struct_1, and call_srcu() on +srcu_struct_2, then the following three lines of code will be required +when unloading:: + + 1 rcu_barrier(); + 2 srcu_barrier(&srcu_struct_1); + 3 srcu_barrier(&srcu_struct_2); + +The rcutorture module makes use of rcu_barrier() in its exit function +as follows:: + + 1 static void + 2 rcu_torture_cleanup(void) + 3 { + 4 int i; + 5 + 6 fullstop = 1; + 7 if (shuffler_task != NULL) { + 8 VERBOSE_PRINTK_STRING("Stopping rcu_torture_shuffle task"); + 9 kthread_stop(shuffler_task); + 10 } + 11 shuffler_task = NULL; + 12 + 13 if (writer_task != NULL) { + 14 VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task"); + 15 kthread_stop(writer_task); + 16 } + 17 writer_task = NULL; + 18 + 19 if (reader_tasks != NULL) { + 20 for (i = 0; i < nrealreaders; i++) { + 21 if (reader_tasks[i] != NULL) { + 22 VERBOSE_PRINTK_STRING( + 23 "Stopping rcu_torture_reader task"); + 24 kthread_stop(reader_tasks[i]); + 25 } + 26 reader_tasks[i] = NULL; + 27 } + 28 kfree(reader_tasks); + 29 reader_tasks = NULL; + 30 } + 31 rcu_torture_current = NULL; + 32 + 33 if (fakewriter_tasks != NULL) { + 34 for (i = 0; i < nfakewriters; i++) { + 35 if (fakewriter_tasks[i] != NULL) { + 36 VERBOSE_PRINTK_STRING( + 37 "Stopping rcu_torture_fakewriter task"); + 38 kthread_stop(fakewriter_tasks[i]); + 39 } + 40 fakewriter_tasks[i] = NULL; + 41 } + 42 kfree(fakewriter_tasks); + 43 fakewriter_tasks = NULL; + 44 } + 45 + 46 if (stats_task != NULL) { + 47 VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task"); + 48 kthread_stop(stats_task); + 49 } + 50 stats_task = NULL; + 51 + 52 /* Wait for all RCU callbacks to fire. */ + 53 rcu_barrier(); + 54 + 55 rcu_torture_stats_print(); /* -After- the stats thread is stopped! */ + 56 + 57 if (cur_ops->cleanup != NULL) + 58 cur_ops->cleanup(); + 59 if (atomic_read(&n_rcu_torture_error)) + 60 rcu_torture_print_module_parms("End of test: FAILURE"); + 61 else + 62 rcu_torture_print_module_parms("End of test: SUCCESS"); + 63 } + +Line 6 sets a global variable that prevents any RCU callbacks from +re-posting themselves. This will not be necessary in most cases, since +RCU callbacks rarely include calls to call_rcu(). However, the rcutorture +module is an exception to this rule, and therefore needs to set this +global variable. + +Lines 7-50 stop all the kernel tasks associated with the rcutorture +module. Therefore, once execution reaches line 53, no more rcutorture +RCU callbacks will be posted. The rcu_barrier() call on line 53 waits +for any pre-existing callbacks to complete. + +Then lines 55-62 print status and do operation-specific cleanup, and +then return, permitting the module-unload operation to be completed. + +.. _rcubarrier_quiz_1: + +Quick Quiz #1: + Is there any other situation where rcu_barrier() might + be required? + +:ref:`Answer to Quick Quiz #1 ` + +Your module might have additional complications. For example, if your +module invokes call_rcu() from timers, you will need to first cancel all +the timers, and only then invoke rcu_barrier() to wait for any remaining +RCU callbacks to complete. + +Of course, if you module uses call_rcu(), you will need to invoke +rcu_barrier() before unloading. Similarly, if your module uses +call_srcu(), you will need to invoke srcu_barrier() before unloading, +and on the same srcu_struct structure. If your module uses call_rcu() +**and** call_srcu(), then you will need to invoke rcu_barrier() **and** +srcu_barrier(). + + +Implementing rcu_barrier() +-------------------------- + +Dipankar Sarma's implementation of rcu_barrier() makes use of the fact +that RCU callbacks are never reordered once queued on one of the per-CPU +queues. His implementation queues an RCU callback on each of the per-CPU +callback queues, and then waits until they have all started executing, at +which point, all earlier RCU callbacks are guaranteed to have completed. + +The original code for rcu_barrier() was as follows:: + + 1 void rcu_barrier(void) + 2 { + 3 BUG_ON(in_interrupt()); + 4 /* Take cpucontrol mutex to protect against CPU hotplug */ + 5 mutex_lock(&rcu_barrier_mutex); + 6 init_completion(&rcu_barrier_completion); + 7 atomic_set(&rcu_barrier_cpu_count, 0); + 8 on_each_cpu(rcu_barrier_func, NULL, 0, 1); + 9 wait_for_completion(&rcu_barrier_completion); + 10 mutex_unlock(&rcu_barrier_mutex); + 11 } + +Line 3 verifies that the caller is in process context, and lines 5 and 10 +use rcu_barrier_mutex to ensure that only one rcu_barrier() is using the +global completion and counters at a time, which are initialized on lines +6 and 7. Line 8 causes each CPU to invoke rcu_barrier_func(), which is +shown below. Note that the final "1" in on_each_cpu()'s argument list +ensures that all the calls to rcu_barrier_func() will have completed +before on_each_cpu() returns. Line 9 then waits for the completion. + +This code was rewritten in 2008 and several times thereafter, but this +still gives the general idea. + +The rcu_barrier_func() runs on each CPU, where it invokes call_rcu() +to post an RCU callback, as follows:: + + 1 static void rcu_barrier_func(void *notused) + 2 { + 3 int cpu = smp_processor_id(); + 4 struct rcu_data *rdp = &per_cpu(rcu_data, cpu); + 5 struct rcu_head *head; + 6 + 7 head = &rdp->barrier; + 8 atomic_inc(&rcu_barrier_cpu_count); + 9 call_rcu(head, rcu_barrier_callback); + 10 } + +Lines 3 and 4 locate RCU's internal per-CPU rcu_data structure, +which contains the struct rcu_head that needed for the later call to +call_rcu(). Line 7 picks up a pointer to this struct rcu_head, and line +8 increments a global counter. This counter will later be decremented +by the callback. Line 9 then registers the rcu_barrier_callback() on +the current CPU's queue. + +The rcu_barrier_callback() function simply atomically decrements the +rcu_barrier_cpu_count variable and finalizes the completion when it +reaches zero, as follows:: + + 1 static void rcu_barrier_callback(struct rcu_head *notused) + 2 { + 3 if (atomic_dec_and_test(&rcu_barrier_cpu_count)) + 4 complete(&rcu_barrier_completion); + 5 } + +.. _rcubarrier_quiz_2: + +Quick Quiz #2: + What happens if CPU 0's rcu_barrier_func() executes + immediately (thus incrementing rcu_barrier_cpu_count to the + value one), but the other CPU's rcu_barrier_func() invocations + are delayed for a full grace period? Couldn't this result in + rcu_barrier() returning prematurely? + +:ref:`Answer to Quick Quiz #2 ` + +The current rcu_barrier() implementation is more complex, due to the need +to avoid disturbing idle CPUs (especially on battery-powered systems) +and the need to minimally disturb non-idle CPUs in real-time systems. +However, the code above illustrates the concepts. + + +rcu_barrier() Summary +--------------------- + +The rcu_barrier() primitive has seen relatively little use, since most +code using RCU is in the core kernel rather than in modules. However, if +you are using RCU from an unloadable module, you need to use rcu_barrier() +so that your module may be safely unloaded. + + +Answers to Quick Quizzes +------------------------ + +.. _answer_rcubarrier_quiz_1: + +Quick Quiz #1: + Is there any other situation where rcu_barrier() might + be required? + +Answer: Interestingly enough, rcu_barrier() was not originally + implemented for module unloading. Nikita Danilov was using + RCU in a filesystem, which resulted in a similar situation at + filesystem-unmount time. Dipankar Sarma coded up rcu_barrier() + in response, so that Nikita could invoke it during the + filesystem-unmount process. + + Much later, yours truly hit the RCU module-unload problem when + implementing rcutorture, and found that rcu_barrier() solves + this problem as well. + +:ref:`Back to Quick Quiz #1 ` + +.. _answer_rcubarrier_quiz_2: + +Quick Quiz #2: + What happens if CPU 0's rcu_barrier_func() executes + immediately (thus incrementing rcu_barrier_cpu_count to the + value one), but the other CPU's rcu_barrier_func() invocations + are delayed for a full grace period? Couldn't this result in + rcu_barrier() returning prematurely? + +Answer: This cannot happen. The reason is that on_each_cpu() has its last + argument, the wait flag, set to "1". This flag is passed through + to smp_call_function() and further to smp_call_function_on_cpu(), + causing this latter to spin until the cross-CPU invocation of + rcu_barrier_func() has completed. This by itself would prevent + a grace period from completing on non-CONFIG_PREEMPT kernels, + since each CPU must undergo a context switch (or other quiescent + state) before the grace period can complete. However, this is + of no use in CONFIG_PREEMPT kernels. + + Therefore, on_each_cpu() disables preemption across its call + to smp_call_function() and also across the local call to + rcu_barrier_func(). This prevents the local CPU from context + switching, again preventing grace periods from completing. This + means that all CPUs have executed rcu_barrier_func() before + the first rcu_barrier_callback() can possibly execute, in turn + preventing rcu_barrier_cpu_count from prematurely reaching zero. + + Currently, -rt implementations of RCU keep but a single global + queue for RCU callbacks, and thus do not suffer from this + problem. However, when the -rt RCU eventually does have per-CPU + callback queues, things will have to change. One simple change + is to add an rcu_read_lock() before line 8 of rcu_barrier() + and an rcu_read_unlock() after line 8 of this same function. If + you can think of a better change, please let me know! + +:ref:`Back to Quick Quiz #2 ` --- a/Documentation/RCU/rcubarrier.txt +++ b/Documentation/RCU/rcubarrier.txt @@ -1,325 +0,0 @@ -RCU and Unloadable Modules - -[Originally published in LWN Jan. 14, 2007: http://lwn.net/Articles/217484/] - -RCU (read-copy update) is a synchronization mechanism that can be thought -of as a replacement for read-writer locking (among other things), but with -very low-overhead readers that are immune to deadlock, priority inversion, -and unbounded latency. RCU read-side critical sections are delimited -by rcu_read_lock() and rcu_read_unlock(), which, in non-CONFIG_PREEMPT -kernels, generate no code whatsoever. - -This means that RCU writers are unaware of the presence of concurrent -readers, so that RCU updates to shared data must be undertaken quite -carefully, leaving an old version of the data structure in place until all -pre-existing readers have finished. These old versions are needed because -such readers might hold a reference to them. RCU updates can therefore be -rather expensive, and RCU is thus best suited for read-mostly situations. - -How can an RCU writer possibly determine when all readers are finished, -given that readers might well leave absolutely no trace of their -presence? There is a synchronize_rcu() primitive that blocks until all -pre-existing readers have completed. An updater wishing to delete an -element p from a linked list might do the following, while holding an -appropriate lock, of course: - - list_del_rcu(p); - synchronize_rcu(); - kfree(p); - -But the above code cannot be used in IRQ context -- the call_rcu() -primitive must be used instead. This primitive takes a pointer to an -rcu_head struct placed within the RCU-protected data structure and -another pointer to a function that may be invoked later to free that -structure. Code to delete an element p from the linked list from IRQ -context might then be as follows: - - list_del_rcu(p); - call_rcu(&p->rcu, p_callback); - -Since call_rcu() never blocks, this code can safely be used from within -IRQ context. The function p_callback() might be defined as follows: - - static void p_callback(struct rcu_head *rp) - { - struct pstruct *p = container_of(rp, struct pstruct, rcu); - - kfree(p); - } - - -Unloading Modules That Use call_rcu() - -But what if p_callback is defined in an unloadable module? - -If we unload the module while some RCU callbacks are pending, -the CPUs executing these callbacks are going to be severely -disappointed when they are later invoked, as fancifully depicted at -http://lwn.net/images/ns/kernel/rcu-drop.jpg. - -We could try placing a synchronize_rcu() in the module-exit code path, -but this is not sufficient. Although synchronize_rcu() does wait for a -grace period to elapse, it does not wait for the callbacks to complete. - -One might be tempted to try several back-to-back synchronize_rcu() -calls, but this is still not guaranteed to work. If there is a very -heavy RCU-callback load, then some of the callbacks might be deferred -in order to allow other processing to proceed. Such deferral is required -in realtime kernels in order to avoid excessive scheduling latencies. - - -rcu_barrier() - -We instead need the rcu_barrier() primitive. Rather than waiting for -a grace period to elapse, rcu_barrier() waits for all outstanding RCU -callbacks to complete. Please note that rcu_barrier() does -not- imply -synchronize_rcu(), in particular, if there are no RCU callbacks queued -anywhere, rcu_barrier() is within its rights to return immediately, -without waiting for a grace period to elapse. - -Pseudo-code using rcu_barrier() is as follows: - - 1. Prevent any new RCU callbacks from being posted. - 2. Execute rcu_barrier(). - 3. Allow the module to be unloaded. - -There is also an srcu_barrier() function for SRCU, and you of course -must match the flavor of rcu_barrier() with that of call_rcu(). If your -module uses multiple flavors of call_rcu(), then it must also use multiple -flavors of rcu_barrier() when unloading that module. For example, if -it uses call_rcu(), call_srcu() on srcu_struct_1, and call_srcu() on -srcu_struct_2(), then the following three lines of code will be required -when unloading: - - 1 rcu_barrier(); - 2 srcu_barrier(&srcu_struct_1); - 3 srcu_barrier(&srcu_struct_2); - -The rcutorture module makes use of rcu_barrier() in its exit function -as follows: - - 1 static void - 2 rcu_torture_cleanup(void) - 3 { - 4 int i; - 5 - 6 fullstop = 1; - 7 if (shuffler_task != NULL) { - 8 VERBOSE_PRINTK_STRING("Stopping rcu_torture_shuffle task"); - 9 kthread_stop(shuffler_task); -10 } -11 shuffler_task = NULL; -12 -13 if (writer_task != NULL) { -14 VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task"); -15 kthread_stop(writer_task); -16 } -17 writer_task = NULL; -18 -19 if (reader_tasks != NULL) { -20 for (i = 0; i < nrealreaders; i++) { -21 if (reader_tasks[i] != NULL) { -22 VERBOSE_PRINTK_STRING( -23 "Stopping rcu_torture_reader task"); -24 kthread_stop(reader_tasks[i]); -25 } -26 reader_tasks[i] = NULL; -27 } -28 kfree(reader_tasks); -29 reader_tasks = NULL; -30 } -31 rcu_torture_current = NULL; -32 -33 if (fakewriter_tasks != NULL) { -34 for (i = 0; i < nfakewriters; i++) { -35 if (fakewriter_tasks[i] != NULL) { -36 VERBOSE_PRINTK_STRING( -37 "Stopping rcu_torture_fakewriter task"); -38 kthread_stop(fakewriter_tasks[i]); -39 } -40 fakewriter_tasks[i] = NULL; -41 } -42 kfree(fakewriter_tasks); -43 fakewriter_tasks = NULL; -44 } -45 -46 if (stats_task != NULL) { -47 VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task"); -48 kthread_stop(stats_task); -49 } -50 stats_task = NULL; -51 -52 /* Wait for all RCU callbacks to fire. */ -53 rcu_barrier(); -54 -55 rcu_torture_stats_print(); /* -After- the stats thread is stopped! */ -56 -57 if (cur_ops->cleanup != NULL) -58 cur_ops->cleanup(); -59 if (atomic_read(&n_rcu_torture_error)) -60 rcu_torture_print_module_parms("End of test: FAILURE"); -61 else -62 rcu_torture_print_module_parms("End of test: SUCCESS"); -63 } - -Line 6 sets a global variable that prevents any RCU callbacks from -re-posting themselves. This will not be necessary in most cases, since -RCU callbacks rarely include calls to call_rcu(). However, the rcutorture -module is an exception to this rule, and therefore needs to set this -global variable. - -Lines 7-50 stop all the kernel tasks associated with the rcutorture -module. Therefore, once execution reaches line 53, no more rcutorture -RCU callbacks will be posted. The rcu_barrier() call on line 53 waits -for any pre-existing callbacks to complete. - -Then lines 55-62 print status and do operation-specific cleanup, and -then return, permitting the module-unload operation to be completed. - -Quick Quiz #1: Is there any other situation where rcu_barrier() might - be required? - -Your module might have additional complications. For example, if your -module invokes call_rcu() from timers, you will need to first cancel all -the timers, and only then invoke rcu_barrier() to wait for any remaining -RCU callbacks to complete. - -Of course, if you module uses call_rcu(), you will need to invoke -rcu_barrier() before unloading. Similarly, if your module uses -call_srcu(), you will need to invoke srcu_barrier() before unloading, -and on the same srcu_struct structure. If your module uses call_rcu() --and- call_srcu(), then you will need to invoke rcu_barrier() -and- -srcu_barrier(). - - -Implementing rcu_barrier() - -Dipankar Sarma's implementation of rcu_barrier() makes use of the fact -that RCU callbacks are never reordered once queued on one of the per-CPU -queues. His implementation queues an RCU callback on each of the per-CPU -callback queues, and then waits until they have all started executing, at -which point, all earlier RCU callbacks are guaranteed to have completed. - -The original code for rcu_barrier() was as follows: - - 1 void rcu_barrier(void) - 2 { - 3 BUG_ON(in_interrupt()); - 4 /* Take cpucontrol mutex to protect against CPU hotplug */ - 5 mutex_lock(&rcu_barrier_mutex); - 6 init_completion(&rcu_barrier_completion); - 7 atomic_set(&rcu_barrier_cpu_count, 0); - 8 on_each_cpu(rcu_barrier_func, NULL, 0, 1); - 9 wait_for_completion(&rcu_barrier_completion); -10 mutex_unlock(&rcu_barrier_mutex); -11 } - -Line 3 verifies that the caller is in process context, and lines 5 and 10 -use rcu_barrier_mutex to ensure that only one rcu_barrier() is using the -global completion and counters at a time, which are initialized on lines -6 and 7. Line 8 causes each CPU to invoke rcu_barrier_func(), which is -shown below. Note that the final "1" in on_each_cpu()'s argument list -ensures that all the calls to rcu_barrier_func() will have completed -before on_each_cpu() returns. Line 9 then waits for the completion. - -This code was rewritten in 2008 and several times thereafter, but this -still gives the general idea. - -The rcu_barrier_func() runs on each CPU, where it invokes call_rcu() -to post an RCU callback, as follows: - - 1 static void rcu_barrier_func(void *notused) - 2 { - 3 int cpu = smp_processor_id(); - 4 struct rcu_data *rdp = &per_cpu(rcu_data, cpu); - 5 struct rcu_head *head; - 6 - 7 head = &rdp->barrier; - 8 atomic_inc(&rcu_barrier_cpu_count); - 9 call_rcu(head, rcu_barrier_callback); -10 } - -Lines 3 and 4 locate RCU's internal per-CPU rcu_data structure, -which contains the struct rcu_head that needed for the later call to -call_rcu(). Line 7 picks up a pointer to this struct rcu_head, and line -8 increments a global counter. This counter will later be decremented -by the callback. Line 9 then registers the rcu_barrier_callback() on -the current CPU's queue. - -The rcu_barrier_callback() function simply atomically decrements the -rcu_barrier_cpu_count variable and finalizes the completion when it -reaches zero, as follows: - - 1 static void rcu_barrier_callback(struct rcu_head *notused) - 2 { - 3 if (atomic_dec_and_test(&rcu_barrier_cpu_count)) - 4 complete(&rcu_barrier_completion); - 5 } - -Quick Quiz #2: What happens if CPU 0's rcu_barrier_func() executes - immediately (thus incrementing rcu_barrier_cpu_count to the - value one), but the other CPU's rcu_barrier_func() invocations - are delayed for a full grace period? Couldn't this result in - rcu_barrier() returning prematurely? - -The current rcu_barrier() implementation is more complex, due to the need -to avoid disturbing idle CPUs (especially on battery-powered systems) -and the need to minimally disturb non-idle CPUs in real-time systems. -However, the code above illustrates the concepts. - - -rcu_barrier() Summary - -The rcu_barrier() primitive has seen relatively little use, since most -code using RCU is in the core kernel rather than in modules. However, if -you are using RCU from an unloadable module, you need to use rcu_barrier() -so that your module may be safely unloaded. - - -Answers to Quick Quizzes - -Quick Quiz #1: Is there any other situation where rcu_barrier() might - be required? - -Answer: Interestingly enough, rcu_barrier() was not originally - implemented for module unloading. Nikita Danilov was using - RCU in a filesystem, which resulted in a similar situation at - filesystem-unmount time. Dipankar Sarma coded up rcu_barrier() - in response, so that Nikita could invoke it during the - filesystem-unmount process. - - Much later, yours truly hit the RCU module-unload problem when - implementing rcutorture, and found that rcu_barrier() solves - this problem as well. - -Quick Quiz #2: What happens if CPU 0's rcu_barrier_func() executes - immediately (thus incrementing rcu_barrier_cpu_count to the - value one), but the other CPU's rcu_barrier_func() invocations - are delayed for a full grace period? Couldn't this result in - rcu_barrier() returning prematurely? - -Answer: This cannot happen. The reason is that on_each_cpu() has its last - argument, the wait flag, set to "1". This flag is passed through - to smp_call_function() and further to smp_call_function_on_cpu(), - causing this latter to spin until the cross-CPU invocation of - rcu_barrier_func() has completed. This by itself would prevent - a grace period from completing on non-CONFIG_PREEMPT kernels, - since each CPU must undergo a context switch (or other quiescent - state) before the grace period can complete. However, this is - of no use in CONFIG_PREEMPT kernels. - - Therefore, on_each_cpu() disables preemption across its call - to smp_call_function() and also across the local call to - rcu_barrier_func(). This prevents the local CPU from context - switching, again preventing grace periods from completing. This - means that all CPUs have executed rcu_barrier_func() before - the first rcu_barrier_callback() can possibly execute, in turn - preventing rcu_barrier_cpu_count from prematurely reaching zero. - - Currently, -rt implementations of RCU keep but a single global - queue for RCU callbacks, and thus do not suffer from this - problem. However, when the -rt RCU eventually does have per-CPU - callback queues, things will have to change. One simple change - is to add an rcu_read_lock() before line 8 of rcu_barrier() - and an rcu_read_unlock() after line 8 of this same function. If - you can think of a better change, please let me know! --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -225,18 +225,13 @@ In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed for each CPU: - 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D + 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 dyntick_enabled: 1 The "last_accelerate:" prints the low-order 16 bits (in hex) of the jiffies counter when this CPU last invoked rcu_try_advance_all_cbs() from rcu_needs_cpu() or last invoked rcu_accelerate_cbs() from -rcu_prepare_for_idle(). The "Nonlazy posted:" indicates lazy-callback -status, so that an "l" indicates that all callbacks were lazy at the start -of the last idle period and an "L" indicates that there are currently -no non-lazy callbacks (in both cases, "." is printed otherwise, as -shown above) and "D" indicates that dyntick-idle processing is enabled -("." is printed otherwise, for example, if disabled via the "nohz=" -kernel boot parameter). +rcu_prepare_for_idle(). "dyntick_enabled: 1" indicates that dyntick-idle +processing is enabled. If the grace period ends just as the stall warning starts printing, there will be a spurious stall-warning message, which will include --- a/Documentation/RCU/whatisRCU.rst +++ b/Documentation/RCU/whatisRCU.rst @@ -0,0 +1,1154 @@ +.. _whatisrcu_doc: + +What is RCU? -- "Read, Copy, Update" +====================================== + +Please note that the "What is RCU?" LWN series is an excellent place +to start learning about RCU: + +| 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/ +| 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/ +| 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/ +| 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/ +| 2010 Big API Table http://lwn.net/Articles/419086/ +| 5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/ +| 2014 Big API Table http://lwn.net/Articles/609973/ + + +What is RCU? + +RCU is a synchronization mechanism that was added to the Linux kernel +during the 2.5 development effort that is optimized for read-mostly +situations. Although RCU is actually quite simple once you understand it, +getting there can sometimes be a challenge. Part of the problem is that +most of the past descriptions of RCU have been written with the mistaken +assumption that there is "one true way" to describe RCU. Instead, +the experience has been that different people must take different paths +to arrive at an understanding of RCU. This document provides several +different paths, as follows: + +:ref:`1. RCU OVERVIEW <1_whatisRCU>` + +:ref:`2. WHAT IS RCU'S CORE API? <2_whatisRCU>` + +:ref:`3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>` + +:ref:`4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>` + +:ref:`5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>` + +:ref:`6. ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>` + +:ref:`7. FULL LIST OF RCU APIs <7_whatisRCU>` + +:ref:`8. ANSWERS TO QUICK QUIZZES <8_whatisRCU>` + +People who prefer starting with a conceptual overview should focus on +Section 1, though most readers will profit by reading this section at +some point. People who prefer to start with an API that they can then +experiment with should focus on Section 2. People who prefer to start +with example uses should focus on Sections 3 and 4. People who need to +understand the RCU implementation should focus on Section 5, then dive +into the kernel source code. People who reason best by analogy should +focus on Section 6. Section 7 serves as an index to the docbook API +documentation, and Section 8 is the traditional answer key. + +So, start with the section that makes the most sense to you and your +preferred method of learning. If you need to know everything about +everything, feel free to read the whole thing -- but if you are really +that type of person, you have perused the source code and will therefore +never need this document anyway. ;-) + +.. _1_whatisRCU: + +1. RCU OVERVIEW +---------------- + +The basic idea behind RCU is to split updates into "removal" and +"reclamation" phases. The removal phase removes references to data items +within a data structure (possibly by replacing them with references to +new versions of these data items), and can run concurrently with readers. +The reason that it is safe to run the removal phase concurrently with +readers is the semantics of modern CPUs guarantee that readers will see +either the old or the new version of the data structure rather than a +partially updated reference. The reclamation phase does the work of reclaiming +(e.g., freeing) the data items removed from the data structure during the +removal phase. Because reclaiming data items can disrupt any readers +concurrently referencing those data items, the reclamation phase must +not start until readers no longer hold references to those data items. + +Splitting the update into removal and reclamation phases permits the +updater to perform the removal phase immediately, and to defer the +reclamation phase until all readers active during the removal phase have +completed, either by blocking until they finish or by registering a +callback that is invoked after they finish. Only readers that are active +during the removal phase need be considered, because any reader starting +after the removal phase will be unable to gain a reference to the removed +data items, and therefore cannot be disrupted by the reclamation phase. + +So the typical RCU update sequence goes something like the following: + +a. Remove pointers to a data structure, so that subsequent + readers cannot gain a reference to it. + +b. Wait for all previous readers to complete their RCU read-side + critical sections. + +c. At this point, there cannot be any readers who hold references + to the data structure, so it now may safely be reclaimed + (e.g., kfree()d). + +Step (b) above is the key idea underlying RCU's deferred destruction. +The ability to wait until all readers are done allows RCU readers to +use much lighter-weight synchronization, in some cases, absolutely no +synchronization at all. In contrast, in more conventional lock-based +schemes, readers must use heavy-weight synchronization in order to +prevent an updater from deleting the data structure out from under them. +This is because lock-based updaters typically update data items in place, +and must therefore exclude readers. In contrast, RCU-based updaters +typically take advantage of the fact that writes to single aligned +pointers are atomic on modern CPUs, allowing atomic insertion, removal, +and replacement of data items in a linked structure without disrupting +readers. Concurrent RCU readers can then continue accessing the old +versions, and can dispense with the atomic operations, memory barriers, +and communications cache misses that are so expensive on present-day +SMP computer systems, even in absence of lock contention. + +In the three-step procedure shown above, the updater is performing both +the removal and the reclamation step, but it is often helpful for an +entirely different thread to do the reclamation, as is in fact the case +in the Linux kernel's directory-entry cache (dcache). Even if the same +thread performs both the update step (step (a) above) and the reclamation +step (step (c) above), it is often helpful to think of them separately. +For example, RCU readers and updaters need not communicate at all, +but RCU provides implicit low-overhead communication between readers +and reclaimers, namely, in step (b) above. + +So how the heck can a reclaimer tell when a reader is done, given +that readers are not doing any sort of synchronization operations??? +Read on to learn about how RCU's API makes this easy. + +.. _2_whatisRCU: + +2. WHAT IS RCU'S CORE API? +--------------------------- + +The core RCU API is quite small: + +a. rcu_read_lock() +b. rcu_read_unlock() +c. synchronize_rcu() / call_rcu() +d. rcu_assign_pointer() +e. rcu_dereference() + +There are many other members of the RCU API, but the rest can be +expressed in terms of these five, though most implementations instead +express synchronize_rcu() in terms of the call_rcu() callback API. + +The five core RCU APIs are described below, the other 18 will be enumerated +later. See the kernel docbook documentation for more info, or look directly +at the function header comments. + +rcu_read_lock() +^^^^^^^^^^^^^^^ + void rcu_read_lock(void); + + Used by a reader to inform the reclaimer that the reader is + entering an RCU read-side critical section. It is illegal + to block while in an RCU read-side critical section, though + kernels built with CONFIG_PREEMPT_RCU can preempt RCU + read-side critical sections. Any RCU-protected data structure + accessed during an RCU read-side critical section is guaranteed to + remain unreclaimed for the full duration of that critical section. + Reference counts may be used in conjunction with RCU to maintain + longer-term references to data structures. + +rcu_read_unlock() +^^^^^^^^^^^^^^^^^ + void rcu_read_unlock(void); + + Used by a reader to inform the reclaimer that the reader is + exiting an RCU read-side critical section. Note that RCU + read-side critical sections may be nested and/or overlapping. + +synchronize_rcu() +^^^^^^^^^^^^^^^^^ + void synchronize_rcu(void); + + Marks the end of updater code and the beginning of reclaimer + code. It does this by blocking until all pre-existing RCU + read-side critical sections on all CPUs have completed. + Note that synchronize_rcu() will **not** necessarily wait for + any subsequent RCU read-side critical sections to complete. + For example, consider the following sequence of events:: + + CPU 0 CPU 1 CPU 2 + ----------------- ------------------------- --------------- + 1. rcu_read_lock() + 2. enters synchronize_rcu() + 3. rcu_read_lock() + 4. rcu_read_unlock() + 5. exits synchronize_rcu() + 6. rcu_read_unlock() + + To reiterate, synchronize_rcu() waits only for ongoing RCU + read-side critical sections to complete, not necessarily for + any that begin after synchronize_rcu() is invoked. + + Of course, synchronize_rcu() does not necessarily return + **immediately** after the last pre-existing RCU read-side critical + section completes. For one thing, there might well be scheduling + delays. For another thing, many RCU implementations process + requests in batches in order to improve efficiencies, which can + further delay synchronize_rcu(). + + Since synchronize_rcu() is the API that must figure out when + readers are done, its implementation is key to RCU. For RCU + to be useful in all but the most read-intensive situations, + synchronize_rcu()'s overhead must also be quite small. + + The call_rcu() API is a callback form of synchronize_rcu(), + and is described in more detail in a later section. Instead of + blocking, it registers a function and argument which are invoked + after all ongoing RCU read-side critical sections have completed. + This callback variant is particularly useful in situations where + it is illegal to block or where update-side performance is + critically important. + + However, the call_rcu() API should not be used lightly, as use + of the synchronize_rcu() API generally results in simpler code. + In addition, the synchronize_rcu() API has the nice property + of automatically limiting update rate should grace periods + be delayed. This property results in system resilience in face + of denial-of-service attacks. Code using call_rcu() should limit + update rate in order to gain this same sort of resilience. See + checklist.txt for some approaches to limiting the update rate. + +rcu_assign_pointer() +^^^^^^^^^^^^^^^^^^^^ + void rcu_assign_pointer(p, typeof(p) v); + + Yes, rcu_assign_pointer() **is** implemented as a macro, though it + would be cool to be able to declare a function in this manner. + (Compiler experts will no doubt disagree.) + + The updater uses this function to assign a new value to an + RCU-protected pointer, in order to safely communicate the change + in value from the updater to the reader. This macro does not + evaluate to an rvalue, but it does execute any memory-barrier + instructions required for a given CPU architecture. + + Perhaps just as important, it serves to document (1) which + pointers are protected by RCU and (2) the point at which a + given structure becomes accessible to other CPUs. That said, + rcu_assign_pointer() is most frequently used indirectly, via + the _rcu list-manipulation primitives such as list_add_rcu(). + +rcu_dereference() +^^^^^^^^^^^^^^^^^ + typeof(p) rcu_dereference(p); + + Like rcu_assign_pointer(), rcu_dereference() must be implemented + as a macro. + + The reader uses rcu_dereference() to fetch an RCU-protected + pointer, which returns a value that may then be safely + dereferenced. Note that rcu_dereference() does not actually + dereference the pointer, instead, it protects the pointer for + later dereferencing. It also executes any needed memory-barrier + instructions for a given CPU architecture. Currently, only Alpha + needs memory barriers within rcu_dereference() -- on other CPUs, + it compiles to nothing, not even a compiler directive. + + Common coding practice uses rcu_dereference() to copy an + RCU-protected pointer to a local variable, then dereferences + this local variable, for example as follows:: + + p = rcu_dereference(head.next); + return p->data; + + However, in this case, one could just as easily combine these + into one statement:: + + return rcu_dereference(head.next)->data; + + If you are going to be fetching multiple fields from the + RCU-protected structure, using the local variable is of + course preferred. Repeated rcu_dereference() calls look + ugly, do not guarantee that the same pointer will be returned + if an update happened while in the critical section, and incur + unnecessary overhead on Alpha CPUs. + + Note that the value returned by rcu_dereference() is valid + only within the enclosing RCU read-side critical section [1]_. + For example, the following is **not** legal:: + + rcu_read_lock(); + p = rcu_dereference(head.next); + rcu_read_unlock(); + x = p->address; /* BUG!!! */ + rcu_read_lock(); + y = p->data; /* BUG!!! */ + rcu_read_unlock(); + + Holding a reference from one RCU read-side critical section + to another is just as illegal as holding a reference from + one lock-based critical section to another! Similarly, + using a reference outside of the critical section in which + it was acquired is just as illegal as doing so with normal + locking. + + As with rcu_assign_pointer(), an important function of + rcu_dereference() is to document which pointers are protected by + RCU, in particular, flagging a pointer that is subject to changing + at any time, including immediately after the rcu_dereference(). + And, again like rcu_assign_pointer(), rcu_dereference() is + typically used indirectly, via the _rcu list-manipulation + primitives, such as list_for_each_entry_rcu() [2]_. + +.. [1] The variant rcu_dereference_protected() can be used outside + of an RCU read-side critical section as long as the usage is + protected by locks acquired by the update-side code. This variant + avoids the lockdep warning that would happen when using (for + example) rcu_dereference() without rcu_read_lock() protection. + Using rcu_dereference_protected() also has the advantage + of permitting compiler optimizations that rcu_dereference() + must prohibit. The rcu_dereference_protected() variant takes + a lockdep expression to indicate which locks must be acquired + by the caller. If the indicated protection is not provided, + a lockdep splat is emitted. See Documentation/RCU/Design/Requirements/Requirements.rst + and the API's code comments for more details and example usage. + +.. [2] If the list_for_each_entry_rcu() instance might be used by + update-side code as well as by RCU readers, then an additional + lockdep expression can be added to its list of arguments. + For example, given an additional "lock_is_held(&mylock)" argument, + the RCU lockdep code would complain only if this instance was + invoked outside of an RCU read-side critical section and without + the protection of mylock. + +The following diagram shows how each API communicates among the +reader, updater, and reclaimer. +:: + + + rcu_assign_pointer() + +--------+ + +---------------------->| reader |---------+ + | +--------+ | + | | | + | | | Protect: + | | | rcu_read_lock() + | | | rcu_read_unlock() + | rcu_dereference() | | + +---------+ | | + | updater |<----------------+ | + +---------+ V + | +-----------+ + +----------------------------------->| reclaimer | + +-----------+ + Defer: + synchronize_rcu() & call_rcu() + + +The RCU infrastructure observes the time sequence of rcu_read_lock(), +rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in +order to determine when (1) synchronize_rcu() invocations may return +to their callers and (2) call_rcu() callbacks may be invoked. Efficient +implementations of the RCU infrastructure make heavy use of batching in +order to amortize their overhead over many uses of the corresponding APIs. + +There are at least three flavors of RCU usage in the Linux kernel. The diagram +above shows the most common one. On the updater side, the rcu_assign_pointer(), +sychronize_rcu() and call_rcu() primitives used are the same for all three +flavors. However for protection (on the reader side), the primitives used vary +depending on the flavor: + +a. rcu_read_lock() / rcu_read_unlock() + rcu_dereference() + +b. rcu_read_lock_bh() / rcu_read_unlock_bh() + local_bh_disable() / local_bh_enable() + rcu_dereference_bh() + +c. rcu_read_lock_sched() / rcu_read_unlock_sched() + preempt_disable() / preempt_enable() + local_irq_save() / local_irq_restore() + hardirq enter / hardirq exit + NMI enter / NMI exit + rcu_dereference_sched() + +These three flavors are used as follows: + +a. RCU applied to normal data structures. + +b. RCU applied to networking data structures that may be subjected + to remote denial-of-service attacks. + +c. RCU applied to scheduler and interrupt/NMI-handler tasks. + +Again, most uses will be of (a). The (b) and (c) cases are important +for specialized uses, but are relatively uncommon. + +.. _3_whatisRCU: + +3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? +----------------------------------------------- + +This section shows a simple use of the core RCU API to protect a +global pointer to a dynamically allocated structure. More-typical +uses of RCU may be found in :ref:`listRCU.rst `, +:ref:`arrayRCU.rst `, and :ref:`NMI-RCU.rst `. +:: + + struct foo { + int a; + char b; + long c; + }; + DEFINE_SPINLOCK(foo_mutex); + + struct foo __rcu *gbl_foo; + + /* + * Create a new struct foo that is the same as the one currently + * pointed to by gbl_foo, except that field "a" is replaced + * with "new_a". Points gbl_foo to the new structure, and + * frees up the old structure after a grace period. + * + * Uses rcu_assign_pointer() to ensure that concurrent readers + * see the initialized version of the new structure. + * + * Uses synchronize_rcu() to ensure that any readers that might + * have references to the old structure complete before freeing + * the old structure. + */ + void foo_update_a(int new_a) + { + struct foo *new_fp; + struct foo *old_fp; + + new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL); + spin_lock(&foo_mutex); + old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex)); + *new_fp = *old_fp; + new_fp->a = new_a; + rcu_assign_pointer(gbl_foo, new_fp); + spin_unlock(&foo_mutex); + synchronize_rcu(); + kfree(old_fp); + } + + /* + * Return the value of field "a" of the current gbl_foo + * structure. Use rcu_read_lock() and rcu_read_unlock() + * to ensure that the structure does not get deleted out + * from under us, and use rcu_dereference() to ensure that + * we see the initialized version of the structure (important + * for DEC Alpha and for people reading the code). + */ + int foo_get_a(void) + { + int retval; + + rcu_read_lock(); + retval = rcu_dereference(gbl_foo)->a; + rcu_read_unlock(); + return retval; + } + +So, to sum up: + +- Use rcu_read_lock() and rcu_read_unlock() to guard RCU + read-side critical sections. + +- Within an RCU read-side critical section, use rcu_dereference() + to dereference RCU-protected pointers. + +- Use some solid scheme (such as locks or semaphores) to + keep concurrent updates from interfering with each other. + +- Use rcu_assign_pointer() to update an RCU-protected pointer. + This primitive protects concurrent readers from the updater, + **not** concurrent updates from each other! You therefore still + need to use locking (or something similar) to keep concurrent + rcu_assign_pointer() primitives from interfering with each other. + +- Use synchronize_rcu() **after** removing a data element from an + RCU-protected data structure, but **before** reclaiming/freeing + the data element, in order to wait for the completion of all + RCU read-side critical sections that might be referencing that + data item. + +See checklist.txt for additional rules to follow when using RCU. +And again, more-typical uses of RCU may be found in :ref:`listRCU.rst +`, :ref:`arrayRCU.rst `, and :ref:`NMI-RCU.rst +`. + +.. _4_whatisRCU: + +4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? +-------------------------------------------- + +In the example above, foo_update_a() blocks until a grace period elapses. +This is quite simple, but in some cases one cannot afford to wait so +long -- there might be other high-priority work to be done. + +In such cases, one uses call_rcu() rather than synchronize_rcu(). +The call_rcu() API is as follows:: + + void call_rcu(struct rcu_head * head, + void (*func)(struct rcu_head *head)); + +This function invokes func(head) after a grace period has elapsed. +This invocation might happen from either softirq or process context, +so the function is not permitted to block. The foo struct needs to +have an rcu_head structure added, perhaps as follows:: + + struct foo { + int a; + char b; + long c; + struct rcu_head rcu; + }; + +The foo_update_a() function might then be written as follows:: + + /* + * Create a new struct foo that is the same as the one currently + * pointed to by gbl_foo, except that field "a" is replaced + * with "new_a". Points gbl_foo to the new structure, and + * frees up the old structure after a grace period. + * + * Uses rcu_assign_pointer() to ensure that concurrent readers + * see the initialized version of the new structure. + * + * Uses call_rcu() to ensure that any readers that might have + * references to the old structure complete before freeing the + * old structure. + */ + void foo_update_a(int new_a) + { + struct foo *new_fp; + struct foo *old_fp; + + new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL); + spin_lock(&foo_mutex); + old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex)); + *new_fp = *old_fp; + new_fp->a = new_a; + rcu_assign_pointer(gbl_foo, new_fp); + spin_unlock(&foo_mutex); + call_rcu(&old_fp->rcu, foo_reclaim); + } + +The foo_reclaim() function might appear as follows:: + + void foo_reclaim(struct rcu_head *rp) + { + struct foo *fp = container_of(rp, struct foo, rcu); + + foo_cleanup(fp->a); + + kfree(fp); + } + +The container_of() primitive is a macro that, given a pointer into a +struct, the type of the struct, and the pointed-to field within the +struct, returns a pointer to the beginning of the struct. + +The use of call_rcu() permits the caller of foo_update_a() to +immediately regain control, without needing to worry further about the +old version of the newly updated element. It also clearly shows the +RCU distinction between updater, namely foo_update_a(), and reclaimer, +namely foo_reclaim(). + +The summary of advice is the same as for the previous section, except +that we are now using call_rcu() rather than synchronize_rcu(): + +- Use call_rcu() **after** removing a data element from an + RCU-protected data structure in order to register a callback + function that will be invoked after the completion of all RCU + read-side critical sections that might be referencing that + data item. + +If the callback for call_rcu() is not doing anything more than calling +kfree() on the structure, you can use kfree_rcu() instead of call_rcu() +to avoid having to write your own callback:: + + kfree_rcu(old_fp, rcu); + +Again, see checklist.txt for additional rules governing the use of RCU. + +.. _5_whatisRCU: + +5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? +------------------------------------------------ + +One of the nice things about RCU is that it has extremely simple "toy" +implementations that are a good first step towards understanding the +production-quality implementations in the Linux kernel. This section +presents two such "toy" implementations of RCU, one that is implemented +in terms of familiar locking primitives, and another that more closely +resembles "classic" RCU. Both are way too simple for real-world use, +lacking both functionality and performance. However, they are useful +in getting a feel for how RCU works. See kernel/rcu/update.c for a +production-quality implementation, and see: + + http://www.rdrop.com/users/paulmck/RCU + +for papers describing the Linux kernel RCU implementation. The OLS'01 +and OLS'02 papers are a good introduction, and the dissertation provides +more details on the current implementation as of early 2004. + + +5A. "TOY" IMPLEMENTATION #1: LOCKING +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +This section presents a "toy" RCU implementation that is based on +familiar locking primitives. Its overhead makes it a non-starter for +real-life use, as does its lack of scalability. It is also unsuitable +for realtime use, since it allows scheduling latency to "bleed" from +one read-side critical section to another. It also assumes recursive +reader-writer locks: If you try this with non-recursive locks, and +you allow nested rcu_read_lock() calls, you can deadlock. + +However, it is probably the easiest implementation to relate to, so is +a good starting point. + +It is extremely simple:: + + static DEFINE_RWLOCK(rcu_gp_mutex); + + void rcu_read_lock(void) + { + read_lock(&rcu_gp_mutex); + } + + void rcu_read_unlock(void) + { + read_unlock(&rcu_gp_mutex); + } + + void synchronize_rcu(void) + { + write_lock(&rcu_gp_mutex); + smp_mb__after_spinlock(); + write_unlock(&rcu_gp_mutex); + } + +[You can ignore rcu_assign_pointer() and rcu_dereference() without missing +much. But here are simplified versions anyway. And whatever you do, +don't forget about them when submitting patches making use of RCU!]:: + + #define rcu_assign_pointer(p, v) \ + ({ \ + smp_store_release(&(p), (v)); \ + }) + + #define rcu_dereference(p) \ + ({ \ + typeof(p) _________p1 = READ_ONCE(p); \ + (_________p1); \ + }) + + +The rcu_read_lock() and rcu_read_unlock() primitive read-acquire +and release a global reader-writer lock. The synchronize_rcu() +primitive write-acquires this same lock, then releases it. This means +that once synchronize_rcu() exits, all RCU read-side critical sections +that were in progress before synchronize_rcu() was called are guaranteed +to have completed -- there is no way that synchronize_rcu() would have +been able to write-acquire the lock otherwise. The smp_mb__after_spinlock() +promotes synchronize_rcu() to a full memory barrier in compliance with +the "Memory-Barrier Guarantees" listed in: + + Documentation/RCU/Design/Requirements/Requirements.rst + +It is possible to nest rcu_read_lock(), since reader-writer locks may +be recursively acquired. Note also that rcu_read_lock() is immune +from deadlock (an important property of RCU). The reason for this is +that the only thing that can block rcu_read_lock() is a synchronize_rcu(). +But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex, +so there can be no deadlock cycle. + +.. _quiz_1: + +Quick Quiz #1: + Why is this argument naive? How could a deadlock + occur when using this algorithm in a real-world Linux + kernel? How could this deadlock be avoided? + +:ref:`Answers to Quick Quiz <8_whatisRCU>` + +5B. "TOY" EXAMPLE #2: CLASSIC RCU +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +This section presents a "toy" RCU implementation that is based on +"classic RCU". It is also short on performance (but only for updates) and +on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT +kernels. The definitions of rcu_dereference() and rcu_assign_pointer() +are the same as those shown in the preceding section, so they are omitted. +:: + + void rcu_read_lock(void) { } + + void rcu_read_unlock(void) { } + + void synchronize_rcu(void) + { + int cpu; + + for_each_possible_cpu(cpu) + run_on(cpu); + } + +Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing. +This is the great strength of classic RCU in a non-preemptive kernel: +read-side overhead is precisely zero, at least on non-Alpha CPUs. +And there is absolutely no way that rcu_read_lock() can possibly +participate in a deadlock cycle! + +The implementation of synchronize_rcu() simply schedules itself on each +CPU in turn. The run_on() primitive can be implemented straightforwardly +in terms of the sched_setaffinity() primitive. Of course, a somewhat less +"toy" implementation would restore the affinity upon completion rather +than just leaving all tasks running on the last CPU, but when I said +"toy", I meant **toy**! + +So how the heck is this supposed to work??? + +Remember that it is illegal to block while in an RCU read-side critical +section. Therefore, if a given CPU executes a context switch, we know +that it must have completed all preceding RCU read-side critical sections. +Once **all** CPUs have executed a context switch, then **all** preceding +RCU read-side critical sections will have completed. + +So, suppose that we remove a data item from its structure and then invoke +synchronize_rcu(). Once synchronize_rcu() returns, we are guaranteed +that there are no RCU read-side critical sections holding a reference +to that data item, so we can safely reclaim it. + +.. _quiz_2: + +Quick Quiz #2: + Give an example where Classic RCU's read-side + overhead is **negative**. + +:ref:`Answers to Quick Quiz <8_whatisRCU>` + +.. _quiz_3: + +Quick Quiz #3: + If it is illegal to block in an RCU read-side + critical section, what the heck do you do in + PREEMPT_RT, where normal spinlocks can block??? + +:ref:`Answers to Quick Quiz <8_whatisRCU>` + +.. _6_whatisRCU: + +6. ANALOGY WITH READER-WRITER LOCKING +-------------------------------------- + +Although RCU can be used in many different ways, a very common use of +RCU is analogous to reader-writer locking. The following unified +diff shows how closely related RCU and reader-writer locking can be. +:: + + @@ -5,5 +5,5 @@ struct el { + int data; + /* Other data fields */ + }; + -rwlock_t listmutex; + +spinlock_t listmutex; + struct el head; + + @@ -13,15 +14,15 @@ + struct list_head *lp; + struct el *p; + + - read_lock(&listmutex); + - list_for_each_entry(p, head, lp) { + + rcu_read_lock(); + + list_for_each_entry_rcu(p, head, lp) { + if (p->key == key) { + *result = p->data; + - read_unlock(&listmutex); + + rcu_read_unlock(); + return 1; + } + } + - read_unlock(&listmutex); + + rcu_read_unlock(); + return 0; + } + + @@ -29,15 +30,16 @@ + { + struct el *p; + + - write_lock(&listmutex); + + spin_lock(&listmutex); + list_for_each_entry(p, head, lp) { + if (p->key == key) { + - list_del(&p->list); + - write_unlock(&listmutex); + + list_del_rcu(&p->list); + + spin_unlock(&listmutex); + + synchronize_rcu(); + kfree(p); + return 1; + } + } + - write_unlock(&listmutex); + + spin_unlock(&listmutex); + return 0; + } + +Or, for those who prefer a side-by-side listing:: + + 1 struct el { 1 struct el { + 2 struct list_head list; 2 struct list_head list; + 3 long key; 3 long key; + 4 spinlock_t mutex; 4 spinlock_t mutex; + 5 int data; 5 int data; + 6 /* Other data fields */ 6 /* Other data fields */ + 7 }; 7 }; + 8 rwlock_t listmutex; 8 spinlock_t listmutex; + 9 struct el head; 9 struct el head; + +:: + + 1 int search(long key, int *result) 1 int search(long key, int *result) + 2 { 2 { + 3 struct list_head *lp; 3 struct list_head *lp; + 4 struct el *p; 4 struct el *p; + 5 5 + 6 read_lock(&listmutex); 6 rcu_read_lock(); + 7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) { + 8 if (p->key == key) { 8 if (p->key == key) { + 9 *result = p->data; 9 *result = p->data; + 10 read_unlock(&listmutex); 10 rcu_read_unlock(); + 11 return 1; 11 return 1; + 12 } 12 } + 13 } 13 } + 14 read_unlock(&listmutex); 14 rcu_read_unlock(); + 15 return 0; 15 return 0; + 16 } 16 } + +:: + + 1 int delete(long key) 1 int delete(long key) + 2 { 2 { + 3 struct el *p; 3 struct el *p; + 4 4 + 5 write_lock(&listmutex); 5 spin_lock(&listmutex); + 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) { + 7 if (p->key == key) { 7 if (p->key == key) { + 8 list_del(&p->list); 8 list_del_rcu(&p->list); + 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex); + 10 synchronize_rcu(); + 10 kfree(p); 11 kfree(p); + 11 return 1; 12 return 1; + 12 } 13 } + 13 } 14 } + 14 write_unlock(&listmutex); 15 spin_unlock(&listmutex); + 15 return 0; 16 return 0; + 16 } 17 } + +Either way, the differences are quite small. Read-side locking moves +to rcu_read_lock() and rcu_read_unlock, update-side locking moves from +a reader-writer lock to a simple spinlock, and a synchronize_rcu() +precedes the kfree(). + +However, there is one potential catch: the read-side and update-side +critical sections can now run concurrently. In many cases, this will +not be a problem, but it is necessary to check carefully regardless. +For example, if multiple independent list updates must be seen as +a single atomic update, converting to RCU will require special care. + +Also, the presence of synchronize_rcu() means that the RCU version of +delete() can now block. If this is a problem, there is a callback-based +mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can +be used in place of synchronize_rcu(). + +.. _7_whatisRCU: + +7. FULL LIST OF RCU APIs +------------------------- + +The RCU APIs are documented in docbook-format header comments in the +Linux-kernel source code, but it helps to have a full list of the +APIs, since there does not appear to be a way to categorize them +in docbook. Here is the list, by category. + +RCU list traversal:: + + list_entry_rcu + list_entry_lockless + list_first_entry_rcu + list_next_rcu + list_for_each_entry_rcu + list_for_each_entry_continue_rcu + list_for_each_entry_from_rcu + list_first_or_null_rcu + list_next_or_null_rcu + hlist_first_rcu + hlist_next_rcu + hlist_pprev_rcu + hlist_for_each_entry_rcu + hlist_for_each_entry_rcu_bh + hlist_for_each_entry_from_rcu + hlist_for_each_entry_continue_rcu + hlist_for_each_entry_continue_rcu_bh + hlist_nulls_first_rcu + hlist_nulls_for_each_entry_rcu + hlist_bl_first_rcu + hlist_bl_for_each_entry_rcu + +RCU pointer/list update:: + + rcu_assign_pointer + list_add_rcu + list_add_tail_rcu + list_del_rcu + list_replace_rcu + hlist_add_behind_rcu + hlist_add_before_rcu + hlist_add_head_rcu + hlist_add_tail_rcu + hlist_del_rcu + hlist_del_init_rcu + hlist_replace_rcu + list_splice_init_rcu + list_splice_tail_init_rcu + hlist_nulls_del_init_rcu + hlist_nulls_del_rcu + hlist_nulls_add_head_rcu + hlist_bl_add_head_rcu + hlist_bl_del_init_rcu + hlist_bl_del_rcu + hlist_bl_set_first_rcu + +RCU:: + + Critical sections Grace period Barrier + + rcu_read_lock synchronize_net rcu_barrier + rcu_read_unlock synchronize_rcu + rcu_dereference synchronize_rcu_expedited + rcu_read_lock_held call_rcu + rcu_dereference_check kfree_rcu + rcu_dereference_protected + +bh:: + + Critical sections Grace period Barrier + + rcu_read_lock_bh call_rcu rcu_barrier + rcu_read_unlock_bh synchronize_rcu + [local_bh_disable] synchronize_rcu_expedited + [and friends] + rcu_dereference_bh + rcu_dereference_bh_check + rcu_dereference_bh_protected + rcu_read_lock_bh_held + +sched:: + + Critical sections Grace period Barrier + + rcu_read_lock_sched call_rcu rcu_barrier + rcu_read_unlock_sched synchronize_rcu + [preempt_disable] synchronize_rcu_expedited + [and friends] + rcu_read_lock_sched_notrace + rcu_read_unlock_sched_notrace + rcu_dereference_sched + rcu_dereference_sched_check + rcu_dereference_sched_protected + rcu_read_lock_sched_held + + +SRCU:: + + Critical sections Grace period Barrier + + srcu_read_lock call_srcu srcu_barrier + srcu_read_unlock synchronize_srcu + srcu_dereference synchronize_srcu_expedited + srcu_dereference_check + srcu_read_lock_held + +SRCU: Initialization/cleanup:: + + DEFINE_SRCU + DEFINE_STATIC_SRCU + init_srcu_struct + cleanup_srcu_struct + +All: lockdep-checked RCU-protected pointer access:: + + rcu_access_pointer + rcu_dereference_raw + RCU_LOCKDEP_WARN + rcu_sleep_check + RCU_NONIDLE + +See the comment headers in the source code (or the docbook generated +from them) for more information. + +However, given that there are no fewer than four families of RCU APIs +in the Linux kernel, how do you choose which one to use? The following +list can be helpful: + +a. Will readers need to block? If so, you need SRCU. + +b. What about the -rt patchset? If readers would need to block + in an non-rt kernel, you need SRCU. If readers would block + in a -rt kernel, but not in a non-rt kernel, SRCU is not + necessary. (The -rt patchset turns spinlocks into sleeplocks, + hence this distinction.) + +c. Do you need to treat NMI handlers, hardirq handlers, + and code segments with preemption disabled (whether + via preempt_disable(), local_irq_save(), local_bh_disable(), + or some other mechanism) as if they were explicit RCU readers? + If so, RCU-sched is the only choice that will work for you. + +d. Do you need RCU grace periods to complete even in the face + of softirq monopolization of one or more of the CPUs? For + example, is your code subject to network-based denial-of-service + attacks? If so, you should disable softirq across your readers, + for example, by using rcu_read_lock_bh(). + +e. Is your workload too update-intensive for normal use of + RCU, but inappropriate for other synchronization mechanisms? + If so, consider SLAB_TYPESAFE_BY_RCU (which was originally + named SLAB_DESTROY_BY_RCU). But please be careful! + +f. Do you need read-side critical sections that are respected + even though they are in the middle of the idle loop, during + user-mode execution, or on an offlined CPU? If so, SRCU is the + only choice that will work for you. + +g. Otherwise, use RCU. + +Of course, this all assumes that you have determined that RCU is in fact +the right tool for your job. + +.. _8_whatisRCU: + +8. ANSWERS TO QUICK QUIZZES +---------------------------- + +Quick Quiz #1: + Why is this argument naive? How could a deadlock + occur when using this algorithm in a real-world Linux + kernel? [Referring to the lock-based "toy" RCU + algorithm.] + +Answer: + Consider the following sequence of events: + + 1. CPU 0 acquires some unrelated lock, call it + "problematic_lock", disabling irq via + spin_lock_irqsave(). + + 2. CPU 1 enters synchronize_rcu(), write-acquiring + rcu_gp_mutex. + + 3. CPU 0 enters rcu_read_lock(), but must wait + because CPU 1 holds rcu_gp_mutex. + + 4. CPU 1 is interrupted, and the irq handler + attempts to acquire problematic_lock. + + The system is now deadlocked. + + One way to avoid this deadlock is to use an approach like + that of CONFIG_PREEMPT_RT, where all normal spinlocks + become blocking locks, and all irq handlers execute in + the context of special tasks. In this case, in step 4 + above, the irq handler would block, allowing CPU 1 to + release rcu_gp_mutex, avoiding the deadlock. + + Even in the absence of deadlock, this RCU implementation + allows latency to "bleed" from readers to other + readers through synchronize_rcu(). To see this, + consider task A in an RCU read-side critical section + (thus read-holding rcu_gp_mutex), task B blocked + attempting to write-acquire rcu_gp_mutex, and + task C blocked in rcu_read_lock() attempting to + read_acquire rcu_gp_mutex. Task A's RCU read-side + latency is holding up task C, albeit indirectly via + task B. + + Realtime RCU implementations therefore use a counter-based + approach where tasks in RCU read-side critical sections + cannot be blocked by tasks executing synchronize_rcu(). + +:ref:`Back to Quick Quiz #1 ` + +Quick Quiz #2: + Give an example where Classic RCU's read-side + overhead is **negative**. + +Answer: + Imagine a single-CPU system with a non-CONFIG_PREEMPT + kernel where a routing table is used by process-context + code, but can be updated by irq-context code (for example, + by an "ICMP REDIRECT" packet). The usual way of handling + this would be to have the process-context code disable + interrupts while searching the routing table. Use of + RCU allows such interrupt-disabling to be dispensed with. + Thus, without RCU, you pay the cost of disabling interrupts, + and with RCU you don't. + + One can argue that the overhead of RCU in this + case is negative with respect to the single-CPU + interrupt-disabling approach. Others might argue that + the overhead of RCU is merely zero, and that replacing + the positive overhead of the interrupt-disabling scheme + with the zero-overhead RCU scheme does not constitute + negative overhead. + + In real life, of course, things are more complex. But + even the theoretical possibility of negative overhead for + a synchronization primitive is a bit unexpected. ;-) + +:ref:`Back to Quick Quiz #2 ` + +Quick Quiz #3: + If it is illegal to block in an RCU read-side + critical section, what the heck do you do in + PREEMPT_RT, where normal spinlocks can block??? + +Answer: + Just as PREEMPT_RT permits preemption of spinlock + critical sections, it permits preemption of RCU + read-side critical sections. It also permits + spinlocks blocking while in RCU read-side critical + sections. + + Why the apparent inconsistency? Because it is + possible to use priority boosting to keep the RCU + grace periods short if need be (for example, if running + short of memory). In contrast, if blocking waiting + for (say) network reception, there is no way to know + what should be boosted. Especially given that the + process we need to boost might well be a human being + who just went out for a pizza or something. And although + a computer-operated cattle prod might arouse serious + interest, it might also provoke serious objections. + Besides, how does the computer know what pizza parlor + the human being went to??? + +:ref:`Back to Quick Quiz #3 ` + +ACKNOWLEDGEMENTS + +My thanks to the people who helped make this human-readable, including +Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern. + + +For more information, see http://www.rdrop.com/users/paulmck/RCU. --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -1,1079 +0,0 @@ -What is RCU? -- "Read, Copy, Update" - -Please note that the "What is RCU?" LWN series is an excellent place -to start learning about RCU: - -1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/ -2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/ -3. RCU part 3: the RCU API http://lwn.net/Articles/264090/ -4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/ - 2010 Big API Table http://lwn.net/Articles/419086/ -5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/ - 2014 Big API Table http://lwn.net/Articles/609973/ - - -What is RCU? - -RCU is a synchronization mechanism that was added to the Linux kernel -during the 2.5 development effort that is optimized for read-mostly -situations. Although RCU is actually quite simple once you understand it, -getting there can sometimes be a challenge. Part of the problem is that -most of the past descriptions of RCU have been written with the mistaken -assumption that there is "one true way" to describe RCU. Instead, -the experience has been that different people must take different paths -to arrive at an understanding of RCU. This document provides several -different paths, as follows: - -1. RCU OVERVIEW -2. WHAT IS RCU'S CORE API? -3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? -4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? -5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? -6. ANALOGY WITH READER-WRITER LOCKING -7. FULL LIST OF RCU APIs -8. ANSWERS TO QUICK QUIZZES - -People who prefer starting with a conceptual overview should focus on -Section 1, though most readers will profit by reading this section at -some point. People who prefer to start with an API that they can then -experiment with should focus on Section 2. People who prefer to start -with example uses should focus on Sections 3 and 4. People who need to -understand the RCU implementation should focus on Section 5, then dive -into the kernel source code. People who reason best by analogy should -focus on Section 6. Section 7 serves as an index to the docbook API -documentation, and Section 8 is the traditional answer key. - -So, start with the section that makes the most sense to you and your -preferred method of learning. If you need to know everything about -everything, feel free to read the whole thing -- but if you are really -that type of person, you have perused the source code and will therefore -never need this document anyway. ;-) - - -1. RCU OVERVIEW - -The basic idea behind RCU is to split updates into "removal" and -"reclamation" phases. The removal phase removes references to data items -within a data structure (possibly by replacing them with references to -new versions of these data items), and can run concurrently with readers. -The reason that it is safe to run the removal phase concurrently with -readers is the semantics of modern CPUs guarantee that readers will see -either the old or the new version of the data structure rather than a -partially updated reference. The reclamation phase does the work of reclaiming -(e.g., freeing) the data items removed from the data structure during the -removal phase. Because reclaiming data items can disrupt any readers -concurrently referencing those data items, the reclamation phase must -not start until readers no longer hold references to those data items. - -Splitting the update into removal and reclamation phases permits the -updater to perform the removal phase immediately, and to defer the -reclamation phase until all readers active during the removal phase have -completed, either by blocking until they finish or by registering a -callback that is invoked after they finish. Only readers that are active -during the removal phase need be considered, because any reader starting -after the removal phase will be unable to gain a reference to the removed -data items, and therefore cannot be disrupted by the reclamation phase. - -So the typical RCU update sequence goes something like the following: - -a. Remove pointers to a data structure, so that subsequent - readers cannot gain a reference to it. - -b. Wait for all previous readers to complete their RCU read-side - critical sections. - -c. At this point, there cannot be any readers who hold references - to the data structure, so it now may safely be reclaimed - (e.g., kfree()d). - -Step (b) above is the key idea underlying RCU's deferred destruction. -The ability to wait until all readers are done allows RCU readers to -use much lighter-weight synchronization, in some cases, absolutely no -synchronization at all. In contrast, in more conventional lock-based -schemes, readers must use heavy-weight synchronization in order to -prevent an updater from deleting the data structure out from under them. -This is because lock-based updaters typically update data items in place, -and must therefore exclude readers. In contrast, RCU-based updaters -typically take advantage of the fact that writes to single aligned -pointers are atomic on modern CPUs, allowing atomic insertion, removal, -and replacement of data items in a linked structure without disrupting -readers. Concurrent RCU readers can then continue accessing the old -versions, and can dispense with the atomic operations, memory barriers, -and communications cache misses that are so expensive on present-day -SMP computer systems, even in absence of lock contention. - -In the three-step procedure shown above, the updater is performing both -the removal and the reclamation step, but it is often helpful for an -entirely different thread to do the reclamation, as is in fact the case -in the Linux kernel's directory-entry cache (dcache). Even if the same -thread performs both the update step (step (a) above) and the reclamation -step (step (c) above), it is often helpful to think of them separately. -For example, RCU readers and updaters need not communicate at all, -but RCU provides implicit low-overhead communication between readers -and reclaimers, namely, in step (b) above. - -So how the heck can a reclaimer tell when a reader is done, given -that readers are not doing any sort of synchronization operations??? -Read on to learn about how RCU's API makes this easy. - - -2. WHAT IS RCU'S CORE API? - -The core RCU API is quite small: - -a. rcu_read_lock() -b. rcu_read_unlock() -c. synchronize_rcu() / call_rcu() -d. rcu_assign_pointer() -e. rcu_dereference() - -There are many other members of the RCU API, but the rest can be -expressed in terms of these five, though most implementations instead -express synchronize_rcu() in terms of the call_rcu() callback API. - -The five core RCU APIs are described below, the other 18 will be enumerated -later. See the kernel docbook documentation for more info, or look directly -at the function header comments. - -rcu_read_lock() - - void rcu_read_lock(void); - - Used by a reader to inform the reclaimer that the reader is - entering an RCU read-side critical section. It is illegal - to block while in an RCU read-side critical section, though - kernels built with CONFIG_PREEMPT_RCU can preempt RCU - read-side critical sections. Any RCU-protected data structure - accessed during an RCU read-side critical section is guaranteed to - remain unreclaimed for the full duration of that critical section. - Reference counts may be used in conjunction with RCU to maintain - longer-term references to data structures. - -rcu_read_unlock() - - void rcu_read_unlock(void); - - Used by a reader to inform the reclaimer that the reader is - exiting an RCU read-side critical section. Note that RCU - read-side critical sections may be nested and/or overlapping. - -synchronize_rcu() - - void synchronize_rcu(void); - - Marks the end of updater code and the beginning of reclaimer - code. It does this by blocking until all pre-existing RCU - read-side critical sections on all CPUs have completed. - Note that synchronize_rcu() will -not- necessarily wait for - any subsequent RCU read-side critical sections to complete. - For example, consider the following sequence of events: - - CPU 0 CPU 1 CPU 2 - ----------------- ------------------------- --------------- - 1. rcu_read_lock() - 2. enters synchronize_rcu() - 3. rcu_read_lock() - 4. rcu_read_unlock() - 5. exits synchronize_rcu() - 6. rcu_read_unlock() - - To reiterate, synchronize_rcu() waits only for ongoing RCU - read-side critical sections to complete, not necessarily for - any that begin after synchronize_rcu() is invoked. - - Of course, synchronize_rcu() does not necessarily return - -immediately- after the last pre-existing RCU read-side critical - section completes. For one thing, there might well be scheduling - delays. For another thing, many RCU implementations process - requests in batches in order to improve efficiencies, which can - further delay synchronize_rcu(). - - Since synchronize_rcu() is the API that must figure out when - readers are done, its implementation is key to RCU. For RCU - to be useful in all but the most read-intensive situations, - synchronize_rcu()'s overhead must also be quite small. - - The call_rcu() API is a callback form of synchronize_rcu(), - and is described in more detail in a later section. Instead of - blocking, it registers a function and argument which are invoked - after all ongoing RCU read-side critical sections have completed. - This callback variant is particularly useful in situations where - it is illegal to block or where update-side performance is - critically important. - - However, the call_rcu() API should not be used lightly, as use - of the synchronize_rcu() API generally results in simpler code. - In addition, the synchronize_rcu() API has the nice property - of automatically limiting update rate should grace periods - be delayed. This property results in system resilience in face - of denial-of-service attacks. Code using call_rcu() should limit - update rate in order to gain this same sort of resilience. See - checklist.txt for some approaches to limiting the update rate. - -rcu_assign_pointer() - - void rcu_assign_pointer(p, typeof(p) v); - - Yes, rcu_assign_pointer() -is- implemented as a macro, though it - would be cool to be able to declare a function in this manner. - (Compiler experts will no doubt disagree.) - - The updater uses this function to assign a new value to an - RCU-protected pointer, in order to safely communicate the change - in value from the updater to the reader. This macro does not - evaluate to an rvalue, but it does execute any memory-barrier - instructions required for a given CPU architecture. - - Perhaps just as important, it serves to document (1) which - pointers are protected by RCU and (2) the point at which a - given structure becomes accessible to other CPUs. That said, - rcu_assign_pointer() is most frequently used indirectly, via - the _rcu list-manipulation primitives such as list_add_rcu(). - -rcu_dereference() - - typeof(p) rcu_dereference(p); - - Like rcu_assign_pointer(), rcu_dereference() must be implemented - as a macro. - - The reader uses rcu_dereference() to fetch an RCU-protected - pointer, which returns a value that may then be safely - dereferenced. Note that rcu_dereference() does not actually - dereference the pointer, instead, it protects the pointer for - later dereferencing. It also executes any needed memory-barrier - instructions for a given CPU architecture. Currently, only Alpha - needs memory barriers within rcu_dereference() -- on other CPUs, - it compiles to nothing, not even a compiler directive. - - Common coding practice uses rcu_dereference() to copy an - RCU-protected pointer to a local variable, then dereferences - this local variable, for example as follows: - - p = rcu_dereference(head.next); - return p->data; - - However, in this case, one could just as easily combine these - into one statement: - - return rcu_dereference(head.next)->data; - - If you are going to be fetching multiple fields from the - RCU-protected structure, using the local variable is of - course preferred. Repeated rcu_dereference() calls look - ugly, do not guarantee that the same pointer will be returned - if an update happened while in the critical section, and incur - unnecessary overhead on Alpha CPUs. - - Note that the value returned by rcu_dereference() is valid - only within the enclosing RCU read-side critical section [1]. - For example, the following is -not- legal: - - rcu_read_lock(); - p = rcu_dereference(head.next); - rcu_read_unlock(); - x = p->address; /* BUG!!! */ - rcu_read_lock(); - y = p->data; /* BUG!!! */ - rcu_read_unlock(); - - Holding a reference from one RCU read-side critical section - to another is just as illegal as holding a reference from - one lock-based critical section to another! Similarly, - using a reference outside of the critical section in which - it was acquired is just as illegal as doing so with normal - locking. - - As with rcu_assign_pointer(), an important function of - rcu_dereference() is to document which pointers are protected by - RCU, in particular, flagging a pointer that is subject to changing - at any time, including immediately after the rcu_dereference(). - And, again like rcu_assign_pointer(), rcu_dereference() is - typically used indirectly, via the _rcu list-manipulation - primitives, such as list_for_each_entry_rcu() [2]. - - [1] The variant rcu_dereference_protected() can be used outside - of an RCU read-side critical section as long as the usage is - protected by locks acquired by the update-side code. This variant - avoids the lockdep warning that would happen when using (for - example) rcu_dereference() without rcu_read_lock() protection. - Using rcu_dereference_protected() also has the advantage - of permitting compiler optimizations that rcu_dereference() - must prohibit. The rcu_dereference_protected() variant takes - a lockdep expression to indicate which locks must be acquired - by the caller. If the indicated protection is not provided, - a lockdep splat is emitted. See Documentation/RCU/Design/Requirements/Requirements.rst - and the API's code comments for more details and example usage. - - [2] If the list_for_each_entry_rcu() instance might be used by - update-side code as well as by RCU readers, then an additional - lockdep expression can be added to its list of arguments. - For example, given an additional "lock_is_held(&mylock)" argument, - the RCU lockdep code would complain only if this instance was - invoked outside of an RCU read-side critical section and without - the protection of mylock. - -The following diagram shows how each API communicates among the -reader, updater, and reclaimer. - - - rcu_assign_pointer() - +--------+ - +---------------------->| reader |---------+ - | +--------+ | - | | | - | | | Protect: - | | | rcu_read_lock() - | | | rcu_read_unlock() - | rcu_dereference() | | - +---------+ | | - | updater |<----------------+ | - +---------+ V - | +-----------+ - +----------------------------------->| reclaimer | - +-----------+ - Defer: - synchronize_rcu() & call_rcu() - - -The RCU infrastructure observes the time sequence of rcu_read_lock(), -rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in -order to determine when (1) synchronize_rcu() invocations may return -to their callers and (2) call_rcu() callbacks may be invoked. Efficient -implementations of the RCU infrastructure make heavy use of batching in -order to amortize their overhead over many uses of the corresponding APIs. - -There are at least three flavors of RCU usage in the Linux kernel. The diagram -above shows the most common one. On the updater side, the rcu_assign_pointer(), -sychronize_rcu() and call_rcu() primitives used are the same for all three -flavors. However for protection (on the reader side), the primitives used vary -depending on the flavor: - -a. rcu_read_lock() / rcu_read_unlock() - rcu_dereference() - -b. rcu_read_lock_bh() / rcu_read_unlock_bh() - local_bh_disable() / local_bh_enable() - rcu_dereference_bh() - -c. rcu_read_lock_sched() / rcu_read_unlock_sched() - preempt_disable() / preempt_enable() - local_irq_save() / local_irq_restore() - hardirq enter / hardirq exit - NMI enter / NMI exit - rcu_dereference_sched() - -These three flavors are used as follows: - -a. RCU applied to normal data structures. - -b. RCU applied to networking data structures that may be subjected - to remote denial-of-service attacks. - -c. RCU applied to scheduler and interrupt/NMI-handler tasks. - -Again, most uses will be of (a). The (b) and (c) cases are important -for specialized uses, but are relatively uncommon. - - -3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? - -This section shows a simple use of the core RCU API to protect a -global pointer to a dynamically allocated structure. More-typical -uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt. - - struct foo { - int a; - char b; - long c; - }; - DEFINE_SPINLOCK(foo_mutex); - - struct foo __rcu *gbl_foo; - - /* - * Create a new struct foo that is the same as the one currently - * pointed to by gbl_foo, except that field "a" is replaced - * with "new_a". Points gbl_foo to the new structure, and - * frees up the old structure after a grace period. - * - * Uses rcu_assign_pointer() to ensure that concurrent readers - * see the initialized version of the new structure. - * - * Uses synchronize_rcu() to ensure that any readers that might - * have references to the old structure complete before freeing - * the old structure. - */ - void foo_update_a(int new_a) - { - struct foo *new_fp; - struct foo *old_fp; - - new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL); - spin_lock(&foo_mutex); - old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex)); - *new_fp = *old_fp; - new_fp->a = new_a; - rcu_assign_pointer(gbl_foo, new_fp); - spin_unlock(&foo_mutex); - synchronize_rcu(); - kfree(old_fp); - } - - /* - * Return the value of field "a" of the current gbl_foo - * structure. Use rcu_read_lock() and rcu_read_unlock() - * to ensure that the structure does not get deleted out - * from under us, and use rcu_dereference() to ensure that - * we see the initialized version of the structure (important - * for DEC Alpha and for people reading the code). - */ - int foo_get_a(void) - { - int retval; - - rcu_read_lock(); - retval = rcu_dereference(gbl_foo)->a; - rcu_read_unlock(); - return retval; - } - -So, to sum up: - -o Use rcu_read_lock() and rcu_read_unlock() to guard RCU - read-side critical sections. - -o Within an RCU read-side critical section, use rcu_dereference() - to dereference RCU-protected pointers. - -o Use some solid scheme (such as locks or semaphores) to - keep concurrent updates from interfering with each other. - -o Use rcu_assign_pointer() to update an RCU-protected pointer. - This primitive protects concurrent readers from the updater, - -not- concurrent updates from each other! You therefore still - need to use locking (or something similar) to keep concurrent - rcu_assign_pointer() primitives from interfering with each other. - -o Use synchronize_rcu() -after- removing a data element from an - RCU-protected data structure, but -before- reclaiming/freeing - the data element, in order to wait for the completion of all - RCU read-side critical sections that might be referencing that - data item. - -See checklist.txt for additional rules to follow when using RCU. -And again, more-typical uses of RCU may be found in listRCU.txt, -arrayRCU.txt, and NMI-RCU.txt. - - -4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? - -In the example above, foo_update_a() blocks until a grace period elapses. -This is quite simple, but in some cases one cannot afford to wait so -long -- there might be other high-priority work to be done. - -In such cases, one uses call_rcu() rather than synchronize_rcu(). -The call_rcu() API is as follows: - - void call_rcu(struct rcu_head * head, - void (*func)(struct rcu_head *head)); - -This function invokes func(head) after a grace period has elapsed. -This invocation might happen from either softirq or process context, -so the function is not permitted to block. The foo struct needs to -have an rcu_head structure added, perhaps as follows: - - struct foo { - int a; - char b; - long c; - struct rcu_head rcu; - }; - -The foo_update_a() function might then be written as follows: - - /* - * Create a new struct foo that is the same as the one currently - * pointed to by gbl_foo, except that field "a" is replaced - * with "new_a". Points gbl_foo to the new structure, and - * frees up the old structure after a grace period. - * - * Uses rcu_assign_pointer() to ensure that concurrent readers - * see the initialized version of the new structure. - * - * Uses call_rcu() to ensure that any readers that might have - * references to the old structure complete before freeing the - * old structure. - */ - void foo_update_a(int new_a) - { - struct foo *new_fp; - struct foo *old_fp; - - new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL); - spin_lock(&foo_mutex); - old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex)); - *new_fp = *old_fp; - new_fp->a = new_a; - rcu_assign_pointer(gbl_foo, new_fp); - spin_unlock(&foo_mutex); - call_rcu(&old_fp->rcu, foo_reclaim); - } - -The foo_reclaim() function might appear as follows: - - void foo_reclaim(struct rcu_head *rp) - { - struct foo *fp = container_of(rp, struct foo, rcu); - - foo_cleanup(fp->a); - - kfree(fp); - } - -The container_of() primitive is a macro that, given a pointer into a -struct, the type of the struct, and the pointed-to field within the -struct, returns a pointer to the beginning of the struct. - -The use of call_rcu() permits the caller of foo_update_a() to -immediately regain control, without needing to worry further about the -old version of the newly updated element. It also clearly shows the -RCU distinction between updater, namely foo_update_a(), and reclaimer, -namely foo_reclaim(). - -The summary of advice is the same as for the previous section, except -that we are now using call_rcu() rather than synchronize_rcu(): - -o Use call_rcu() -after- removing a data element from an - RCU-protected data structure in order to register a callback - function that will be invoked after the completion of all RCU - read-side critical sections that might be referencing that - data item. - -If the callback for call_rcu() is not doing anything more than calling -kfree() on the structure, you can use kfree_rcu() instead of call_rcu() -to avoid having to write your own callback: - - kfree_rcu(old_fp, rcu); - -Again, see checklist.txt for additional rules governing the use of RCU. - - -5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? - -One of the nice things about RCU is that it has extremely simple "toy" -implementations that are a good first step towards understanding the -production-quality implementations in the Linux kernel. This section -presents two such "toy" implementations of RCU, one that is implemented -in terms of familiar locking primitives, and another that more closely -resembles "classic" RCU. Both are way too simple for real-world use, -lacking both functionality and performance. However, they are useful -in getting a feel for how RCU works. See kernel/rcu/update.c for a -production-quality implementation, and see: - - http://www.rdrop.com/users/paulmck/RCU - -for papers describing the Linux kernel RCU implementation. The OLS'01 -and OLS'02 papers are a good introduction, and the dissertation provides -more details on the current implementation as of early 2004. - - -5A. "TOY" IMPLEMENTATION #1: LOCKING - -This section presents a "toy" RCU implementation that is based on -familiar locking primitives. Its overhead makes it a non-starter for -real-life use, as does its lack of scalability. It is also unsuitable -for realtime use, since it allows scheduling latency to "bleed" from -one read-side critical section to another. It also assumes recursive -reader-writer locks: If you try this with non-recursive locks, and -you allow nested rcu_read_lock() calls, you can deadlock. - -However, it is probably the easiest implementation to relate to, so is -a good starting point. - -It is extremely simple: - - static DEFINE_RWLOCK(rcu_gp_mutex); - - void rcu_read_lock(void) - { - read_lock(&rcu_gp_mutex); - } - - void rcu_read_unlock(void) - { - read_unlock(&rcu_gp_mutex); - } - - void synchronize_rcu(void) - { - write_lock(&rcu_gp_mutex); - smp_mb__after_spinlock(); - write_unlock(&rcu_gp_mutex); - } - -[You can ignore rcu_assign_pointer() and rcu_dereference() without missing -much. But here are simplified versions anyway. And whatever you do, -don't forget about them when submitting patches making use of RCU!] - - #define rcu_assign_pointer(p, v) \ - ({ \ - smp_store_release(&(p), (v)); \ - }) - - #define rcu_dereference(p) \ - ({ \ - typeof(p) _________p1 = READ_ONCE(p); \ - (_________p1); \ - }) - - -The rcu_read_lock() and rcu_read_unlock() primitive read-acquire -and release a global reader-writer lock. The synchronize_rcu() -primitive write-acquires this same lock, then releases it. This means -that once synchronize_rcu() exits, all RCU read-side critical sections -that were in progress before synchronize_rcu() was called are guaranteed -to have completed -- there is no way that synchronize_rcu() would have -been able to write-acquire the lock otherwise. The smp_mb__after_spinlock() -promotes synchronize_rcu() to a full memory barrier in compliance with -the "Memory-Barrier Guarantees" listed in: - - Documentation/RCU/Design/Requirements/Requirements.rst - -It is possible to nest rcu_read_lock(), since reader-writer locks may -be recursively acquired. Note also that rcu_read_lock() is immune -from deadlock (an important property of RCU). The reason for this is -that the only thing that can block rcu_read_lock() is a synchronize_rcu(). -But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex, -so there can be no deadlock cycle. - -Quick Quiz #1: Why is this argument naive? How could a deadlock - occur when using this algorithm in a real-world Linux - kernel? How could this deadlock be avoided? - - -5B. "TOY" EXAMPLE #2: CLASSIC RCU - -This section presents a "toy" RCU implementation that is based on -"classic RCU". It is also short on performance (but only for updates) and -on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT -kernels. The definitions of rcu_dereference() and rcu_assign_pointer() -are the same as those shown in the preceding section, so they are omitted. - - void rcu_read_lock(void) { } - - void rcu_read_unlock(void) { } - - void synchronize_rcu(void) - { - int cpu; - - for_each_possible_cpu(cpu) - run_on(cpu); - } - -Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing. -This is the great strength of classic RCU in a non-preemptive kernel: -read-side overhead is precisely zero, at least on non-Alpha CPUs. -And there is absolutely no way that rcu_read_lock() can possibly -participate in a deadlock cycle! - -The implementation of synchronize_rcu() simply schedules itself on each -CPU in turn. The run_on() primitive can be implemented straightforwardly -in terms of the sched_setaffinity() primitive. Of course, a somewhat less -"toy" implementation would restore the affinity upon completion rather -than just leaving all tasks running on the last CPU, but when I said -"toy", I meant -toy-! - -So how the heck is this supposed to work??? - -Remember that it is illegal to block while in an RCU read-side critical -section. Therefore, if a given CPU executes a context switch, we know -that it must have completed all preceding RCU read-side critical sections. -Once -all- CPUs have executed a context switch, then -all- preceding -RCU read-side critical sections will have completed. - -So, suppose that we remove a data item from its structure and then invoke -synchronize_rcu(). Once synchronize_rcu() returns, we are guaranteed -that there are no RCU read-side critical sections holding a reference -to that data item, so we can safely reclaim it. - -Quick Quiz #2: Give an example where Classic RCU's read-side - overhead is -negative-. - -Quick Quiz #3: If it is illegal to block in an RCU read-side - critical section, what the heck do you do in - PREEMPT_RT, where normal spinlocks can block??? - - -6. ANALOGY WITH READER-WRITER LOCKING - -Although RCU can be used in many different ways, a very common use of -RCU is analogous to reader-writer locking. The following unified -diff shows how closely related RCU and reader-writer locking can be. - - @@ -5,5 +5,5 @@ struct el { - int data; - /* Other data fields */ - }; - -rwlock_t listmutex; - +spinlock_t listmutex; - struct el head; - - @@ -13,15 +14,15 @@ - struct list_head *lp; - struct el *p; - - - read_lock(&listmutex); - - list_for_each_entry(p, head, lp) { - + rcu_read_lock(); - + list_for_each_entry_rcu(p, head, lp) { - if (p->key == key) { - *result = p->data; - - read_unlock(&listmutex); - + rcu_read_unlock(); - return 1; - } - } - - read_unlock(&listmutex); - + rcu_read_unlock(); - return 0; - } - - @@ -29,15 +30,16 @@ - { - struct el *p; - - - write_lock(&listmutex); - + spin_lock(&listmutex); - list_for_each_entry(p, head, lp) { - if (p->key == key) { - - list_del(&p->list); - - write_unlock(&listmutex); - + list_del_rcu(&p->list); - + spin_unlock(&listmutex); - + synchronize_rcu(); - kfree(p); - return 1; - } - } - - write_unlock(&listmutex); - + spin_unlock(&listmutex); - return 0; - } - -Or, for those who prefer a side-by-side listing: - - 1 struct el { 1 struct el { - 2 struct list_head list; 2 struct list_head list; - 3 long key; 3 long key; - 4 spinlock_t mutex; 4 spinlock_t mutex; - 5 int data; 5 int data; - 6 /* Other data fields */ 6 /* Other data fields */ - 7 }; 7 }; - 8 rwlock_t listmutex; 8 spinlock_t listmutex; - 9 struct el head; 9 struct el head; - - 1 int search(long key, int *result) 1 int search(long key, int *result) - 2 { 2 { - 3 struct list_head *lp; 3 struct list_head *lp; - 4 struct el *p; 4 struct el *p; - 5 5 - 6 read_lock(&listmutex); 6 rcu_read_lock(); - 7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) { - 8 if (p->key == key) { 8 if (p->key == key) { - 9 *result = p->data; 9 *result = p->data; -10 read_unlock(&listmutex); 10 rcu_read_unlock(); -11 return 1; 11 return 1; -12 } 12 } -13 } 13 } -14 read_unlock(&listmutex); 14 rcu_read_unlock(); -15 return 0; 15 return 0; -16 } 16 } - - 1 int delete(long key) 1 int delete(long key) - 2 { 2 { - 3 struct el *p; 3 struct el *p; - 4 4 - 5 write_lock(&listmutex); 5 spin_lock(&listmutex); - 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) { - 7 if (p->key == key) { 7 if (p->key == key) { - 8 list_del(&p->list); 8 list_del_rcu(&p->list); - 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex); - 10 synchronize_rcu(); -10 kfree(p); 11 kfree(p); -11 return 1; 12 return 1; -12 } 13 } -13 } 14 } -14 write_unlock(&listmutex); 15 spin_unlock(&listmutex); -15 return 0; 16 return 0; -16 } 17 } - -Either way, the differences are quite small. Read-side locking moves -to rcu_read_lock() and rcu_read_unlock, update-side locking moves from -a reader-writer lock to a simple spinlock, and a synchronize_rcu() -precedes the kfree(). - -However, there is one potential catch: the read-side and update-side -critical sections can now run concurrently. In many cases, this will -not be a problem, but it is necessary to check carefully regardless. -For example, if multiple independent list updates must be seen as -a single atomic update, converting to RCU will require special care. - -Also, the presence of synchronize_rcu() means that the RCU version of -delete() can now block. If this is a problem, there is a callback-based -mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can -be used in place of synchronize_rcu(). - - -7. FULL LIST OF RCU APIs - -The RCU APIs are documented in docbook-format header comments in the -Linux-kernel source code, but it helps to have a full list of the -APIs, since there does not appear to be a way to categorize them -in docbook. Here is the list, by category. - -RCU list traversal: - - list_entry_rcu - list_first_entry_rcu - list_next_rcu - list_for_each_entry_rcu - list_for_each_entry_continue_rcu - list_for_each_entry_from_rcu - hlist_first_rcu - hlist_next_rcu - hlist_pprev_rcu - hlist_for_each_entry_rcu - hlist_for_each_entry_rcu_bh - hlist_for_each_entry_from_rcu - hlist_for_each_entry_continue_rcu - hlist_for_each_entry_continue_rcu_bh - hlist_nulls_first_rcu - hlist_nulls_for_each_entry_rcu - hlist_bl_first_rcu - hlist_bl_for_each_entry_rcu - -RCU pointer/list update: - - rcu_assign_pointer - list_add_rcu - list_add_tail_rcu - list_del_rcu - list_replace_rcu - hlist_add_behind_rcu - hlist_add_before_rcu - hlist_add_head_rcu - hlist_del_rcu - hlist_del_init_rcu - hlist_replace_rcu - list_splice_init_rcu() - hlist_nulls_del_init_rcu - hlist_nulls_del_rcu - hlist_nulls_add_head_rcu - hlist_bl_add_head_rcu - hlist_bl_del_init_rcu - hlist_bl_del_rcu - hlist_bl_set_first_rcu - -RCU: Critical sections Grace period Barrier - - rcu_read_lock synchronize_net rcu_barrier - rcu_read_unlock synchronize_rcu - rcu_dereference synchronize_rcu_expedited - rcu_read_lock_held call_rcu - rcu_dereference_check kfree_rcu - rcu_dereference_protected - -bh: Critical sections Grace period Barrier - - rcu_read_lock_bh call_rcu rcu_barrier - rcu_read_unlock_bh synchronize_rcu - [local_bh_disable] synchronize_rcu_expedited - [and friends] - rcu_dereference_bh - rcu_dereference_bh_check - rcu_dereference_bh_protected - rcu_read_lock_bh_held - -sched: Critical sections Grace period Barrier - - rcu_read_lock_sched call_rcu rcu_barrier - rcu_read_unlock_sched synchronize_rcu - [preempt_disable] synchronize_rcu_expedited - [and friends] - rcu_read_lock_sched_notrace - rcu_read_unlock_sched_notrace - rcu_dereference_sched - rcu_dereference_sched_check - rcu_dereference_sched_protected - rcu_read_lock_sched_held - - -SRCU: Critical sections Grace period Barrier - - srcu_read_lock call_srcu srcu_barrier - srcu_read_unlock synchronize_srcu - srcu_dereference synchronize_srcu_expedited - srcu_dereference_check - srcu_read_lock_held - -SRCU: Initialization/cleanup - DEFINE_SRCU - DEFINE_STATIC_SRCU - init_srcu_struct - cleanup_srcu_struct - -All: lockdep-checked RCU-protected pointer access - - rcu_access_pointer - rcu_dereference_raw - RCU_LOCKDEP_WARN - rcu_sleep_check - RCU_NONIDLE - -See the comment headers in the source code (or the docbook generated -from them) for more information. - -However, given that there are no fewer than four families of RCU APIs -in the Linux kernel, how do you choose which one to use? The following -list can be helpful: - -a. Will readers need to block? If so, you need SRCU. - -b. What about the -rt patchset? If readers would need to block - in an non-rt kernel, you need SRCU. If readers would block - in a -rt kernel, but not in a non-rt kernel, SRCU is not - necessary. (The -rt patchset turns spinlocks into sleeplocks, - hence this distinction.) - -c. Do you need to treat NMI handlers, hardirq handlers, - and code segments with preemption disabled (whether - via preempt_disable(), local_irq_save(), local_bh_disable(), - or some other mechanism) as if they were explicit RCU readers? - If so, RCU-sched is the only choice that will work for you. - -d. Do you need RCU grace periods to complete even in the face - of softirq monopolization of one or more of the CPUs? For - example, is your code subject to network-based denial-of-service - attacks? If so, you should disable softirq across your readers, - for example, by using rcu_read_lock_bh(). - -e. Is your workload too update-intensive for normal use of - RCU, but inappropriate for other synchronization mechanisms? - If so, consider SLAB_TYPESAFE_BY_RCU (which was originally - named SLAB_DESTROY_BY_RCU). But please be careful! - -f. Do you need read-side critical sections that are respected - even though they are in the middle of the idle loop, during - user-mode execution, or on an offlined CPU? If so, SRCU is the - only choice that will work for you. - -g. Otherwise, use RCU. - -Of course, this all assumes that you have determined that RCU is in fact -the right tool for your job. - - -8. ANSWERS TO QUICK QUIZZES - -Quick Quiz #1: Why is this argument naive? How could a deadlock - occur when using this algorithm in a real-world Linux - kernel? [Referring to the lock-based "toy" RCU - algorithm.] - -Answer: Consider the following sequence of events: - - 1. CPU 0 acquires some unrelated lock, call it - "problematic_lock", disabling irq via - spin_lock_irqsave(). - - 2. CPU 1 enters synchronize_rcu(), write-acquiring - rcu_gp_mutex. - - 3. CPU 0 enters rcu_read_lock(), but must wait - because CPU 1 holds rcu_gp_mutex. - - 4. CPU 1 is interrupted, and the irq handler - attempts to acquire problematic_lock. - - The system is now deadlocked. - - One way to avoid this deadlock is to use an approach like - that of CONFIG_PREEMPT_RT, where all normal spinlocks - become blocking locks, and all irq handlers execute in - the context of special tasks. In this case, in step 4 - above, the irq handler would block, allowing CPU 1 to - release rcu_gp_mutex, avoiding the deadlock. - - Even in the absence of deadlock, this RCU implementation - allows latency to "bleed" from readers to other - readers through synchronize_rcu(). To see this, - consider task A in an RCU read-side critical section - (thus read-holding rcu_gp_mutex), task B blocked - attempting to write-acquire rcu_gp_mutex, and - task C blocked in rcu_read_lock() attempting to - read_acquire rcu_gp_mutex. Task A's RCU read-side - latency is holding up task C, albeit indirectly via - task B. - - Realtime RCU implementations therefore use a counter-based - approach where tasks in RCU read-side critical sections - cannot be blocked by tasks executing synchronize_rcu(). - -Quick Quiz #2: Give an example where Classic RCU's read-side - overhead is -negative-. - -Answer: Imagine a single-CPU system with a non-CONFIG_PREEMPT - kernel where a routing table is used by process-context - code, but can be updated by irq-context code (for example, - by an "ICMP REDIRECT" packet). The usual way of handling - this would be to have the process-context code disable - interrupts while searching the routing table. Use of - RCU allows such interrupt-disabling to be dispensed with. - Thus, without RCU, you pay the cost of disabling interrupts, - and with RCU you don't. - - One can argue that the overhead of RCU in this - case is negative with respect to the single-CPU - interrupt-disabling approach. Others might argue that - the overhead of RCU is merely zero, and that replacing - the positive overhead of the interrupt-disabling scheme - with the zero-overhead RCU scheme does not constitute - negative overhead. - - In real life, of course, things are more complex. But - even the theoretical possibility of negative overhead for - a synchronization primitive is a bit unexpected. ;-) - -Quick Quiz #3: If it is illegal to block in an RCU read-side - critical section, what the heck do you do in - PREEMPT_RT, where normal spinlocks can block??? - -Answer: Just as PREEMPT_RT permits preemption of spinlock - critical sections, it permits preemption of RCU - read-side critical sections. It also permits - spinlocks blocking while in RCU read-side critical - sections. - - Why the apparent inconsistency? Because it is - possible to use priority boosting to keep the RCU - grace periods short if need be (for example, if running - short of memory). In contrast, if blocking waiting - for (say) network reception, there is no way to know - what should be boosted. Especially given that the - process we need to boost might well be a human being - who just went out for a pizza or something. And although - a computer-operated cattle prod might arouse serious - interest, it might also provoke serious objections. - Besides, how does the computer know what pizza parlor - the human being went to??? - - -ACKNOWLEDGEMENTS - -My thanks to the people who helped make this human-readable, including -Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern. - - -For more information, see http://www.rdrop.com/users/paulmck/RCU. --- a/Documentation/admin-guide/acpi/fan_performance_states.rst +++ b/Documentation/admin-guide/acpi/fan_performance_states.rst @@ -0,0 +1,62 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=========================== +ACPI Fan Performance States +=========================== + +When the optional _FPS object is present under an ACPI device representing a +fan (for example, PNP0C0B or INT3404), the ACPI fan driver creates additional +"state*" attributes in the sysfs directory of the ACPI device in question. +These attributes list properties of fan performance states. + +For more information on _FPS refer to the ACPI specification at: + +http://uefi.org/specifications + +For instance, the contents of the INT3404 ACPI device sysfs directory +may look as follows:: + + $ ls -l /sys/bus/acpi/devices/INT3404:00/ + total 0 +... + -r--r--r-- 1 root root 4096 Dec 13 20:38 state0 + -r--r--r-- 1 root root 4096 Dec 13 20:38 state1 + -r--r--r-- 1 root root 4096 Dec 13 20:38 state10 + -r--r--r-- 1 root root 4096 Dec 13 20:38 state11 + -r--r--r-- 1 root root 4096 Dec 13 20:38 state2 + -r--r--r-- 1 root root 4096 Dec 13 20:38 state3 + -r--r--r-- 1 root root 4096 Dec 13 20:38 state4 + -r--r--r-- 1 root root 4096 Dec 13 20:38 state5 + -r--r--r-- 1 root root 4096 Dec 13 20:38 state6 + -r--r--r-- 1 root root 4096 Dec 13 20:38 state7 + -r--r--r-- 1 root root 4096 Dec 13 20:38 state8 + -r--r--r-- 1 root root 4096 Dec 13 20:38 state9 + -r--r--r-- 1 root root 4096 Dec 13 01:00 status + ... + +where each of the "state*" files represents one performance state of the fan +and contains a colon-separated list of 5 integer numbers (fields) with the +following interpretation:: + +control_percent:trip_point_index:speed_rpm:noise_level_mdb:power_mw + +* ``control_percent``: The percent value to be used to set the fan speed to a + specific level using the _FSL object (0-100). + +* ``trip_point_index``: The active cooling trip point number that corresponds + to this performance state (0-9). + +* ``speed_rpm``: Speed of the fan in rotations per minute. + +* ``noise_level_mdb``: Audible noise emitted by the fan in this state in + millidecibels. + +* ``power_mw``: Power draw of the fan in this state in milliwatts. + +For example:: + + $cat /sys/bus/acpi/devices/INT3404:00/state1 + 25:0:3200:12500:1250 + +When a given field is not populated or its value provided by the platform +firmware is invalid, the "not-defined" string is shown instead of the value. --- a/Documentation/admin-guide/acpi/index.rst +++ b/Documentation/admin-guide/acpi/index.rst @@ -12,3 +12,4 @@ dsdt-override ssdt-overlays cppc_sysfs + fan_performance_states --- a/Documentation/admin-guide/blockdev/zram.rst +++ b/Documentation/admin-guide/blockdev/zram.rst @@ -1,15 +1,15 @@ ======================================== -zram: Compressed RAM based block devices +zram: Compressed RAM-based block devices ======================================== Introduction ============ -The zram module creates RAM based block devices named /dev/zram +The zram module creates RAM-based block devices named /dev/zram ( = 0, 1, ...). Pages written to these disks are compressed and stored in memory itself. These disks allow very fast I/O and compression provides -good amounts of memory savings. Some of the usecases include /tmp storage, -use as swap disks, various caches under /var and maybe many more :) +good amounts of memory savings. Some of the use cases include /tmp storage, +use as swap disks, various caches under /var and maybe many more. :) Statistics for individual zram devices are exported through sysfs nodes at /sys/block/zram/ @@ -43,17 +43,17 @@ ======== ============================================================= -EBUSY an attempt to modify an attribute that cannot be changed once - the device has been initialised. Please reset device first; + the device has been initialised. Please reset device first. -ENOMEM zram was not able to allocate enough memory to fulfil your - needs; + needs. -EINVAL invalid input has been provided. ======== ============================================================= -If you use 'echo', the returned value that is changed by 'echo' utility, +If you use 'echo', the returned value is set by the 'echo' utility, and, in general case, something like:: echo 3 > /sys/block/zram0/max_comp_streams - if [ $? -ne 0 ]; + if [ $? -ne 0 ]; then handle_error fi @@ -65,7 +65,8 @@ :: modprobe zram num_devices=4 - This creates 4 devices: /dev/zram{0,1,2,3} + +This creates 4 devices: /dev/zram{0,1,2,3} num_devices parameter is optional and tells zram how many devices should be pre-created. Default: 1. @@ -73,12 +74,12 @@ 2) Set max number of compression streams ======================================== -Regardless the value passed to this attribute, ZRAM will always -allocate multiple compression streams - one per online CPUs - thus +Regardless of the value passed to this attribute, ZRAM will always +allocate multiple compression streams - one per online CPU - thus allowing several concurrent compression operations. The number of allocated compression streams goes down when some of the CPUs become offline. There is no single-compression-stream mode anymore, -unless you are running a UP system or has only 1 CPU online. +unless you are running a UP system or have only 1 CPU online. To find out how many streams are currently available:: @@ -89,7 +90,7 @@ Using comp_algorithm device attribute one can see available and currently selected (shown in square brackets) compression algorithms, -change selected compression algorithm (once the device is initialised +or change the selected compression algorithm (once the device is initialised there is no way to change compression algorithm). Examples:: @@ -167,9 +168,9 @@ zram provides a control interface, which enables dynamic (on-demand) device addition and removal. -In order to add a new /dev/zramX device, perform read operation on hot_add -attribute. This will return either new device's device id (meaning that you -can use /dev/zram) or error code. +In order to add a new /dev/zramX device, perform a read operation on the hot_add +attribute. This will return either the new device's device id (meaning that you +can use /dev/zram) or an error code. Example:: @@ -186,8 +187,8 @@ Per-device statistics are exported as various nodes under /sys/block/zram/ -A brief description of exported device attributes. For more details please -read Documentation/ABI/testing/sysfs-block-zram. +A brief description of exported device attributes follows. For more details +please read Documentation/ABI/testing/sysfs-block-zram. ====================== ====== =============================================== Name access description @@ -245,7 +246,7 @@ File /sys/block/zram/mm_stat -The stat file represents device's mm statistics. It consists of a single +The mm_stat file represents the device's mm statistics. It consists of a single line of text and contains the following stats separated by whitespace: ================ ============================================================= @@ -261,7 +262,7 @@ Unit: bytes mem_limit the maximum amount of memory ZRAM can use to store the compressed data - mem_used_max the maximum amount of memory zram have consumed to + mem_used_max the maximum amount of memory zram has consumed to store the data same_pages the number of same element filled pages written to this disk. No memory is allocated for such pages. @@ -271,7 +272,7 @@ File /sys/block/zram/bd_stat -The stat file represents device's backing device statistics. It consists of +The bd_stat file represents a device's backing device statistics. It consists of a single line of text and contains the following stats separated by whitespace: ============== ============================================================= @@ -316,9 +317,9 @@ echo /dev/sda5 > /sys/block/zramX/backing_dev before disksize setting. It supports only partition at this moment. -If admin want to use incompressible page writeback, they could do via:: +If admin wants to use incompressible page writeback, they could do via:: - echo huge > /sys/block/zramX/write + echo huge > /sys/block/zramX/writeback To use idle page writeback, first, user need to declare zram pages as idle:: @@ -326,7 +327,7 @@ echo all > /sys/block/zramX/idle From now on, any pages on zram are idle pages. The idle mark -will be removed until someone request access of the block. +will be removed until someone requests access of the block. IOW, unless there is access request, those pages are still idle pages. Admin can request writeback of those idle pages at right timing via:: @@ -341,16 +342,16 @@ To overcome the concern, zram supports "writeback_limit" feature. The "writeback_limit_enable"'s default value is 0 so that it doesn't limit -any writeback. IOW, if admin want to apply writeback budget, he should +any writeback. IOW, if admin wants to apply writeback budget, he should enable writeback_limit_enable via:: $ echo 1 > /sys/block/zramX/writeback_limit_enable Once writeback_limit_enable is set, zram doesn't allow any writeback -until admin set the budget via /sys/block/zramX/writeback_limit. +until admin sets the budget via /sys/block/zramX/writeback_limit. (If admin doesn't enable writeback_limit_enable, writeback_limit's value -assigned via /sys/block/zramX/writeback_limit is meaninless.) +assigned via /sys/block/zramX/writeback_limit is meaningless.) If admin want to limit writeback as per-day 400M, he could do it like below:: @@ -361,13 +362,13 @@ /sys/block/zram0/writeback_limit. $ echo 1 > /sys/block/zram0/writeback_limit_enable -If admin want to allow further write again once the bugdet is exausted, +If admins want to allow further write again once the bugdet is exhausted, he could do it like below:: $ echo $((400<>4K_SHIFT)) > \ /sys/block/zram0/writeback_limit -If admin want to see remaining writeback budget since he set:: +If admin wants to see remaining writeback budget since last set:: $ cat /sys/block/zramX/writeback_limit @@ -375,12 +376,12 @@ $ echo 0 > /sys/block/zramX/writeback_limit_enable -The writeback_limit count will reset whenever you reset zram(e.g., +The writeback_limit count will reset whenever you reset zram (e.g., system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of writeback happened until you reset the zram to allocate extra writeback budget in next setting is user's job. -If admin want to measure writeback count in a certain period, he could +If admin wants to measure writeback count in a certain period, he could know it via /sys/block/zram0/bd_stat's 3rd column. memory tracking --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -61,6 +61,8 @@ 5-6. Device 5-7. RDMA 5-7-1. RDMA Interface Files + 5-8. HugeTLB + 5.8-1. HugeTLB Interface Files 5-8. Misc 5-8-1. perf_event 5-N. Non-normative information @@ -2056,6 +2058,33 @@ mlx4_0 hca_handle=1 hca_object=20 ocrdma1 hca_handle=1 hca_object=23 +HugeTLB +------- + +The HugeTLB controller allows to limit the HugeTLB usage per control group and +enforces the controller limit during page fault. + +HugeTLB Interface Files +~~~~~~~~~~~~~~~~~~~~~~~ + + hugetlb..current + Show current usage for "hugepagesize" hugetlb. It exists for all + the cgroup except root. + + hugetlb..max + Set/show the hard limit of "hugepagesize" hugetlb usage. + The default value is "max". It exists for all the cgroup except root. + + hugetlb..events + A read-only flat-keyed file which exists on non-root cgroups. + + max + The number of allocation failure due to HugeTLB limit + + hugetlb..events.local + Similar to hugetlb..events but the fields in the file + are local to the cgroup i.e. not hierarchical. The file modified event + generated on this file reflects only the local events. Misc ---- --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -76,6 +76,7 @@ device-mapper/index efi-stub ext4 + nfs/index gpio/index highuid hw_random --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -511,7 +511,7 @@ 1 -- check protection requested by application. Default value is set via a kernel config option. Value can be changed at runtime via - /selinux/checkreqprot. + /sys/fs/selinux/checkreqprot. cio_ignore= [S390] See Documentation/s390/common_io.rst for details. @@ -1165,10 +1165,10 @@ efi= [EFI] Format: { "old_map", "nochunk", "noruntime", "debug", - "nosoftreserve" } + "nosoftreserve", "disable_early_pci_dma", + "no_disable_early_pci_dma" } old_map [X86-64]: switch to the old ioremap-based EFI - runtime services mapping. 32-bit still uses this one by - default. + runtime services mapping. [Needs CONFIG_X86_UV=y] nochunk: disable reading files in "chunks" in the EFI boot stub, as chunking can cause problems with some firmware implementations. @@ -1180,6 +1180,10 @@ claim. Specify efi=nosoftreserve to disable this reservation and treat the memory by its base type (i.e. EFI_CONVENTIONAL_MEMORY / "System RAM"). + disable_early_pci_dma: Disable the busmaster bit on all + PCI bridges while in the EFI boot stub + no_disable_early_pci_dma: Leave the busmaster bit set + on all PCI bridges while in the EFI boot stub efi_no_storage_paranoia [EFI; X86] Using this parameter you can use more than 50% of @@ -1245,7 +1249,8 @@ 0 -- permissive (log only, no denials). 1 -- enforcing (deny and log). Default value is 0. - Value can be changed at runtime via /selinux/enforce. + Value can be changed at runtime via + /sys/fs/selinux/enforce. erst_disable [ACPI] Disable Error Record Serialization Table (ERST) @@ -1933,9 +1938,31 @@ begins at 0 and the maximum value is "number of CPUs in system - 1". - The format of is described above. + managed_irq + + Isolate from being targeted by managed interrupts + which have an interrupt mask containing isolated + CPUs. The affinity of managed interrupts is + handled by the kernel and cannot be changed via + the /proc/irq/* interfaces. + + This isolation is best effort and only effective + if the automatically assigned interrupt mask of a + device queue contains isolated and housekeeping + CPUs. If housekeeping CPUs are online then such + interrupts are directed to the housekeeping CPU + so that IO submitted on the housekeeping CPU + cannot disturb the isolated CPU. + If a queue's affinity mask contains only isolated + CPUs then this parameter has no effect on the + interrupt routing decision, though interrupts are + only delivered when tasks running on those + isolated CPUs submit IO. IO submitted on + housekeeping CPUs has no influence on those + queues. + The format of is described above. iucv= [HW,NET] @@ -3978,6 +4005,19 @@ test until boot completes in order to avoid interference. + rcuperf.kfree_rcu_test= [KNL] + Set to measure performance of kfree_rcu() flooding. + + rcuperf.kfree_nthreads= [KNL] + The number of threads running loops of kfree_rcu(). + + rcuperf.kfree_alloc_num= [KNL] + Number of allocations and frees done in an iteration. + + rcuperf.kfree_loops= [KNL] + Number of loops doing rcuperf.kfree_alloc_num number + of allocations and frees. + rcuperf.nreaders= [KNL] Set number of RCU readers. The value -1 selects N, where N is the number of CPUs. A value @@ -4348,9 +4388,7 @@ See security/selinux/Kconfig help text. 0 -- disable. 1 -- enable. - Default value is set via kernel config option. - If enabled at boot time, /selinux/disable can be used - later to disable prior to initial policy load. + Default value is 1. apparmor= [APPARMOR] Disable or enable AppArmor at boot time Format: { "0" | "1" } --- a/Documentation/admin-guide/nfs/fault_injection.rst +++ b/Documentation/admin-guide/nfs/fault_injection.rst @@ -0,0 +1,70 @@ +=================== +NFS Fault Injection +=================== + +Fault injection is a method for forcing errors that may not normally occur, or +may be difficult to reproduce. Forcing these errors in a controlled environment +can help the developer find and fix bugs before their code is shipped in a +production system. Injecting an error on the Linux NFS server will allow us to +observe how the client reacts and if it manages to recover its state correctly. + +NFSD_FAULT_INJECTION must be selected when configuring the kernel to use this +feature. + + +Using Fault Injection +===================== +On the client, mount the fault injection server through NFS v4.0+ and do some +work over NFS (open files, take locks, ...). + +On the server, mount the debugfs filesystem to and ls +/nfsd. This will show a list of files that will be used for +injecting faults on the NFS server. As root, write a number n to the file +corresponding to the action you want the server to take. The server will then +process the first n items it finds. So if you want to forget 5 locks, echo '5' +to /nfsd/forget_locks. A value of 0 will tell the server to forget +all corresponding items. A log message will be created containing the number +of items forgotten (check dmesg). + +Go back to work on the client and check if the client recovered from the error +correctly. + + +Available Faults +================ +forget_clients: + The NFS server keeps a list of clients that have placed a mount call. If + this list is cleared, the server will have no knowledge of who the client + is, forcing the client to reauthenticate with the server. + +forget_openowners: + The NFS server keeps a list of what files are currently opened and who + they were opened by. Clearing this list will force the client to reopen + its files. + +forget_locks: + The NFS server keeps a list of what files are currently locked in the VFS. + Clearing this list will force the client to reclaim its locks (files are + unlocked through the VFS as they are cleared from this list). + +forget_delegations: + A delegation is used to assure the client that a file, or part of a file, + has not changed since the delegation was awarded. Clearing this list will + force the client to reacquire its delegation before accessing the file + again. + +recall_delegations: + Delegations can be recalled by the server when another client attempts to + access a file. This test will notify the client that its delegation has + been revoked, forcing the client to reacquire the delegation before using + the file again. + + +tools/nfs/inject_faults.sh script +================================= +This script has been created to ease the fault injection process. This script +will detect the mounted debugfs directory and write to the files located there +based on the arguments passed by the user. For example, running +`inject_faults.sh forget_locks 1` as root will instruct the server to forget +one lock. Running `inject_faults forget_locks` will instruct the server to +forgetall locks. --- a/Documentation/admin-guide/nfs/index.rst +++ b/Documentation/admin-guide/nfs/index.rst @@ -0,0 +1,15 @@ +============= +NFS +============= + +.. toctree:: + :maxdepth: 1 + + nfs-client + nfsroot + nfs-rdma + nfsd-admin-interfaces + nfs-idmapper + pnfs-block-server + pnfs-scsi-server + fault_injection --- a/Documentation/admin-guide/nfs/nfs-client.rst +++ b/Documentation/admin-guide/nfs/nfs-client.rst @@ -0,0 +1,141 @@ +========== +NFS Client +========== + +The NFS client +============== + +The NFS version 2 protocol was first documented in RFC1094 (March 1989). +Since then two more major releases of NFS have been published, with NFSv3 +being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April +2003). + +The Linux NFS client currently supports all the above published versions, +and work is in progress on adding support for minor version 1 of the NFSv4 +protocol. + +The purpose of this document is to provide information on some of the +special features of the NFS client that can be configured by system +administrators. + + +The nfs4_unique_id parameter +============================ + +NFSv4 requires clients to identify themselves to servers with a unique +string. File open and lock state shared between one client and one server +is associated with this identity. To support robust NFSv4 state recovery +and transparent state migration, this identity string must not change +across client reboots. + +Without any other intervention, the Linux client uses a string that contains +the local system's node name. System administrators, however, often do not +take care to ensure that node names are fully qualified and do not change +over the lifetime of a client system. Node names can have other +administrative requirements that require particular behavior that does not +work well as part of an nfs_client_id4 string. + +The nfs.nfs4_unique_id boot parameter specifies a unique string that can be +used instead of a system's node name when an NFS client identifies itself to +a server. Thus, if the system's node name is not unique, or it changes, its +nfs.nfs4_unique_id stays the same, preventing collision with other clients +or loss of state during NFS reboot recovery or transparent state migration. + +The nfs.nfs4_unique_id string is typically a UUID, though it can contain +anything that is believed to be unique across all NFS clients. An +nfs4_unique_id string should be chosen when a client system is installed, +just as a system's root file system gets a fresh UUID in its label at +install time. + +The string should remain fixed for the lifetime of the client. It can be +changed safely if care is taken that the client shuts down cleanly and all +outstanding NFSv4 state has expired, to prevent loss of NFSv4 state. + +This string can be stored in an NFS client's grub.conf, or it can be provided +via a net boot facility such as PXE. It may also be specified as an nfs.ko +module parameter. Specifying a uniquifier string is not support for NFS +clients running in containers. + + +The DNS resolver +================ + +NFSv4 allows for one server to refer the NFS client to data that has been +migrated onto another server by means of the special "fs_locations" +attribute. See `RFC3530 Section 6: Filesystem Migration and Replication`_ and +`Implementation Guide for Referrals in NFSv4`_. + +.. _RFC3530 Section 6\: Filesystem Migration and Replication: http://tools.ietf.org/html/rfc3530#section-6 +.. _Implementation Guide for Referrals in NFSv4: http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00 + +The fs_locations information can take the form of either an ip address and +a path, or a DNS hostname and a path. The latter requires the NFS client to +do a DNS lookup in order to mount the new volume, and hence the need for an +upcall to allow userland to provide this service. + +Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual +/var/lib/nfs/rpc_pipefs, the upcall consists of the following steps: + + (1) The process checks the dns_resolve cache to see if it contains a + valid entry. If so, it returns that entry and exits. + + (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent' + (may be changed using the 'nfs.cache_getent' kernel boot parameter) + is run, with two arguments: + - the cache name, "dns_resolve" + - the hostname to resolve + + (3) After looking up the corresponding ip address, the helper script + writes the result into the rpc_pipefs pseudo-file + '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel' + in the following (text) format: + + " \n" + + Where is in the usual IPv4 (123.456.78.90) or IPv6 + (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format. + is identical to the second argument of the helper + script, and is the 'time to live' of this cache entry (in + units of seconds). + + .. note:: + If is invalid, say the string "0", then a negative + entry is created, which will cause the kernel to treat the hostname + as having no valid DNS translation. + + + + +A basic sample /sbin/nfs_cache_getent +===================================== +.. code-block:: sh + + #!/bin/bash + # + ttl=600 + # + cut=/usr/bin/cut + getent=/usr/bin/getent + rpc_pipefs=/var/lib/nfs/rpc_pipefs + # + die() + { + echo "Usage: $0 cache_name entry_name" + exit 1 + } + + [ $# -lt 2 ] && die + cachename="$1" + cache_path=${rpc_pipefs}/cache/${cachename}/channel + + case "${cachename}" in + dns_resolve) + name="$2" + result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )" + [ -z "${result}" ] && result="0" + ;; + *) + die + ;; + esac + echo "${result} ${name} ${ttl}" >${cache_path} --- a/Documentation/admin-guide/nfs/nfs-idmapper.rst +++ b/Documentation/admin-guide/nfs/nfs-idmapper.rst @@ -0,0 +1,78 @@ +============= +NFS ID Mapper +============= + +Id mapper is used by NFS to translate user and group ids into names, and to +translate user and group names into ids. Part of this translation involves +performing an upcall to userspace to request the information. There are two +ways NFS could obtain this information: placing a call to /sbin/request-key +or by placing a call to the rpc.idmap daemon. + +NFS will attempt to call /sbin/request-key first. If this succeeds, the +result will be cached using the generic request-key cache. This call should +only fail if /etc/request-key.conf is not configured for the id_resolver key +type, see the "Configuring" section below if you wish to use the request-key +method. + +If the call to /sbin/request-key fails (if /etc/request-key.conf is not +configured with the id_resolver key type), then the idmapper will ask the +legacy rpc.idmap daemon for the id mapping. This result will be stored +in a custom NFS idmap cache. + + +Configuring +=========== + +The file /etc/request-key.conf will need to be modified so /sbin/request-key can +direct the upcall. The following line should be added: + +``#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...`` +``#====== ======= =============== =============== ===============================`` +``create id_resolver * * /usr/sbin/nfs.idmap %k %d 600`` + + +This will direct all id_resolver requests to the program /usr/sbin/nfs.idmap. +The last parameter, 600, defines how many seconds into the future the key will +expire. This parameter is optional for /usr/sbin/nfs.idmap. When the timeout +is not specified, nfs.idmap will default to 600 seconds. + +id mapper uses for key descriptions:: + + uid: Find the UID for the given user + gid: Find the GID for the given group + user: Find the user name for the given UID + group: Find the group name for the given GID + +You can handle any of these individually, rather than using the generic upcall +program. If you would like to use your own program for a uid lookup then you +would edit your request-key.conf so it look similar to this: + +``#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...`` +``#====== ======= =============== =============== ===============================`` +``create id_resolver uid:* * /some/other/program %k %d 600`` +``create id_resolver * * /usr/sbin/nfs.idmap %k %d 600`` + + +Notice that the new line was added above the line for the generic program. +request-key will find the first matching line and corresponding program. In +this case, /some/other/program will handle all uid lookups and +/usr/sbin/nfs.idmap will handle gid, user, and group lookups. + +See Documentation/security/keys/request-key.rst for more information +about the request-key function. + + +nfs.idmap +========= + +nfs.idmap is designed to be called by request-key, and should not be run "by +hand". This program takes two arguments, a serialized key and a key +description. The serialized key is first converted into a key_serial_t, and +then passed as an argument to keyctl_instantiate (both are part of keyutils.h). + +The actual lookups are performed by functions found in nfsidmap.h. nfs.idmap +determines the correct function to call by looking at the first part of the +description string. For example, a uid lookup description will appear as +"uid:user@domain". + +nfs.idmap will return 0 if the key was instantiated, and non-zero otherwise. --- a/Documentation/admin-guide/nfs/nfs-rdma.rst +++ b/Documentation/admin-guide/nfs/nfs-rdma.rst @@ -0,0 +1,292 @@ +=================== +Setting up NFS/RDMA +=================== + +:Author: + NetApp and Open Grid Computing (May 29, 2008) + +.. warning:: + This document is probably obsolete. + +Overview +======== + +This document describes how to install and setup the Linux NFS/RDMA client +and server software. + +The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server +was first included in the following release, Linux 2.6.25. + +In our testing, we have obtained excellent performance results (full 10Gbit +wire bandwidth at minimal client CPU) under many workloads. The code passes +the full Connectathon test suite and operates over both Infiniband and iWARP +RDMA adapters. + +Getting Help +============ + +If you get stuck, you can ask questions on the +nfs-rdma-devel@lists.sourceforge.net mailing list. + +Installation +============ + +These instructions are a step by step guide to building a machine for +use with NFS/RDMA. + +- Install an RDMA device + + Any device supported by the drivers in drivers/infiniband/hw is acceptable. + + Testing has been performed using several Mellanox-based IB cards, the + Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter. + +- Install a Linux distribution and tools + + The first kernel release to contain both the NFS/RDMA client and server was + Linux 2.6.25 Therefore, a distribution compatible with this and subsequent + Linux kernel release should be installed. + + The procedures described in this document have been tested with + distributions from Red Hat's Fedora Project (http://fedora.redhat.com/). + +- Install nfs-utils-1.1.2 or greater on the client + + An NFS/RDMA mount point can be obtained by using the mount.nfs command in + nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils + version with support for NFS/RDMA mounts, but for various reasons we + recommend using nfs-utils-1.1.2 or greater). To see which version of + mount.nfs you are using, type: + + .. code-block:: sh + + $ /sbin/mount.nfs -V + + If the version is less than 1.1.2 or the command does not exist, + you should install the latest version of nfs-utils. + + Download the latest package from: http://www.kernel.org/pub/linux/utils/nfs + + Uncompress the package and follow the installation instructions. + + If you will not need the idmapper and gssd executables (you do not need + these to create an NFS/RDMA enabled mount command), the installation + process can be simplified by disabling these features when running + configure: + + .. code-block:: sh + + $ ./configure --disable-gss --disable-nfsv4 + + To build nfs-utils you will need the tcp_wrappers package installed. For + more information on this see the package's README and INSTALL files. + + After building the nfs-utils package, there will be a mount.nfs binary in + the utils/mount directory. This binary can be used to initiate NFS v2, v3, + or v4 mounts. To initiate a v4 mount, the binary must be called + mount.nfs4. The standard technique is to create a symlink called + mount.nfs4 to mount.nfs. + + This mount.nfs binary should be installed at /sbin/mount.nfs as follows: + + .. code-block:: sh + + $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs + + In this location, mount.nfs will be invoked automatically for NFS mounts + by the system mount command. + + .. note:: + mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed + on the NFS client machine. You do not need this specific version of + nfs-utils on the server. Furthermore, only the mount.nfs command from + nfs-utils-1.1.2 is needed on the client. + +- Install a Linux kernel with NFS/RDMA + + The NFS/RDMA client and server are both included in the mainline Linux + kernel version 2.6.25 and later. This and other versions of the Linux + kernel can be found at: https://www.kernel.org/pub/linux/kernel/ + + Download the sources and place them in an appropriate location. + +- Configure the RDMA stack + + Make sure your kernel configuration has RDMA support enabled. Under + Device Drivers -> InfiniBand support, update the kernel configuration + to enable InfiniBand support [NOTE: the option name is misleading. Enabling + InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)]. + + Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or + iWARP adapter support (amso, cxgb3, etc.). + + If you are using InfiniBand, be sure to enable IP-over-InfiniBand support. + +- Configure the NFS client and server + + Your kernel configuration must also have NFS file system support and/or + NFS server support enabled. These and other NFS related configuration + options can be found under File Systems -> Network File Systems. + +- Build, install, reboot + + The NFS/RDMA code will be enabled automatically if NFS and RDMA + are turned on. The NFS/RDMA client and server are configured via the hidden + SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The + value of SUNRPC_XPRT_RDMA will be: + + #. N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client + and server will not be built + + #. M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M, + in this case the NFS/RDMA client and server will be built as modules + + #. Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client + and server will be built into the kernel + + Therefore, if you have followed the steps above and turned no NFS and RDMA, + the NFS/RDMA client and server will be built. + + Build a new kernel, install it, boot it. + +Check RDMA and NFS Setup +======================== + +Before configuring the NFS/RDMA software, it is a good idea to test +your new kernel to ensure that the kernel is working correctly. +In particular, it is a good idea to verify that the RDMA stack +is functioning as expected and standard NFS over TCP/IP and/or UDP/IP +is working properly. + +- Check RDMA Setup + + If you built the RDMA components as modules, load them at + this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel + card: + + .. code-block:: sh + + $ modprobe ib_mthca + $ modprobe ib_ipoib + + If you are using InfiniBand, make sure there is a Subnet Manager (SM) + running on the network. If your IB switch has an embedded SM, you can + use it. Otherwise, you will need to run an SM, such as OpenSM, on one + of your end nodes. + + If an SM is running on your network, you should see the following: + + .. code-block:: sh + + $ cat /sys/class/infiniband/driverX/ports/1/state + 4: ACTIVE + + where driverX is mthca0, ipath5, ehca3, etc. + + To further test the InfiniBand software stack, use IPoIB (this + assumes you have two IB hosts named host1 and host2): + + .. code-block:: sh + + host1$ ip link set dev ib0 up + host1$ ip address add dev ib0 a.b.c.x + host2$ ip link set dev ib0 up + host2$ ip address add dev ib0 a.b.c.y + host1$ ping a.b.c.y + host2$ ping a.b.c.x + + For other device types, follow the appropriate procedures. + +- Check NFS Setup + + For the NFS components enabled above (client and/or server), + test their functionality over standard Ethernet using TCP/IP or UDP/IP. + +NFS/RDMA Setup +============== + +We recommend that you use two machines, one to act as the client and +one to act as the server. + +One time configuration: +----------------------- + +- On the server system, configure the /etc/exports file and start the NFS/RDMA server. + + Exports entries with the following formats have been tested:: + + /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash) + /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash) + + The IP address(es) is(are) the client's IPoIB address for an InfiniBand + HCA or the client's iWARP address(es) for an RNIC. + + .. note:: + The "insecure" option must be used because the NFS/RDMA client does + not use a reserved port. + +Each time a machine boots: +-------------------------- + +- Load and configure the RDMA drivers + + For InfiniBand using a Mellanox adapter: + + .. code-block:: sh + + $ modprobe ib_mthca + $ modprobe ib_ipoib + $ ip li set dev ib0 up + $ ip addr add dev ib0 a.b.c.d + + .. note:: + Please use unique addresses for the client and server! + +- Start the NFS server + + If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in + kernel config), load the RDMA transport module: + + .. code-block:: sh + + $ modprobe svcrdma + + Regardless of how the server was built (module or built-in), start the + server: + + .. code-block:: sh + + $ /etc/init.d/nfs start + + or + + .. code-block:: sh + + $ service nfs start + + Instruct the server to listen on the RDMA transport: + + .. code-block:: sh + + $ echo rdma 20049 > /proc/fs/nfsd/portlist + +- On the client system + + If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in + kernel config), load the RDMA client module: + + .. code-block:: sh + + $ modprobe xprtrdma.ko + + Regardless of how the client was built (module or built-in), use this + command to mount the NFS/RDMA server: + + .. code-block:: sh + + $ mount -o rdma,port=20049 :/ /mnt + + To verify that the mount is using RDMA, run "cat /proc/mounts" and check + the "proto" field for the given mount. + + Congratulations! You're using NFS/RDMA! --- a/Documentation/admin-guide/nfs/nfsd-admin-interfaces.rst +++ b/Documentation/admin-guide/nfs/nfsd-admin-interfaces.rst @@ -0,0 +1,40 @@ +================================== +Administrative interfaces for nfsd +================================== + +Note that normally these interfaces are used only by the utilities in +nfs-utils. + +nfsd is controlled mainly by pseudofiles under the "nfsd" filesystem, +which is normally mounted at /proc/fs/nfsd/. + +The server is always started by the first write of a nonzero value to +nfsd/threads. + +Before doing that, NFSD can be told which sockets to listen on by +writing to nfsd/portlist; that write may be: + + - an ascii-encoded file descriptor, which should refer to a + bound (and listening, for tcp) socket, or + - "transportname port", where transportname is currently either + "udp", "tcp", or "rdma". + +If nfsd is started without doing any of these, then it will create one +udp and one tcp listener at port 2049 (see nfsd_init_socks). + +On startup, nfsd and lockd grace periods start. nfsd is shut down by a write of +0 to nfsd/threads. All locks and state are thrown away at that point. + +Between startup and shutdown, the number of threads may be adjusted up +or down by additional writes to nfsd/threads or by writes to +nfsd/pool_threads. + +For more detail about files under nfsd/ and what they control, see +fs/nfsd/nfsctl.c; most of them have detailed comments. + +Implementation notes +==================== + +Note that the rpc server requires the caller to serialize addition and +removal of listening sockets, and startup and shutdown of the server. +For nfsd this is done using nfsd_mutex. --- a/Documentation/admin-guide/nfs/nfsroot.rst +++ b/Documentation/admin-guide/nfs/nfsroot.rst @@ -0,0 +1,364 @@ +=============================================== +Mounting the root filesystem via NFS (nfsroot) +=============================================== + +:Authors: + Written 1996 by Gero Kuhlmann + + Updated 1997 by Martin Mares + + Updated 2006 by Nico Schottelius + + Updated 2006 by Horms + + Updated 2018 by Chris Novakovic + + + +In order to use a diskless system, such as an X-terminal or printer server for +example, it is necessary for the root filesystem to be present on a non-disk +device. This may be an initramfs (see +Documentation/filesystems/ramfs-rootfs-initramfs.txt), a ramdisk (see +Documentation/admin-guide/initrd.rst) or a filesystem mounted via NFS. The +following text describes on how to use NFS for the root filesystem. For the rest +of this text 'client' means the diskless system, and 'server' means the NFS +server. + + + + +Enabling nfsroot capabilities +============================= + +In order to use nfsroot, NFS client support needs to be selected as +built-in during configuration. Once this has been selected, the nfsroot +option will become available, which should also be selected. + +In the networking options, kernel level autoconfiguration can be selected, +along with the types of autoconfiguration to support. Selecting all of +DHCP, BOOTP and RARP is safe. + + + + +Kernel command line +=================== + +When the kernel has been loaded by a boot loader (see below) it needs to be +told what root fs device to use. And in the case of nfsroot, where to find +both the server and the name of the directory on the server to mount as root. +This can be established using the following kernel command line parameters: + + +root=/dev/nfs + This is necessary to enable the pseudo-NFS-device. Note that it's not a + real device but just a synonym to tell the kernel to use NFS instead of + a real device. + + +nfsroot=[:][,] + If the `nfsroot' parameter is NOT given on the command line, + the default ``"/tftpboot/%s"`` will be used. + + Specifies the IP address of the NFS server. + The default address is determined by the ip parameter + (see below). This parameter allows the use of different + servers for IP autoconfiguration and NFS. + + Name of the directory on the server to mount as root. + If there is a "%s" token in the string, it will be + replaced by the ASCII-representation of the client's + IP address. + + Standard NFS options. All options are separated by commas. + The following defaults are used:: + + port = as given by server portmap daemon + rsize = 4096 + wsize = 4096 + timeo = 7 + retrans = 3 + acregmin = 3 + acregmax = 60 + acdirmin = 30 + acdirmax = 60 + flags = hard, nointr, noposix, cto, ac + + +ip=::::::::: + This parameter tells the kernel how to configure IP addresses of devices + and also how to set up the IP routing table. It was originally called + nfsaddrs, but now the boot-time IP configuration works independently of + NFS, so it was renamed to ip and the old name remained as an alias for + compatibility reasons. + + If this parameter is missing from the kernel command line, all fields are + assumed to be empty, and the defaults mentioned below apply. In general + this means that the kernel tries to configure everything using + autoconfiguration. + + The parameter can appear alone as the value to the ip + parameter (without all the ':' characters before). If the value is + "ip=off" or "ip=none", no autoconfiguration will take place, otherwise + autoconfiguration will take place. The most common way to use this + is "ip=dhcp". + + IP address of the client. + Default: Determined using autoconfiguration. + + IP address of the NFS server. + If RARP is used to determine + the client address and this parameter is NOT empty only + replies from the specified server are accepted. + + Only required for NFS root. That is autoconfiguration + will not be triggered if it is missing and NFS root is not + in operation. + + Value is exported to /proc/net/pnp with the prefix "bootserver " + (see below). + + Default: Determined using autoconfiguration. + The address of the autoconfiguration server is used. + + IP address of a gateway if the server is on a different subnet. + Default: Determined using autoconfiguration. + + Netmask for local network interface. + If unspecified the netmask is derived from the client IP address + assuming classful addressing. + + Default: Determined using autoconfiguration. + + Name of the client. + If a '.' character is present, anything + before the first '.' is used as the client's hostname, and anything + after it is used as its NIS domain name. May be supplied by + autoconfiguration, but its absence will not trigger autoconfiguration. + If specified and DHCP is used, the user-provided hostname (and NIS + domain name, if present) will be carried in the DHCP request; this + may cause a DNS record to be created or updated for the client. + + Default: Client IP address is used in ASCII notation. + + Name of network device to use. + Default: If the host only has one device, it is used. + Otherwise the device is determined using + autoconfiguration. This is done by sending + autoconfiguration requests out of all devices, + and using the device that received the first reply. + + Method to use for autoconfiguration. + In the case of options + which specify multiple autoconfiguration protocols, + requests are sent using all protocols, and the first one + to reply is used. + + Only autoconfiguration protocols that have been compiled + into the kernel will be used, regardless of the value of + this option:: + + off or none: don't use autoconfiguration + (do static IP assignment instead) + on or any: use any protocol available in the kernel + (default) + dhcp: use DHCP + bootp: use BOOTP + rarp: use RARP + both: use both BOOTP and RARP but not DHCP + (old option kept for backwards compatibility) + + if dhcp is used, the client identifier can be used by following + format "ip=dhcp,client-id-type,client-id-value" + + Default: any + + IP address of primary nameserver. + Value is exported to /proc/net/pnp with the prefix "nameserver " + (see below). + + Default: None if not using autoconfiguration; determined + automatically if using autoconfiguration. + + IP address of secondary nameserver. + See . + + IP address of a Network Time Protocol (NTP) server. + Value is exported to /proc/net/ipconfig/ntp_servers, but is + otherwise unused (see below). + + Default: None if not using autoconfiguration; determined + automatically if using autoconfiguration. + + After configuration (whether manual or automatic) is complete, two files + are created in the following format; lines are omitted if their respective + value is empty following configuration: + + - /proc/net/pnp: + + #PROTO: (depending on configuration method) + domain (if autoconfigured, the DNS domain) + nameserver (primary name server IP) + nameserver (secondary name server IP) + nameserver (tertiary name server IP) + bootserver (NFS server IP) + + - /proc/net/ipconfig/ntp_servers: + + (NTP server IP) + (NTP server IP) + (NTP server IP) + + and (in /proc/net/pnp) and and + (in /proc/net/ipconfig/ntp_servers) are requested during autoconfiguration; + they cannot be specified as part of the "ip=" kernel command line parameter. + + Because the "domain" and "nameserver" options are recognised by DNS + resolvers, /etc/resolv.conf is often linked to /proc/net/pnp on systems + that use an NFS root filesystem. + + Note that the kernel will not synchronise the system time with any NTP + servers it discovers; this is the responsibility of a user space process + (e.g. an initrd/initramfs script that passes the IP addresses listed in + /proc/net/ipconfig/ntp_servers to an NTP client before mounting the real + root filesystem if it is on NFS). + + +nfsrootdebug + This parameter enables debugging messages to appear in the kernel + log at boot time so that administrators can verify that the correct + NFS mount options, server address, and root path are passed to the + NFS client. + + +rdinit= + To specify which file contains the program that starts system + initialization, administrators can use this command line parameter. + The default value of this parameter is "/init". If the specified + file exists and the kernel can execute it, root filesystem related + kernel command line parameters, including 'nfsroot=', are ignored. + + A description of the process of mounting the root file system can be + found in Documentation/driver-api/early-userspace/early_userspace_support.rst + + +Boot Loader +=========== + +To get the kernel into memory different approaches can be used. +They depend on various facilities being available: + + +- Booting from a floppy using syslinux + + When building kernels, an easy way to create a boot floppy that uses + syslinux is to use the zdisk or bzdisk make targets which use zimage + and bzimage images respectively. Both targets accept the + FDARGS parameter which can be used to set the kernel command line. + + e.g:: + + make bzdisk FDARGS="root=/dev/nfs" + + Note that the user running this command will need to have + access to the floppy drive device, /dev/fd0 + + For more information on syslinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ + + .. note:: + Previously it was possible to write a kernel directly to + a floppy using dd, configure the boot device using rdev, and + boot using the resulting floppy. Linux no longer supports this + method of booting. + +- Booting from a cdrom using isolinux + + When building kernels, an easy way to create a bootable cdrom that + uses isolinux is to use the isoimage target which uses a bzimage + image. Like zdisk and bzdisk, this target accepts the FDARGS + parameter which can be used to set the kernel command line. + + e.g:: + + make isoimage FDARGS="root=/dev/nfs" + + The resulting iso image will be arch//boot/image.iso + This can be written to a cdrom using a variety of tools including + cdrecord. + + e.g:: + + cdrecord dev=ATAPI:1,0,0 arch/x86/boot/image.iso + + For more information on isolinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ + +- Using LILO + + When using LILO all the necessary command line parameters may be + specified using the 'append=' directive in the LILO configuration + file. + + However, to use the 'root=' directive you also need to create + a dummy root device, which may be removed after LILO is run. + + e.g:: + + mknod /dev/boot255 c 0 255 + + For information on configuring LILO, please refer to its documentation. + +- Using GRUB + + When using GRUB, kernel parameter are simply appended after the kernel + specification: kernel + +- Using loadlin + + loadlin may be used to boot Linux from a DOS command prompt without + requiring a local hard disk to mount as root. This has not been + thoroughly tested by the authors of this document, but in general + it should be possible configure the kernel command line similarly + to the configuration of LILO. + + Please refer to the loadlin documentation for further information. + +- Using a boot ROM + + This is probably the most elegant way of booting a diskless client. + With a boot ROM the kernel is loaded using the TFTP protocol. The + authors of this document are not aware of any no commercial boot + ROMs that support booting Linux over the network. However, there + are two free implementations of a boot ROM, netboot-nfs and + etherboot, both of which are available on sunsite.unc.edu, and both + of which contain everything you need to boot a diskless Linux client. + +- Using pxelinux + + Pxelinux may be used to boot linux using the PXE boot loader + which is present on many modern network cards. + + When using pxelinux, the kernel image is specified using + "kernel ". The nfsroot parameters + are passed to the kernel by adding them to the "append" line. + It is common to use serial console in conjunction with pxeliunx, + see Documentation/admin-guide/serial-console.rst for more information. + + For more information on isolinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ + + + + +Credits +======= + + The nfsroot code in the kernel and the RARP support have been written + by Gero Kuhlmann . + + The rest of the IP layer autoconfiguration code has been written + by Martin Mares . + + In order to write the initial version of nfsroot I would like to thank + Jens-Uwe Mager for his help. --- a/Documentation/admin-guide/nfs/pnfs-block-server.rst +++ b/Documentation/admin-guide/nfs/pnfs-block-server.rst @@ -0,0 +1,42 @@ +=================================== +pNFS block layout server user guide +=================================== + +The Linux NFS server now supports the pNFS block layout extension. In this +case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition +to handling all the metadata access to the NFS export also hands out layouts +to the clients to directly access the underlying block devices that are +shared with the client. + +To use pNFS block layouts with with the Linux NFS server the exported file +system needs to support the pNFS block layouts (currently just XFS), and the +file system must sit on shared storage (typically iSCSI) that is accessible +to the clients in addition to the MDS. As of now the file system needs to +sit directly on the exported volume, striping or concatenation of +volumes on the MDS and clients is not supported yet. + +On the server, pNFS block volume support is automatically if the file system +support it. On the client make sure the kernel has the CONFIG_PNFS_BLOCK +option enabled, the blkmapd daemon from nfs-utils is running, and the +file system is mounted using the NFSv4.1 protocol version (mount -o vers=4.1). + +If the nfsd server needs to fence a non-responding client it calls +/sbin/nfsd-recall-failed with the first argument set to the IP address of +the client, and the second argument set to the device node without the /dev +prefix for the file system to be fenced. Below is an example file that shows +how to translate the device into a serial number from SCSI EVPD 0x80:: + + cat > /sbin/nfsd-recall-failed << EOF + +.. code-block:: sh + + #!/bin/sh + + CLIENT="$1" + DEV="/dev/$2" + EVPD=`sg_inq --page=0x80 ${DEV} | \ + grep "Unit serial number:" | \ + awk -F ': ' '{print $2}'` + + echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log + EOF --- a/Documentation/admin-guide/nfs/pnfs-scsi-server.rst +++ b/Documentation/admin-guide/nfs/pnfs-scsi-server.rst @@ -0,0 +1,24 @@ + +================================== +pNFS SCSI layout server user guide +================================== + +This document describes support for pNFS SCSI layouts in the Linux NFS server. +With pNFS SCSI layouts, the NFS server acts as Metadata Server (MDS) for pNFS, +which in addition to handling all the metadata access to the NFS export, +also hands out layouts to the clients so that they can directly access the +underlying SCSI LUNs that are shared with the client. + +To use pNFS SCSI layouts with with the Linux NFS server, the exported file +system needs to support the pNFS SCSI layouts (currently just XFS), and the +file system must sit on a SCSI LUN that is accessible to the clients in +addition to the MDS. As of now the file system needs to sit directly on the +exported LUN, striping or concatenation of LUNs on the MDS and clients +is not supported yet. + +On a server built with CONFIG_NFSD_SCSI, the pNFS SCSI volume support is +automatically enabled if the file system is exported using the "pnfs" +option and the underlying SCSI device support persistent reservations. +On the client make sure the kernel has the CONFIG_PNFS_BLOCK option +enabled, and the file system is mounted using the NFSv4.1 protocol +version (mount -o vers=4.1). --- a/Documentation/admin-guide/pm/cpuidle.rst +++ b/Documentation/admin-guide/pm/cpuidle.rst @@ -506,6 +506,9 @@ ``disable`` Whether or not this idle state is disabled. +``default_status`` + The default status of this state, "enabled" or "disabled". + ``latency`` Exit latency of the idle state in microseconds. --- a/Documentation/admin-guide/pm/intel_idle.rst +++ b/Documentation/admin-guide/pm/intel_idle.rst @@ -0,0 +1,246 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. include:: + +============================================== +``intel_idle`` CPU Idle Time Management Driver +============================================== + +:Copyright: |copy| 2020 Intel Corporation + +:Author: Rafael J. Wysocki + + +General Information +=================== + +``intel_idle`` is a part of the +:doc:`CPU idle time management subsystem ` in the Linux kernel +(``CPUIdle``). It is the default CPU idle time management driver for the +Nehalem and later generations of Intel processors, but the level of support for +a particular processor model in it depends on whether or not it recognizes that +processor model and may also depend on information coming from the platform +firmware. [To understand ``intel_idle`` it is necessary to know how ``CPUIdle`` +works in general, so this is the time to get familiar with :doc:`cpuidle` if you +have not done that yet.] + +``intel_idle`` uses the ``MWAIT`` instruction to inform the processor that the +logical CPU executing it is idle and so it may be possible to put some of the +processor's functional blocks into low-power states. That instruction takes two +arguments (passed in the ``EAX`` and ``ECX`` registers of the target CPU), the +first of which, referred to as a *hint*, can be used by the processor to +determine what can be done (for details refer to Intel Software Developer’s +Manual [1]_). Accordingly, ``intel_idle`` refuses to work with processors in +which the support for the ``MWAIT`` instruction has been disabled (for example, +via the platform firmware configuration menu) or which do not support that +instruction at all. + +``intel_idle`` is not modular, so it cannot be unloaded, which means that the +only way to pass early-configuration-time parameters to it is via the kernel +command line. + + +.. _intel-idle-enumeration-of-states: + +Enumeration of Idle States +========================== + +Each ``MWAIT`` hint value is interpreted by the processor as a license to +reconfigure itself in a certain way in order to save energy. The processor +configurations (with reduced power draw) resulting from that are referred to +as C-states (in the ACPI terminology) or idle states. The list of meaningful +``MWAIT`` hint values and idle states (i.e. low-power configurations of the +processor) corresponding to them depends on the processor model and it may also +depend on the configuration of the platform. + +In order to create a list of available idle states required by the ``CPUIdle`` +subsystem (see :ref:`idle-states-representation` in :doc:`cpuidle`), +``intel_idle`` can use two sources of information: static tables of idle states +for different processor models included in the driver itself and the ACPI tables +of the system. The former are always used if the processor model at hand is +recognized by ``intel_idle`` and the latter are used if that is required for +the given processor model (which is the case for all server processor models +recognized by ``intel_idle``) or if the processor model is not recognized. + +If the ACPI tables are going to be used for building the list of available idle +states, ``intel_idle`` first looks for a ``_CST`` object under one of the ACPI +objects corresponding to the CPUs in the system (refer to the ACPI specification +[2]_ for the description of ``_CST`` and its output package). Because the +``CPUIdle`` subsystem expects that the list of idle states supplied by the +driver will be suitable for all of the CPUs handled by it and ``intel_idle`` is +registered as the ``CPUIdle`` driver for all of the CPUs in the system, the +driver looks for the first ``_CST`` object returning at least one valid idle +state description and such that all of the idle states included in its return +package are of the FFH (Functional Fixed Hardware) type, which means that the +``MWAIT`` instruction is expected to be used to tell the processor that it can +enter one of them. The return package of that ``_CST`` is then assumed to be +applicable to all of the other CPUs in the system and the idle state +descriptions extracted from it are stored in a preliminary list of idle states +coming from the ACPI tables. [This step is skipped if ``intel_idle`` is +configured to ignore the ACPI tables; see `below `_.] + +Next, the first (index 0) entry in the list of available idle states is +initialized to represent a "polling idle state" (a pseudo-idle state in which +the target CPU continuously fetches and executes instructions), and the +subsequent (real) idle state entries are populated as follows. + +If the processor model at hand is recognized by ``intel_idle``, there is a +(static) table of idle state descriptions for it in the driver. In that case, +the "internal" table is the primary source of information on idle states and the +information from it is copied to the final list of available idle states. If +using the ACPI tables for the enumeration of idle states is not required +(depending on the processor model), all of the listed idle state are enabled by +default (so all of them will be taken into consideration by ``CPUIdle`` +governors during CPU idle state selection). Otherwise, some of the listed idle +states may not be enabled by default if there are no matching entries in the +preliminary list of idle states coming from the ACPI tables. In that case user +space still can enable them later (on a per-CPU basis) with the help of +the ``disable`` idle state attribute in ``sysfs`` (see +:ref:`idle-states-representation` in :doc:`cpuidle`). This basically means that +the idle states "known" to the driver may not be enabled by default if they have +not been exposed by the platform firmware (through the ACPI tables). + +If the given processor model is not recognized by ``intel_idle``, but it +supports ``MWAIT``, the preliminary list of idle states coming from the ACPI +tables is used for building the final list that will be supplied to the +``CPUIdle`` core during driver registration. For each idle state in that list, +the description, ``MWAIT`` hint and exit latency are copied to the corresponding +entry in the final list of idle states. The name of the idle state represented +by it (to be returned by the ``name`` idle state attribute in ``sysfs``) is +"CX_ACPI", where X is the index of that idle state in the final list (note that +the minimum value of X is 1, because 0 is reserved for the "polling" state), and +its target residency is based on the exit latency value. Specifically, for +C1-type idle states the exit latency value is also used as the target residency +(for compatibility with the majority of the "internal" tables of idle states for +various processor models recognized by ``intel_idle``) and for the other idle +state types (C2 and C3) the target residency value is 3 times the exit latency +(again, that is because it reflects the target residency to exit latency ratio +in the majority of cases for the processor models recognized by ``intel_idle``). +All of the idle states in the final list are enabled by default in this case. + + +.. _intel-idle-initialization: + +Initialization +============== + +The initialization of ``intel_idle`` starts with checking if the kernel command +line options forbid the use of the ``MWAIT`` instruction. If that is the case, +an error code is returned right away. + +The next step is to check whether or not the processor model is known to the +driver, which determines the idle states enumeration method (see +`above `_), and whether or not the processor +supports ``MWAIT`` (the initialization fails if that is not the case). Then, +the ``MWAIT`` support in the processor is enumerated through ``CPUID`` and the +driver initialization fails if the level of support is not as expected (for +example, if the total number of ``MWAIT`` substates returned is 0). + +Next, if the driver is not configured to ignore the ACPI tables (see +`below `_), the idle states information provided by the +platform firmware is extracted from them. + +Then, ``CPUIdle`` device objects are allocated for all CPUs and the list of +available idle states is created as explained +`above `_. + +Finally, ``intel_idle`` is registered with the help of cpuidle_register_driver() +as the ``CPUIdle`` driver for all CPUs in the system and a CPU online callback +for configuring individual CPUs is registered via cpuhp_setup_state(), which +(among other things) causes the callback routine to be invoked for all of the +CPUs present in the system at that time (each CPU executes its own instance of +the callback routine). That routine registers a ``CPUIdle`` device for the CPU +running it (which enables the ``CPUIdle`` subsystem to operate that CPU) and +optionally performs some CPU-specific initialization actions that may be +required for the given processor model. + + +.. _intel-idle-parameters: + +Kernel Command Line Options and Module Parameters +================================================= + +The *x86* architecture support code recognizes three kernel command line +options related to CPU idle time management: ``idle=poll``, ``idle=halt``, +and ``idle=nomwait``. If any of them is present in the kernel command line, the +``MWAIT`` instruction is not allowed to be used, so the initialization of +``intel_idle`` will fail. + +Apart from that there are two module parameters recognized by ``intel_idle`` +itself that can be set via the kernel command line (they cannot be updated via +sysfs, so that is the only way to change their values). + +The ``max_cstate`` parameter value is the maximum idle state index in the list +of idle states supplied to the ``CPUIdle`` core during the registration of the +driver. It is also the maximum number of regular (non-polling) idle states that +can be used by ``intel_idle``, so the enumeration of idle states is terminated +after finding that number of usable idle states (the other idle states that +potentially might have been used if ``max_cstate`` had been greater are not +taken into consideration at all). Setting ``max_cstate`` can prevent +``intel_idle`` from exposing idle states that are regarded as "too deep" for +some reason to the ``CPUIdle`` core, but it does so by making them effectively +invisible until the system is shut down and started again which may not always +be desirable. In practice, it is only really necessary to do that if the idle +states in question cannot be enabled during system startup, because in the +working state of the system the CPU power management quality of service (PM +QoS) feature can be used to prevent ``CPUIdle`` from touching those idle states +even if they have been enumerated (see :ref:`cpu-pm-qos` in :doc:`cpuidle`). +Setting ``max_cstate`` to 0 causes the ``intel_idle`` initialization to fail. + +The ``noacpi`` module parameter (which is recognized by ``intel_idle`` if the +kernel has been configured with ACPI support), can be set to make the driver +ignore the system's ACPI tables entirely (it is unset by default). + + +.. _intel-idle-core-and-package-idle-states: + +Core and Package Levels of Idle States +====================================== + +Typically, in a processor supporting the ``MWAIT`` instruction there are (at +least) two levels of idle states (or C-states). One level, referred to as +"core C-states", covers individual cores in the processor, whereas the other +level, referred to as "package C-states", covers the entire processor package +and it may also involve other components of the system (GPUs, memory +controllers, I/O hubs etc.). + +Some of the ``MWAIT`` hint values allow the processor to use core C-states only +(most importantly, that is the case for the ``MWAIT`` hint value corresponding +to the ``C1`` idle state), but the majority of them give it a license to put +the target core (i.e. the core containing the logical CPU executing ``MWAIT`` +with the given hint value) into a specific core C-state and then (if possible) +to enter a specific package C-state at the deeper level. For example, the +``MWAIT`` hint value representing the ``C3`` idle state allows the processor to +put the target core into the low-power state referred to as "core ``C3``" (or +``CC3``), which happens if all of the logical CPUs (SMT siblings) in that core +have executed ``MWAIT`` with the ``C3`` hint value (or with a hint value +representing a deeper idle state), and in addition to that (in the majority of +cases) it gives the processor a license to put the entire package (possibly +including some non-CPU components such as a GPU or a memory controller) into the +low-power state referred to as "package ``C3``" (or ``PC3``), which happens if +all of the cores have gone into the ``CC3`` state and (possibly) some additional +conditions are satisfied (for instance, if the GPU is covered by ``PC3``, it may +be required to be in a certain GPU-specific low-power state for ``PC3`` to be +reachable). + +As a rule, there is no simple way to make the processor use core C-states only +if the conditions for entering the corresponding package C-states are met, so +the logical CPU executing ``MWAIT`` with a hint value that is not core-level +only (like for ``C1``) must always assume that this may cause the processor to +enter a package C-state. [That is why the exit latency and target residency +values corresponding to the majority of ``MWAIT`` hint values in the "internal" +tables of idle states in ``intel_idle`` reflect the properties of package +C-states.] If using package C-states is not desirable at all, either +:ref:`PM QoS ` or the ``max_cstate`` module parameter of +``intel_idle`` described `above `_ must be used to +restrict the range of permissible idle states to the ones with core-level only +``MWAIT`` hint values (like ``C1``). + + +References +========== + +.. [1] *Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2B*, + https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-2b-manual.html + +.. [2] *Advanced Configuration and Power Interface (ACPI) Specification*, + https://uefi.org/specifications --- a/Documentation/admin-guide/pm/working-state.rst +++ b/Documentation/admin-guide/pm/working-state.rst @@ -8,6 +8,7 @@ :maxdepth: 2 cpuidle + intel_idle cpufreq intel_pstate intel_epb --- a/Documentation/admin-guide/thunderbolt.rst +++ b/Documentation/admin-guide/thunderbolt.rst @@ -1,6 +1,28 @@ -============= - Thunderbolt -============= +.. SPDX-License-Identifier: GPL-2.0 + +====================== + USB4 and Thunderbolt +====================== +USB4 is the public specification based on Thunderbolt 3 protocol with +some differences at the register level among other things. Connection +manager is an entity running on the host router (host controller) +responsible for enumerating routers and establishing tunnels. A +connection manager can be implemented either in firmware or software. +Typically PCs come with a firmware connection manager for Thunderbolt 3 +and early USB4 capable systems. Apple systems on the other hand use +software connection manager and the later USB4 compliant devices follow +the suit. + +The Linux Thunderbolt driver supports both and can detect at runtime which +connection manager implementation is to be used. To be on the safe side the +software connection manager in Linux also advertises security level +``user`` which means PCIe tunneling is disabled by default. The +documentation below applies to both implementations with the exception that +the software connection manager only supports ``user`` security level and +is expected to be accompanied with an IOMMU based DMA protection. + +Security levels and how to use them +----------------------------------- The interface presented here is not meant for end users. Instead there should be a userspace tool that handles all the low-level details, keeps a database of the authorized devices and prompts users for new connections. @@ -18,8 +40,6 @@ keep in mind that this bypasses the security levels and makes the system vulnerable to DMA attacks. -Security levels and how to use them ------------------------------------ Starting with Intel Falcon Ridge Thunderbolt controller there are 4 security levels available. Intel Titan Ridge added one more security level (usbonly). The reason for these is the fact that the connected devices can --- a/Documentation/arm64/cpu-feature-registers.rst +++ b/Documentation/arm64/cpu-feature-registers.rst @@ -117,6 +117,8 @@ +------------------------------+---------+---------+ | Name | bits | visible | +------------------------------+---------+---------+ + | RNDR | [63-60] | y | + +------------------------------+---------+---------+ | TS | [55-52] | y | +------------------------------+---------+---------+ | FHM | [51-48] | y | @@ -200,6 +202,12 @@ +------------------------------+---------+---------+ | Name | bits | visible | +------------------------------+---------+---------+ + | I8MM | [55-52] | y | + +------------------------------+---------+---------+ + | DGH | [51-48] | y | + +------------------------------+---------+---------+ + | BF16 | [47-44] | y | + +------------------------------+---------+---------+ | SB | [39-36] | y | +------------------------------+---------+---------+ | FRINTTS | [35-32] | y | @@ -234,10 +242,18 @@ +------------------------------+---------+---------+ | Name | bits | visible | +------------------------------+---------+---------+ + | F64MM | [59-56] | y | + +------------------------------+---------+---------+ + | F32MM | [55-52] | y | + +------------------------------+---------+---------+ + | I8MM | [47-44] | y | + +------------------------------+---------+---------+ | SM4 | [43-40] | y | +------------------------------+---------+---------+ | SHA3 | [35-32] | y | +------------------------------+---------+---------+ + | BF16 | [23-20] | y | + +------------------------------+---------+---------+ | BitPerm | [19-16] | y | +------------------------------+---------+---------+ | AES | [7-4] | y | --- a/Documentation/arm64/elf_hwcaps.rst +++ b/Documentation/arm64/elf_hwcaps.rst @@ -204,6 +204,37 @@ Functionality implied by ID_AA64ISAR1_EL1.FRINTTS == 0b0001. +HWCAP2_SVEI8MM + + Functionality implied by ID_AA64ZFR0_EL1.I8MM == 0b0001. + +HWCAP2_SVEF32MM + + Functionality implied by ID_AA64ZFR0_EL1.F32MM == 0b0001. + +HWCAP2_SVEF64MM + + Functionality implied by ID_AA64ZFR0_EL1.F64MM == 0b0001. + +HWCAP2_SVEBF16 + + Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0001. + +HWCAP2_I8MM + + Functionality implied by ID_AA64ISAR1_EL1.I8MM == 0b0001. + +HWCAP2_BF16 + + Functionality implied by ID_AA64ISAR1_EL1.BF16 == 0b0001. + +HWCAP2_DGH + + Functionality implied by ID_AA64ISAR1_EL1.DGH == 0b0001. + +HWCAP2_RNG + + Functionality implied by ID_AA64ISAR0_EL1.RNDR == 0b0001. 4. Unused AT_HWCAP bits ----------------------- --- a/Documentation/arm64/silicon-errata.rst +++ b/Documentation/arm64/silicon-errata.rst @@ -88,6 +88,8 @@ +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A55 | #1530923 | ARM64_ERRATUM_1530923 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N1 | #1349291 | N/A | --- a/Documentation/asm-annotations.rst +++ b/Documentation/asm-annotations.rst @@ -73,10 +73,11 @@ three main groups: 1. ``SYM_FUNC_*`` -- to annotate C-like functions. This means functions with - standard C calling conventions, i.e. the stack contains a return address at - the predefined place and a return from the function can happen in a - standard way. When frame pointers are enabled, save/restore of frame - pointer shall happen at the start/end of a function, respectively, too. + standard C calling conventions. For example, on x86, this means that the + stack contains a return address at the predefined place and a return from + the function can happen in a standard way. When frame pointers are enabled, + save/restore of frame pointer shall happen at the start/end of a function, + respectively, too. Checking tools like ``objtool`` should ensure such marked functions conform to these rules. The tools can also easily annotate these functions with --- a/Documentation/block/biovecs.rst +++ b/Documentation/block/biovecs.rst @@ -47,7 +47,7 @@ advantages: * Before, iterating over bios was very awkward when you weren't processing - exactly one bvec at a time - for example, bio_copy_data() in fs/bio.c, + exactly one bvec at a time - for example, bio_copy_data() in block/bio.c, which copies the contents of one bio into another. Because the biovecs wouldn't necessarily be the same size, the old code was tricky convoluted - it had to walk two different bios at the same time, keeping both bi_idx and --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -39,6 +39,7 @@ ../RCU/index gcc-plugins symbol-namespaces + padata Interfaces for kernel debugging --- a/Documentation/core-api/padata.rst +++ b/Documentation/core-api/padata.rst @@ -0,0 +1,169 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================================= +The padata parallel execution mechanism +======================================= + +:Date: December 2019 + +Padata is a mechanism by which the kernel can farm jobs out to be done in +parallel on multiple CPUs while retaining their ordering. It was developed for +use with the IPsec code, which needs to be able to perform encryption and +decryption on large numbers of packets without reordering those packets. The +crypto developers made a point of writing padata in a sufficiently general +fashion that it could be put to other uses as well. + +Usage +===== + +Initializing +------------ + +The first step in using padata is to set up a padata_instance structure for +overall control of how jobs are to be run:: + + #include + + struct padata_instance *padata_alloc_possible(const char *name); + +'name' simply identifies the instance. + +There are functions for enabling and disabling the instance:: + + int padata_start(struct padata_instance *pinst); + void padata_stop(struct padata_instance *pinst); + +These functions are setting or clearing the "PADATA_INIT" flag; if that flag is +not set, other functions will refuse to work. padata_start() returns zero on +success (flag set) or -EINVAL if the padata cpumask contains no active CPU +(flag not set). padata_stop() clears the flag and blocks until the padata +instance is unused. + +Finally, complete padata initialization by allocating a padata_shell:: + + struct padata_shell *padata_alloc_shell(struct padata_instance *pinst); + +A padata_shell is used to submit a job to padata and allows a series of such +jobs to be serialized independently. A padata_instance may have one or more +padata_shells associated with it, each allowing a separate series of jobs. + +Modifying cpumasks +------------------ + +The CPUs used to run jobs can be changed in two ways, programatically with +padata_set_cpumask() or via sysfs. The former is defined:: + + int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type, + cpumask_var_t cpumask); + +Here cpumask_type is one of PADATA_CPU_PARALLEL or PADATA_CPU_SERIAL, where a +parallel cpumask describes which processors will be used to execute jobs +submitted to this instance in parallel and a serial cpumask defines which +processors are allowed to be used as the serialization callback processor. +cpumask specifies the new cpumask to use. + +There may be sysfs files for an instance's cpumasks. For example, pcrypt's +live in /sys/kernel/pcrypt/. Within an instance's directory +there are two files, parallel_cpumask and serial_cpumask, and either cpumask +may be changed by echoing a bitmask into the file, for example:: + + echo f > /sys/kernel/pcrypt/pencrypt/parallel_cpumask + +Reading one of these files shows the user-supplied cpumask, which may be +different from the 'usable' cpumask. + +Padata maintains two pairs of cpumasks internally, the user-supplied cpumasks +and the 'usable' cpumasks. (Each pair consists of a parallel and a serial +cpumask.) The user-supplied cpumasks default to all possible CPUs on instance +allocation and may be changed as above. The usable cpumasks are always a +subset of the user-supplied cpumasks and contain only the online CPUs in the +user-supplied masks; these are the cpumasks padata actually uses. So it is +legal to supply a cpumask to padata that contains offline CPUs. Once an +offline CPU in the user-supplied cpumask comes online, padata is going to use +it. + +Changing the CPU masks are expensive operations, so it should not be done with +great frequency. + +Running A Job +------------- + +Actually submitting work to the padata instance requires the creation of a +padata_priv structure, which represents one job:: + + struct padata_priv { + /* Other stuff here... */ + void (*parallel)(struct padata_priv *padata); + void (*serial)(struct padata_priv *padata); + }; + +This structure will almost certainly be embedded within some larger +structure specific to the work to be done. Most of its fields are private to +padata, but the structure should be zeroed at initialisation time, and the +parallel() and serial() functions should be provided. Those functions will +be called in the process of getting the work done as we will see +momentarily. + +The submission of the job is done with:: + + int padata_do_parallel(struct padata_shell *ps, + struct padata_priv *padata, int *cb_cpu); + +The ps and padata structures must be set up as described above; cb_cpu +points to the preferred CPU to be used for the final callback when the job is +done; it must be in the current instance's CPU mask (if not the cb_cpu pointer +is updated to point to the CPU actually chosen). The return value from +padata_do_parallel() is zero on success, indicating that the job is in +progress. -EBUSY means that somebody, somewhere else is messing with the +instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being in the +serial cpumask, no online CPUs in the parallel or serial cpumasks, or a stopped +instance. + +Each job submitted to padata_do_parallel() will, in turn, be passed to +exactly one call to the above-mentioned parallel() function, on one CPU, so +true parallelism is achieved by submitting multiple jobs. parallel() runs with +software interrupts disabled and thus cannot sleep. The parallel() +function gets the padata_priv structure pointer as its lone parameter; +information about the actual work to be done is probably obtained by using +container_of() to find the enclosing structure. + +Note that parallel() has no return value; the padata subsystem assumes that +parallel() will take responsibility for the job from this point. The job +need not be completed during this call, but, if parallel() leaves work +outstanding, it should be prepared to be called again with a new job before +the previous one completes. + +Serializing Jobs +---------------- + +When a job does complete, parallel() (or whatever function actually finishes +the work) should inform padata of the fact with a call to:: + + void padata_do_serial(struct padata_priv *padata); + +At some point in the future, padata_do_serial() will trigger a call to the +serial() function in the padata_priv structure. That call will happen on +the CPU requested in the initial call to padata_do_parallel(); it, too, is +run with local software interrupts disabled. +Note that this call may be deferred for a while since the padata code takes +pains to ensure that jobs are completed in the order in which they were +submitted. + +Destroying +---------- + +Cleaning up a padata instance predictably involves calling the three free +functions that correspond to the allocation in reverse:: + + void padata_free_shell(struct padata_shell *ps); + void padata_stop(struct padata_instance *pinst); + void padata_free(struct padata_instance *pinst); + +It is the user's responsibility to ensure all outstanding jobs are complete +before any of the above are called. + +Interface +========= + +.. kernel-doc:: include/linux/padata.h +.. kernel-doc:: kernel/padata.c --- a/Documentation/crypto/devel-algos.rst +++ b/Documentation/crypto/devel-algos.rst @@ -31,33 +31,23 @@ :: - int crypto_unregister_alg(struct crypto_alg *alg); - int crypto_unregister_algs(struct crypto_alg *algs, int count); - + void crypto_unregister_alg(struct crypto_alg *alg); + void crypto_unregister_algs(struct crypto_alg *algs, int count); -Notice that both registration and unregistration functions do return a -value, so make sure to handle errors. A return code of zero implies -success. Any return code < 0 implies an error. -The bulk registration/unregistration functions register/unregister each -transformation in the given array of length count. They handle errors as -follows: - -- crypto_register_algs() succeeds if and only if it successfully - registers all the given transformations. If an error occurs partway - through, then it rolls back successful registrations before returning - the error code. Note that if a driver needs to handle registration - errors for individual transformations, then it will need to use the - non-bulk function crypto_register_alg() instead. +The registration functions return 0 on success, or a negative errno +value on failure. crypto_register_algs() succeeds only if it +successfully registered all the given algorithms; if it fails partway +through, then any changes are rolled back. -- crypto_unregister_algs() tries to unregister all the given - transformations, continuing on error. It logs errors and always - returns zero. +The unregistration functions always succeed, so they don't have a +return value. Don't try to unregister algorithms that aren't +currently registered. Single-Block Symmetric Ciphers [CIPHER] --------------------------------------- -Example of transformations: aes, arc4, ... +Example of transformations: aes, serpent, ... This section describes the simplest of all transformation implementations, that being the CIPHER type used for symmetric ciphers. @@ -108,7 +98,7 @@ Multi-Block Ciphers ------------------- -Example of transformations: cbc(aes), ecb(arc4), ... +Example of transformations: cbc(aes), chacha20, ... This section describes the multi-block cipher transformation implementations. The multi-block ciphers are used for transformations @@ -169,10 +159,10 @@ :: - int crypto_unregister_ahash(struct ahash_alg *alg); + void crypto_unregister_ahash(struct ahash_alg *alg); - int crypto_unregister_shash(struct shash_alg *alg); - int crypto_unregister_shashes(struct shash_alg *algs, int count); + void crypto_unregister_shash(struct shash_alg *alg); + void crypto_unregister_shashes(struct shash_alg *algs, int count); Cipher Definition With struct shash_alg and ahash_alg --- a/Documentation/dev-tools/kunit/faq.rst +++ b/Documentation/dev-tools/kunit/faq.rst @@ -29,7 +29,8 @@ For the most part, the KUnit core framework (what you use to write the tests) can compile to any architecture; it compiles like just another part of the -kernel and runs when the kernel boots. However, there is some infrastructure, +kernel and runs when the kernel boots, or when built as a module, when the +module is loaded. However, there is some infrastructure, like the KUnit Wrapper (``tools/testing/kunit/kunit.py``) that does not support other architectures. --- a/Documentation/dev-tools/kunit/index.rst +++ b/Documentation/dev-tools/kunit/index.rst @@ -49,6 +49,9 @@ of a host operating system; to be clear, it does not require any virtualization support; it is just a regular program. +Alternatively, kunit and kunit tests can be built as modules and tests will +run when the test module is loaded. + KUnit is fast. Excluding build time, from invocation to completion KUnit can run several dozen tests in only 10 to 20 seconds; this might not sound like a big deal to some people, but having such fast and easy to run tests fundamentally --- a/Documentation/dev-tools/kunit/usage.rst +++ b/Documentation/dev-tools/kunit/usage.rst @@ -539,6 +539,22 @@ Congratulations, you just ran a KUnit test on the x86 architecture! +In a similar manner, kunit and kunit tests can also be built as modules, +so if you wanted to run tests in this way you might add the following config +options to your ``.config``: + +.. code-block:: none + + CONFIG_KUNIT=m + CONFIG_KUNIT_EXAMPLE_TEST=m + +Once the kernel is built and installed, a simple + +.. code-block:: bash + modprobe example-test + +...will run the tests. + Writing new tests for other architectures ----------------------------------------- --- a/Documentation/devicetree/bindings/arm/atmel-sysregs.txt +++ b/Documentation/devicetree/bindings/arm/atmel-sysregs.txt @@ -10,6 +10,12 @@ - interrupts: Should contain interrupt for the PIT which is the IRQ line shared across all System Controller members. +PIT64B Timer required properties: +- compatible: Should be "microchip,sam9x60-pit64b" +- reg: Should contain registers location and length +- interrupts: Should contain interrupt for PIT64B timer +- clocks: Should contain the available clock sources for PIT64B timer. + System Timer (ST) required properties: - compatible: Should be "atmel,at91rm9200-st", "syscon", "simple-mfd" - reg: Should contain registers location and length --- a/Documentation/devicetree/bindings/ata/brcm,sata-brcm.txt +++ b/Documentation/devicetree/bindings/ata/brcm,sata-brcm.txt @@ -5,6 +5,7 @@ Required properties: - compatible : should be one or more of + "brcm,bcm7216-ahci" "brcm,bcm7425-ahci" "brcm,bcm7445-ahci" "brcm,bcm-nsp-ahci" @@ -14,6 +15,12 @@ - reg-names : "ahci" and "top-ctrl" - interrupts : interrupt mapping for SATA IRQ +Optional properties: + +- reset: for "brcm,bcm7216-ahci" must be a valid reset phandle + pointing to the RESCAL reset controller provider node. +- reset-names: for "brcm,bcm7216-ahci", must be "rescal". + Also see ahci-platform.txt. Example: --- a/Documentation/devicetree/bindings/connector/usb-connector.txt +++ b/Documentation/devicetree/bindings/connector/usb-connector.txt @@ -1,8 +1,8 @@ USB Connector ============= -USB connector node represents physical USB connector. It should be -a child of USB interface controller. +A USB connector node represents a physical USB connector. It should be +a child of a USB interface controller. Required properties: - compatible: describes type of the connector, must be one of: --- a/Documentation/devicetree/bindings/dma/fsl-edma.txt +++ b/Documentation/devicetree/bindings/dma/fsl-edma.txt @@ -10,6 +10,7 @@ - compatible : - "fsl,vf610-edma" for eDMA used similar to that on Vybrid vf610 SoC - "fsl,imx7ulp-edma" for eDMA2 used similar to that on i.mx7ulp + - "fsl,fsl,ls1028a-edma" for eDMA used similar to that on Vybrid vf610 SoC - reg : Specifies base physical address(s) and size of the eDMA registers. The 1st region is eDMA control register's address and size. The 2nd and the 3rd regions are programmable channel multiplexing --- a/Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt +++ b/Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt @@ -10,6 +10,9 @@ "fsl,imx6q-sdma" "fsl,imx7d-sdma" "fsl,imx8mq-sdma" + "fsl,imx8mm-sdma" + "fsl,imx8mn-sdma" + "fsl,imx8mp-sdma" The -to variants should be preferred since they allow to determine the correct ROM script addresses needed for the driver to work without additional firmware. --- a/Documentation/devicetree/bindings/dma/jz4780-dma.txt +++ b/Documentation/devicetree/bindings/dma/jz4780-dma.txt @@ -1,4 +1,4 @@ -* Ingenic JZ4780 DMA Controller +* Ingenic XBurst DMA Controller Required properties: @@ -8,10 +8,12 @@ * ingenic,jz4770-dma * ingenic,jz4780-dma * ingenic,x1000-dma + * ingenic,x1830-dma - reg: Should contain the DMA channel registers location and length, followed by the DMA controller registers location and length. - interrupts: Should contain the interrupt specifier of the DMA controller. -- clocks: Should contain a clock specifier for the JZ4780/X1000 PDMA clock. +- clocks: Should contain a clock specifier for the JZ4780/X1000/X1830 PDMA + clock. - #dma-cells: Must be <2>. Number of integer cells in the dmas property of DMA clients (see below). --- a/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt +++ b/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt @@ -30,6 +30,7 @@ - "renesas,dmac-r8a7794" (R-Car E2) - "renesas,dmac-r8a7795" (R-Car H3) - "renesas,dmac-r8a7796" (R-Car M3-W) + - "renesas,dmac-r8a77961" (R-Car M3-W+) - "renesas,dmac-r8a77965" (R-Car M3-N) - "renesas,dmac-r8a77970" (R-Car V3M) - "renesas,dmac-r8a77980" (R-Car V3H) --- a/Documentation/devicetree/bindings/dma/ti/k3-udma.yaml +++ b/Documentation/devicetree/bindings/dma/ti/k3-udma.yaml @@ -0,0 +1,184 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/dma/ti/k3-udma.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Texas Instruments K3 NAVSS Unified DMA Device Tree Bindings + +maintainers: + - Peter Ujfalusi + +description: | + The UDMA-P is intended to perform similar (but significantly upgraded) + functions as the packet-oriented DMA used on previous SoC devices. The UDMA-P + module supports the transmission and reception of various packet types. + The UDMA-P architecture facilitates the segmentation and reassembly of SoC DMA + data structure compliant packets to/from smaller data blocks that are natively + compatible with the specific requirements of each connected peripheral. + Multiple Tx and Rx channels are provided within the DMA which allow multiple + segmentation or reassembly operations to be ongoing. The DMA controller + maintains state information for each of the channels which allows packet + segmentation and reassembly operations to be time division multiplexed between + channels in order to share the underlying DMA hardware. An external DMA + scheduler is used to control the ordering and rate at which this multiplexing + occurs for Transmit operations. The ordering and rate of Receive operations + is indirectly controlled by the order in which blocks are pushed into the DMA + on the Rx PSI-L interface. + + The UDMA-P also supports acting as both a UTC and UDMA-C for its internal + channels. Channels in the UDMA-P can be configured to be either Packet-Based + or Third-Party channels on a channel by channel basis. + + All transfers within NAVSS is done between PSI-L source and destination + threads. + The peripherals serviced by UDMA can be PSI-L native (sa2ul, cpsw, etc) or + legacy, non PSI-L native peripherals. In the later case a special, small PDMA + is tasked to act as a bridge between the PSI-L fabric and the legacy + peripheral. + + PDMAs can be configured via UDMAP peer registers to match with the + configuration of the legacy peripheral. + +allOf: + - $ref: "../dma-controller.yaml#" + +properties: + "#dma-cells": + const: 1 + description: | + The cell is the PSI-L thread ID of the remote (to UDMAP) end. + Valid ranges for thread ID depends on the data movement direction: + for source thread IDs (rx): 0 - 0x7fff + for destination thread IDs (tx): 0x8000 - 0xffff + + Please refer to the device documentation for the PSI-L thread map and also + the PSI-L peripheral chapter for the correct thread ID. + + compatible: + enum: + - ti,am654-navss-main-udmap + - ti,am654-navss-mcu-udmap + - ti,j721e-navss-main-udmap + - ti,j721e-navss-mcu-udmap + + reg: + maxItems: 3 + + reg-names: + items: + - const: gcfg + - const: rchanrt + - const: tchanrt + + msi-parent: true + + ti,sci: + description: phandle to TI-SCI compatible System controller node + allOf: + - $ref: /schemas/types.yaml#/definitions/phandle + + ti,sci-dev-id: + description: TI-SCI device id of UDMAP + allOf: + - $ref: /schemas/types.yaml#/definitions/uint32 + + ti,ringacc: + description: phandle to the ring accelerator node + allOf: + - $ref: /schemas/types.yaml#/definitions/phandle + + ti,sci-rm-range-tchan: + description: | + Array of UDMA tchan resource subtypes for resource allocation for this + host + allOf: + - $ref: /schemas/types.yaml#/definitions/uint32-array + minItems: 1 + # Should be enough + maxItems: 255 + + ti,sci-rm-range-rchan: + description: | + Array of UDMA rchan resource subtypes for resource allocation for this + host + allOf: + - $ref: /schemas/types.yaml#/definitions/uint32-array + minItems: 1 + # Should be enough + maxItems: 255 + + ti,sci-rm-range-rflow: + description: | + Array of UDMA rflow resource subtypes for resource allocation for this + host + allOf: + - $ref: /schemas/types.yaml#/definitions/uint32-array + minItems: 1 + # Should be enough + maxItems: 255 + +required: + - compatible + - "#dma-cells" + - reg + - reg-names + - msi-parent + - ti,sci + - ti,sci-dev-id + - ti,ringacc + - ti,sci-rm-range-tchan + - ti,sci-rm-range-rchan + - ti,sci-rm-range-rflow + +examples: + - |+ + cbass_main { + #address-cells = <2>; + #size-cells = <2>; + + cbass_main_navss: navss@30800000 { + compatible = "simple-mfd"; + #address-cells = <2>; + #size-cells = <2>; + dma-coherent; + dma-ranges; + ranges; + + ti,sci-dev-id = <118>; + + main_udmap: dma-controller@31150000 { + compatible = "ti,am654-navss-main-udmap"; + reg = <0x0 0x31150000 0x0 0x100>, + <0x0 0x34000000 0x0 0x100000>, + <0x0 0x35000000 0x0 0x100000>; + reg-names = "gcfg", "rchanrt", "tchanrt"; + #dma-cells = <1>; + + ti,ringacc = <&ringacc>; + + msi-parent = <&inta_main_udmass>; + + ti,sci = <&dmsc>; + ti,sci-dev-id = <188>; + + ti,sci-rm-range-tchan = <0x1>, /* TX_HCHAN */ + <0x2>; /* TX_CHAN */ + ti,sci-rm-range-rchan = <0x4>, /* RX_HCHAN */ + <0x5>; /* RX_CHAN */ + ti,sci-rm-range-rflow = <0x6>; /* GP RFLOW */ + }; + }; + + mcasp0: mcasp@02B00000 { + dmas = <&main_udmap 0xc400>, <&main_udmap 0x4400>; + dma-names = "tx", "rx"; + }; + + crypto: crypto@4E00000 { + compatible = "ti,sa2ul-crypto"; + + dmas = <&main_udmap 0xc000>, <&main_udmap 0x4000>, <&main_udmap 0x4001>; + dma-names = "tx", "rx1", "rx2"; + }; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/gpio/qcom,wcd934x-gpio.yaml b/Documentation/devicetree/bindings/gpio/qcom,wcd934x-gpio.yaml --- a/Documentation/devicetree/bindings/gpio/qcom,wcd934x-gpio.yaml +++ b/Documentation/devicetree/bindings/gpio/qcom,wcd934x-gpio.yaml @@ -0,0 +1,47 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpio/qcom,wcd934x-gpio.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: WCD9340/WCD9341 GPIO controller + +maintainers: + - Srinivas Kandagatla + +description: | + Qualcomm Technologies Inc WCD9340/WCD9341 Audio Codec has integrated + gpio controller to control 5 gpios on the chip. + +properties: + compatible: + enum: + - qcom,wcd9340-gpio + - qcom,wcd9341-gpio + + reg: + maxItems: 1 + + gpio-controller: true + + '#gpio-cells': + const: 2 + +required: + - compatible + - reg + - gpio-controller + - "#gpio-cells" + +additionalProperties: false + +examples: + - | + wcdgpio: gpio@42 { + compatible = "qcom,wcd9340-gpio"; + reg = <0x042 0x2>; + gpio-controller; + #gpio-cells = <2>; + }; + +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt b/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt --- a/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt +++ b/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt @@ -18,7 +18,8 @@ - "renesas,gpio-r8a7793": for R8A7793 (R-Car M2-N) compatible GPIO controller. - "renesas,gpio-r8a7794": for R8A7794 (R-Car E2) compatible GPIO controller. - "renesas,gpio-r8a7795": for R8A7795 (R-Car H3) compatible GPIO controller. - - "renesas,gpio-r8a7796": for R8A7796 (R-Car M3-W) compatible GPIO controller. + - "renesas,gpio-r8a7796": for R8A77960 (R-Car M3-W) compatible GPIO controller. + - "renesas,gpio-r8a77961": for R8A77961 (R-Car M3-W+) compatible GPIO controller. - "renesas,gpio-r8a77965": for R8A77965 (R-Car M3-N) compatible GPIO controller. - "renesas,gpio-r8a77970": for R8A77970 (R-Car V3M) compatible GPIO controller. - "renesas,gpio-r8a77980": for R8A77980 (R-Car V3H) compatible GPIO controller. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/gpio/sifive,gpio.yaml b/Documentation/devicetree/bindings/gpio/sifive,gpio.yaml --- a/Documentation/devicetree/bindings/gpio/sifive,gpio.yaml +++ b/Documentation/devicetree/bindings/gpio/sifive,gpio.yaml @@ -0,0 +1,68 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpio/sifive,gpio.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: SiFive GPIO controller + +maintainers: + - Yash Shah + - Paul Walmsley + +properties: + compatible: + items: + - const: sifive,fu540-c000-gpio + - const: sifive,gpio0 + + reg: + maxItems: 1 + + interrupts: + description: + interrupt mapping one per GPIO. Maximum 16 GPIOs. + minItems: 1 + maxItems: 16 + + interrupt-controller: true + + "#interrupt-cells": + const: 2 + + clocks: + maxItems: 1 + + "#gpio-cells": + const: 2 + + gpio-controller: true + +required: + - compatible + - reg + - interrupts + - interrupt-controller + - "#interrupt-cells" + - clocks + - "#gpio-cells" + - gpio-controller + +additionalProperties: false + +examples: + - | + #include + gpio@10060000 { + compatible = "sifive,fu540-c000-gpio", "sifive,gpio0"; + interrupt-parent = <&plic>; + interrupts = <7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22>; + reg = <0x0 0x10060000 0x0 0x1000>; + clocks = <&tlclk PRCI_CLK_TLCLK>; + gpio-controller; + #gpio-cells = <2>; + interrupt-controller; + #interrupt-cells = <2>; + }; + +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/gpio/xylon,logicvc-gpio.yaml b/Documentation/devicetree/bindings/gpio/xylon,logicvc-gpio.yaml --- a/Documentation/devicetree/bindings/gpio/xylon,logicvc-gpio.yaml +++ b/Documentation/devicetree/bindings/gpio/xylon,logicvc-gpio.yaml @@ -0,0 +1,69 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +# Copyright 2019 Bootlin +%YAML 1.2 +--- +$id: "http://devicetree.org/schemas/gpio/xylon,logicvc-gpio.yaml#" +$schema: "http://devicetree.org/meta-schemas/core.yaml#" + +title: Xylon LogiCVC GPIO controller + +maintainers: + - Paul Kocialkowski + +description: | + The LogiCVC GPIO describes the GPIO block included in the LogiCVC display + controller. These are meant to be used for controlling display-related + signals. + + The controller exposes GPIOs from the display and power control registers, + which are mapped by the driver as follows: + - GPIO[4:0] (display control) mapped to index 0-4 + - EN_BLIGHT (power control) mapped to index 5 + - EN_VDD (power control) mapped to index 6 + - EN_VEE (power control) mapped to index 7 + - V_EN (power control) mapped to index 8 + +properties: + $nodename: + pattern: "^gpio@[0-9a-f]+$" + + compatible: + enum: + - xylon,logicvc-3.02.a-gpio + + reg: + maxItems: 1 + + "#gpio-cells": + const: 2 + + gpio-controller: true + + gpio-line-names: + minItems: 1 + maxItems: 9 + +required: + - compatible + - reg + - "#gpio-cells" + - gpio-controller + +examples: + - | + logicvc: logicvc@43c00000 { + compatible = "xylon,logicvc-3.02.a", "syscon", "simple-mfd"; + reg = <0x43c00000 0x6000>; + + #address-cells = <1>; + #size-cells = <1>; + + logicvc_gpio: gpio@40 { + compatible = "xylon,logicvc-3.02.a-gpio"; + reg = <0x40 0x40>; + gpio-controller; + #gpio-cells = <2>; + gpio-line-names = "GPIO0", "GPIO1", "GPIO2", "GPIO3", "GPIO4", + "EN_BLIGHT", "EN_VDD", "EN_VEE", "V_EN"; + }; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/hwmon/adi,adm1177.yaml b/Documentation/devicetree/bindings/hwmon/adi,adm1177.yaml --- a/Documentation/devicetree/bindings/hwmon/adi,adm1177.yaml +++ b/Documentation/devicetree/bindings/hwmon/adi,adm1177.yaml @@ -0,0 +1,66 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/hwmon/adi,adm1177.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Analog Devices ADM1177 Hot Swap Controller and Digital Power Monitor + +maintainers: + - Michael Hennerich + - Beniamin Bia + +description: | + Analog Devices ADM1177 Hot Swap Controller and Digital Power Monitor + https://www.analog.com/media/en/technical-documentation/data-sheets/ADM1177.pdf + +properties: + compatible: + enum: + - adi,adm1177 + + reg: + maxItems: 1 + + avcc-supply: + description: + Phandle to the Avcc power supply + + shunt-resistor-micro-ohms: + description: + The value of curent sense resistor in microohms. If not provided, + the current reading and overcurrent alert is disabled. + + adi,shutdown-threshold-microamp: + description: + Specifies the current level at which an over current alert occurs. + If not provided, the overcurrent alert is configured to max ADC range + based on shunt-resistor-micro-ohms. + + adi,vrange-high-enable: + description: + Specifies which internal voltage divider to be used. A 1 selects + a 7:2 voltage divider while a 0 selects a 14:1 voltage divider. + type: boolean + +required: + - compatible + - reg + +examples: + - | + #include + #include + i2c0 { + #address-cells = <1>; + #size-cells = <0>; + + pwmon@5a { + compatible = "adi,adm1177"; + reg = <0x5a>; + shunt-resistor-micro-ohms = <50000>; /* 50 mOhm */ + adi,shutdown-threshold-microamp = <1059000>; /* 1.059 A */ + adi,vrange-high-enable; + }; + }; +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/hwmon/pmbus/ti,ucd90320.yaml b/Documentation/devicetree/bindings/hwmon/pmbus/ti,ucd90320.yaml --- a/Documentation/devicetree/bindings/hwmon/pmbus/ti,ucd90320.yaml +++ b/Documentation/devicetree/bindings/hwmon/pmbus/ti,ucd90320.yaml @@ -0,0 +1,45 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- + +$id: http://devicetree.org/schemas/hwmon/pmbus/ti,ucd90320.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: UCD90320 power sequencer + +maintainers: + - Jim Wright + +description: | + The UCD90320 is a 32-rail PMBus/I2C addressable power-supply sequencer and + monitor. The 24 integrated ADC channels (AMONx) monitor the power supply + voltage, current, and temperature. Of the 84 GPIO pins, 8 can be used as + digital monitors (DMONx), 32 to enable the power supply (ENx), 24 for + margining (MARx), 16 for logical GPO, and 32 GPIs for cascading, and system + function. + + http://focus.ti.com/lit/ds/symlink/ucd90320.pdf + +properties: + compatible: + enum: + - ti,ucd90320 + + reg: + maxItems: 1 + +required: + - compatible + - reg + +examples: + - | + i2c { + #address-cells = <1>; + #size-cells = <0>; + + ucd90320@11 { + compatible = "ti,ucd90320"; + reg = <0x11>; + }; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/iio/accel/adi,adis16240.yaml b/Documentation/devicetree/bindings/iio/accel/adi,adis16240.yaml --- a/Documentation/devicetree/bindings/iio/accel/adi,adis16240.yaml +++ b/Documentation/devicetree/bindings/iio/accel/adi,adis16240.yaml @@ -0,0 +1,49 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/iio/accel/adi,adis16240.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: ADIS16240 Programmable Impact Sensor and Recorder driver + +maintainers: + - Alexandru Ardelean + +description: | + ADIS16240 Programmable Impact Sensor and Recorder driver that supports + SPI interface. + https://www.analog.com/en/products/adis16240.html + +properties: + compatible: + enum: + - adi,adis16240 + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + +required: + - compatible + - reg + - interrupts + +examples: + - | + #include + #include + spi0 { + #address-cells = <1>; + #size-cells = <0>; + + /* Example for a SPI device node */ + accelerometer@0 { + compatible = "adi,adis16240"; + reg = <0>; + spi-max-frequency = <2500000>; + interrupt-parent = <&gpio0>; + interrupts = <0 IRQ_TYPE_LEVEL_HIGH>; + }; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/iio/accel/bma180.txt b/Documentation/devicetree/bindings/iio/accel/bma180.txt --- a/Documentation/devicetree/bindings/iio/accel/bma180.txt +++ b/Documentation/devicetree/bindings/iio/accel/bma180.txt @@ -1,11 +1,14 @@ -* Bosch BMA180 / BMA250 triaxial acceleration sensor +* Bosch BMA180 / BMA25x triaxial acceleration sensor http://omapworld.com/BMA180_111_1002839.pdf http://ae-bst.resource.bosch.com/media/products/dokumente/bma250/bst-bma250-ds002-05.pdf Required properties: - - compatible : should be "bosch,bma180" or "bosch,bma250" + - compatible : should be one of: + "bosch,bma180" + "bosch,bma250" + "bosch,bma254" - reg : the I2C address of the sensor Optional properties: diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/iio/accel/bosch,bma400.yaml b/Documentation/devicetree/bindings/iio/accel/bosch,bma400.yaml --- a/Documentation/devicetree/bindings/iio/accel/bosch,bma400.yaml +++ b/Documentation/devicetree/bindings/iio/accel/bosch,bma400.yaml @@ -0,0 +1,54 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/iio/accel/bosch,bma400.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Bosch BMA400 triaxial acceleration sensor + +maintainers: + - Dan Robertson + +description: | + Acceleration and temperature iio sensors with an i2c interface + + Specifications about the sensor can be found at: + https://ae-bst.resource.bosch.com/media/_tech/media/datasheets/BST-BMA400-DS000.pdf + +properties: + compatible: + enum: + - bosch,bma400 + + reg: + maxItems: 1 + + vdd-supply: + description: phandle to the regulator that provides power to the accelerometer + + vddio-supply: + description: phandle to the regulator that provides power to the sensor's IO + + interrupts: + maxItems: 1 + +required: + - compatible + - reg + +examples: + - | + #include + #include + i2c { + #address-cells = <1>; + #size-cells = <0>; + accelerometer@14 { + compatible = "bosch,bma400"; + reg = <0x14>; + vdd-supply = <&vdd>; + vddio-supply = <&vddio>; + interrupt-parent = <&gpio0>; + interrupts = <0 IRQ_TYPE_LEVEL_HIGH>; + }; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/iio/accel/kionix,kxcjk1013.txt b/Documentation/devicetree/bindings/iio/accel/kionix,kxcjk1013.txt --- a/Documentation/devicetree/bindings/iio/accel/kionix,kxcjk1013.txt +++ b/Documentation/devicetree/bindings/iio/accel/kionix,kxcjk1013.txt @@ -9,9 +9,16 @@ "kionix,kxtf9" - reg: i2c slave address +Optional properties: + + - mount-matrix: an optional 3x3 mounting rotation matrix + Example: kxtf9@f { compatible = "kionix,kxtf9"; reg = <0x0F>; + mount-matrix = "0", "1", "0", + "1", "0", "0", + "0", "0", "1"; }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/iio/adc/adi,ad7091r5.yaml b/Documentation/devicetree/bindings/iio/adc/adi,ad7091r5.yaml --- a/Documentation/devicetree/bindings/iio/adc/adi,ad7091r5.yaml +++ b/Documentation/devicetree/bindings/iio/adc/adi,ad7091r5.yaml @@ -0,0 +1,54 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/iio/adc/adi,ad7091r5.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Analog Devices AD7091R5 4-Channel 12-Bit ADC + +maintainers: + - Beniamin Bia + +description: | + Analog Devices AD7091R5 4-Channel 12-Bit ADC + https://www.analog.com/media/en/technical-documentation/data-sheets/ad7091r-5.pdf + +properties: + compatible: + enum: + - adi,ad7091r5 + + reg: + maxItems: 1 + + vref-supply: + description: + Phandle to the vref power supply + + interrupts: + maxItems: 1 + + +required: + - compatible + - reg + +additionalProperties: false + +examples: + - | + #include + #include + i2c { + #address-cells = <1>; + #size-cells = <0>; + + adc@2f { + compatible = "adi,ad7091r5"; + reg = <0x2f>; + + interrupts = <25 IRQ_TYPE_EDGE_FALLING>; + interrupt-parent = <&gpio>; + }; + }; +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/iio/adc/lltc,ltc2496.yaml b/Documentation/devicetree/bindings/iio/adc/lltc,ltc2496.yaml --- a/Documentation/devicetree/bindings/iio/adc/lltc,ltc2496.yaml +++ b/Documentation/devicetree/bindings/iio/adc/lltc,ltc2496.yaml @@ -0,0 +1,47 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/iio/adc/lltc,ltc2496.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Linear Technology / Analog Devices LTC2496 ADC + +maintainers: + - Lars-Peter Clausen + - Michael Hennerich + - Stefan Popa + +properties: + compatible: + enum: + - lltc,ltc2496 + + vref-supply: + description: phandle to an external regulator providing the reference voltage + allOf: + - $ref: /schemas/types.yaml#/definitions/phandle + + reg: + description: spi chipselect number according to the usual spi bindings + + spi-max-frequency: + description: maximal spi bus frequency supported + +required: + - compatible + - vref-supply + - reg + +examples: + - | + spi { + #address-cells = <1>; + #size-cells = <0>; + + adc@0 { + compatible = "lltc,ltc2496"; + reg = <0>; + vref-supply = <<c2496_reg>; + spi-max-frequency = <2000000>; + }; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/iio/adc/sigma-delta-modulator.txt b/Documentation/devicetree/bindings/iio/adc/sigma-delta-modulator.txt --- a/Documentation/devicetree/bindings/iio/adc/sigma-delta-modulator.txt +++ b/Documentation/devicetree/bindings/iio/adc/sigma-delta-modulator.txt @@ -1,13 +0,0 @@ -Device-Tree bindings for sigma delta modulator - -Required properties: -- compatible: should be "ads1201", "sd-modulator". "sd-modulator" can be use - as a generic SD modulator if modulator not specified in compatible list. -- #io-channel-cells = <0>: See the IIO bindings section "IIO consumers". - -Example node: - - ads1202: adc { - compatible = "sd-modulator"; - #io-channel-cells = <0>; - }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/iio/adc/sigma-delta-modulator.yaml b/Documentation/devicetree/bindings/iio/adc/sigma-delta-modulator.yaml --- a/Documentation/devicetree/bindings/iio/adc/sigma-delta-modulator.yaml +++ b/Documentation/devicetree/bindings/iio/adc/sigma-delta-modulator.yaml @@ -0,0 +1,37 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/iio/adc/sigma-delta-modulator.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Device-Tree bindings for sigma delta modulator + +maintainers: + - Arnaud Pouliquen + +properties: + compatible: + description: | + "sd-modulator" can be used as a generic SD modulator, + if the modulator is not specified in the compatible list. + enum: + - sd-modulator + - ads1201 + + '#io-channel-cells': + const: 0 + +required: + - compatible + - '#io-channel-cells' + +additionalProperties: false + +examples: + - | + ads1202: adc { + compatible = "sd-modulator"; + #io-channel-cells = <0>; + }; + +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/iio/imu/adi,adis16480.txt b/Documentation/devicetree/bindings/iio/imu/adi,adis16480.txt --- a/Documentation/devicetree/bindings/iio/imu/adi,adis16480.txt +++ b/Documentation/devicetree/bindings/iio/imu/adi,adis16480.txt @@ -8,6 +8,7 @@ * "adi,adis16480" * "adi,adis16485" * "adi,adis16488" + * "adi,adis16490" * "adi,adis16495-1" * "adi,adis16495-2" * "adi,adis16495-3" diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/iio/pressure/asc,dlhl60d.yaml b/Documentation/devicetree/bindings/iio/pressure/asc,dlhl60d.yaml --- a/Documentation/devicetree/bindings/iio/pressure/asc,dlhl60d.yaml +++ b/Documentation/devicetree/bindings/iio/pressure/asc,dlhl60d.yaml @@ -0,0 +1,51 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/iio/pressure/asc,dlhl60d.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: All Sensors DLH series low voltage digital pressure sensors + +maintainers: + - Tomislav Denis + +description: | + Bindings for the All Sensors DLH series pressure sensors. + + Specifications about the sensors can be found at: + http://www.allsensors.com/cad/DS-0355_Rev_B.PDF + +properties: + compatible: + enum: + - asc,dlhl60d + - asc,dlhl60g + + reg: + description: I2C device address + maxItems: 1 + + interrupts: + description: interrupt mapping for EOC(data ready) pin + maxItems: 1 + +required: + - compatible + - reg + +examples: + - | + #include + + i2c0 { + #address-cells = <1>; + #size-cells = <0>; + + pressure@29 { + compatible = "asc,dlhl60d"; + reg = <0x29>; + interrupt-parent = <&gpio0>; + interrupts = <10 IRQ_TYPE_EDGE_RISING>; + }; + }; +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/iio/proximity/parallax-ping.yaml b/Documentation/devicetree/bindings/iio/proximity/parallax-ping.yaml --- a/Documentation/devicetree/bindings/iio/proximity/parallax-ping.yaml +++ b/Documentation/devicetree/bindings/iio/proximity/parallax-ping.yaml @@ -0,0 +1,51 @@ +# SPDX-License-Identifier: (GPL-2.0-or-later OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/iio/proximity/parallax-ping.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Parallax PING))) and LaserPING range finder + +maintainers: + - Andreas Klinger + +description: | + Bit-banging driver using one GPIO: + - ping-gpios is raised by the driver to start measurement + - direction of ping-gpio is then switched into input with an interrupt + for receiving distance value as PWM signal + + Specifications about the devices can be found at: + http://parallax.com/sites/default/files/downloads/28041-LaserPING-2m-Rangefinder-Guide.pdf + http://parallax.com/sites/default/files/downloads/28015-PING-Documentation-v1.6.pdf + +properties: + compatible: + enum: + - parallax,ping + - parallax,laserping + + ping-gpios: + description: + Definition of the GPIO for the triggering and echo (output and input) + This GPIO is set for about 5 us by the driver to tell the device it + should initiate the measurement cycle. Afterwards the GPIO is switched + to input direction with an interrupt. The device sets it and the + length of the input signal corresponds to the measured distance. + It needs to be an GPIO which is able to deliver an interrupt because + the time between two interrupts is measured in the driver. + See Documentation/devicetree/bindings/gpio/gpio.txt for information + on how to specify a consumer gpio. + maxItems: 1 + +required: + - compatible + - ping-gpios + +examples: + - | + #include + proximity { + compatible = "parallax,laserping"; + ping-gpios = <&gpio0 26 GPIO_ACTIVE_HIGH>; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/iio/temperature/maxim_thermocouple.txt b/Documentation/devicetree/bindings/iio/temperature/maxim_thermocouple.txt --- a/Documentation/devicetree/bindings/iio/temperature/maxim_thermocouple.txt +++ b/Documentation/devicetree/bindings/iio/temperature/maxim_thermocouple.txt @@ -5,7 +5,10 @@ Required properties: - - compatible: must be "maxim,max31855" or "maxim,max6675" + - compatible: must be "maxim,max6675" or one of the following: + "maxim,max31855k", "maxim,max31855j", "maxim,max31855n", + "maxim,max31855s", "maxim,max31855t", "maxim,max31855e", + "maxim,max31855r"; the generic "max,max31855" is deprecated. - reg: SPI chip select number for the device - spi-max-frequency: must be 4300000 - spi-cpha: must be defined for max6675 to enable SPI mode 1 @@ -15,7 +18,7 @@ Example: max31855@0 { - compatible = "maxim,max31855"; + compatible = "maxim,max31855k"; reg = <0>; spi-max-frequency = <4300000>; }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/interconnect/qcom,msm8916.yaml b/Documentation/devicetree/bindings/interconnect/qcom,msm8916.yaml --- a/Documentation/devicetree/bindings/interconnect/qcom,msm8916.yaml +++ b/Documentation/devicetree/bindings/interconnect/qcom,msm8916.yaml @@ -0,0 +1,77 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/interconnect/qcom,msm8916.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm MSM8916 Network-On-Chip interconnect + +maintainers: + - Georgi Djakov + +description: | + The Qualcomm MSM8916 interconnect providers support adjusting the + bandwidth requirements between the various NoC fabrics. + +properties: + compatible: + enum: + - qcom,msm8916-bimc + - qcom,msm8916-pcnoc + - qcom,msm8916-snoc + + reg: + maxItems: 1 + + '#interconnect-cells': + const: 1 + + clock-names: + items: + - const: bus + - const: bus_a + + clocks: + items: + - description: Bus Clock + - description: Bus A Clock + +required: + - compatible + - reg + - '#interconnect-cells' + - clock-names + - clocks + +additionalProperties: false + +examples: + - | + #include + + bimc: interconnect@400000 { + compatible = "qcom,msm8916-bimc"; + reg = <0x00400000 0x62000>; + #interconnect-cells = <1>; + clock-names = "bus", "bus_a"; + clocks = <&rpmcc RPM_SMD_BIMC_CLK>, + <&rpmcc RPM_SMD_BIMC_A_CLK>; + }; + + pcnoc: interconnect@500000 { + compatible = "qcom,msm8916-pcnoc"; + reg = <0x00500000 0x11000>; + #interconnect-cells = <1>; + clock-names = "bus", "bus_a"; + clocks = <&rpmcc RPM_SMD_PCNOC_CLK>, + <&rpmcc RPM_SMD_PCNOC_A_CLK>; + }; + + snoc: interconnect@580000 { + compatible = "qcom,msm8916-snoc"; + reg = <0x00580000 0x14000>; + #interconnect-cells = <1>; + clock-names = "bus", "bus_a"; + clocks = <&rpmcc RPM_SMD_SNOC_CLK>, + <&rpmcc RPM_SMD_SNOC_A_CLK>; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.txt b/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.txt --- a/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.txt +++ b/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.txt @@ -17,6 +17,7 @@ "amlogic,meson-axg-gpio-intc" for AXG SoCs (A113D, A113X) "amlogic,meson-g12a-gpio-intc" for G12A SoCs (S905D2, S905X2, S905Y2) "amlogic,meson-sm1-gpio-intc" for SM1 SoCs (S905D3, S905X3, S905Y3) + "amlogic,meson-a1-gpio-intc" for A1 SoCs (A113L) - reg : Specifies base physical address and size of the registers. - interrupt-controller : Identifies the node as an interrupt controller. - #interrupt-cells : Specifies the number of cells needed to encode an diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2xxx-scu-ic.txt b/Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2xxx-scu-ic.txt --- a/Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2xxx-scu-ic.txt +++ b/Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2xxx-scu-ic.txt @@ -0,0 +1,23 @@ +Aspeed AST25XX and AST26XX SCU Interrupt Controller + +Required Properties: + - #interrupt-cells : must be 1 + - compatible : must be "aspeed,ast2500-scu-ic", + "aspeed,ast2600-scu-ic0" or + "aspeed,ast2600-scu-ic1" + - interrupts : interrupt from the parent controller + - interrupt-controller : indicates that the controller receives and + fires new interrupts for child busses + +Example: + + syscon@1e6e2000 { + ranges = <0 0x1e6e2000 0x1a8>; + + scu_ic: interrupt-controller@18 { + #interrupt-cells = <1>; + compatible = "aspeed,ast2500-scu-ic"; + interrupts = <21>; + interrupt-controller; + }; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/interrupt-controller/fsl,intmux.yaml b/Documentation/devicetree/bindings/interrupt-controller/fsl,intmux.yaml --- a/Documentation/devicetree/bindings/interrupt-controller/fsl,intmux.yaml +++ b/Documentation/devicetree/bindings/interrupt-controller/fsl,intmux.yaml @@ -0,0 +1,68 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/interrupt-controller/fsl,intmux.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Freescale INTMUX interrupt multiplexer + +maintainers: + - Joakim Zhang + +properties: + compatible: + const: fsl,imx-intmux + + reg: + maxItems: 1 + + interrupts: + minItems: 1 + maxItems: 8 + description: | + Should contain the parent interrupt lines (up to 8) used to multiplex + the input interrupts. + + interrupt-controller: true + + '#interrupt-cells': + const: 2 + description: | + The 1st cell is hw interrupt number, the 2nd cell is channel index. + + clocks: + description: ipg clock. + + clock-names: + const: ipg + +required: + - compatible + - reg + - interrupts + - interrupt-controller + - '#interrupt-cells' + - clocks + - clock-names + +additionalProperties: false + +examples: + - | + interrupt-controller@37400000 { + compatible = "fsl,imx-intmux"; + reg = <0x37400000 0x1000>; + interrupts = <0 16 4>, + <0 17 4>, + <0 18 4>, + <0 19 4>, + <0 20 4>, + <0 21 4>, + <0 22 4>, + <0 23 4>; + interrupt-controller; + interrupt-parent = <&gic>; + #interrupt-cells = <2>; + clocks = <&clk>; + clock-names = "ipg"; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/memory-controllers/fsl/imx8m-ddrc.yaml b/Documentation/devicetree/bindings/memory-controllers/fsl/imx8m-ddrc.yaml --- a/Documentation/devicetree/bindings/memory-controllers/fsl/imx8m-ddrc.yaml +++ b/Documentation/devicetree/bindings/memory-controllers/fsl/imx8m-ddrc.yaml @@ -0,0 +1,72 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/memory-controllers/fsl/imx8m-ddrc.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: i.MX8M DDR Controller + +maintainers: + - Leonard Crestez + +description: + The DDRC block is integrated in i.MX8M for interfacing with DDR based + memories. + + It supports switching between different frequencies at runtime but during + this process RAM itself becomes briefly inaccessible so actual frequency + switching is implemented by TF-A code which runs from a SRAM area. + + The Linux driver for the DDRC doesn't even map registers (they're included + for the sake of "describing hardware"), it mostly just exposes firmware + capabilities through standard Linux mechanism like devfreq and OPP tables. + +properties: + compatible: + items: + - enum: + - fsl,imx8mn-ddrc + - fsl,imx8mm-ddrc + - fsl,imx8mq-ddrc + - const: fsl,imx8m-ddrc + + reg: + maxItems: 1 + description: + Base address and size of DDRC CTL area. + This is not currently mapped by the imx8m-ddrc driver. + + clocks: + maxItems: 4 + + clock-names: + items: + - const: core + - const: pll + - const: alt + - const: apb + + operating-points-v2: true + opp-table: true + +required: + - reg + - compatible + - clocks + - clock-names + +additionalProperties: false + +examples: + - | + #include + ddrc: memory-controller@3d400000 { + compatible = "fsl,imx8mm-ddrc", "fsl,imx8m-ddrc"; + reg = <0x3d400000 0x400000>; + clock-names = "core", "pll", "alt", "apb"; + clocks = <&clk IMX8MM_CLK_DRAM_CORE>, + <&clk IMX8MM_DRAM_PLL>, + <&clk IMX8MM_CLK_DRAM_ALT>, + <&clk IMX8MM_CLK_DRAM_APB>; + operating-points-v2 = <&ddrc_opp_table>; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/mfd/xylon,logicvc.yaml b/Documentation/devicetree/bindings/mfd/xylon,logicvc.yaml --- a/Documentation/devicetree/bindings/mfd/xylon,logicvc.yaml +++ b/Documentation/devicetree/bindings/mfd/xylon,logicvc.yaml @@ -0,0 +1,50 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +# Copyright 2019 Bootlin +%YAML 1.2 +--- +$id: "http://devicetree.org/schemas/mfd/xylon,logicvc.yaml#" +$schema: "http://devicetree.org/meta-schemas/core.yaml#" + +title: Xylon LogiCVC multi-function device + +maintainers: + - Paul Kocialkowski + +description: | + The LogiCVC is a display controller that also contains a GPIO controller. + As a result, a multi-function device is exposed as parent of the display + and GPIO blocks. + +properties: + compatible: + items: + - enum: + - xylon,logicvc-3.02.a + - const: syscon + - const: simple-mfd + + reg: + maxItems: 1 + +select: + properties: + compatible: + contains: + enum: + - xylon,logicvc-3.02.a + + required: + - compatible + +required: + - compatible + - reg + +examples: + - | + logicvc: logicvc@43c00000 { + compatible = "xylon,logicvc-3.02.a", "syscon", "simple-mfd"; + reg = <0x43c00000 0x6000>; + #address-cells = <1>; + #size-cells = <1>; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/mmc/brcm,sdhci-brcmstb.txt b/Documentation/devicetree/bindings/mmc/brcm,sdhci-brcmstb.txt --- a/Documentation/devicetree/bindings/mmc/brcm,sdhci-brcmstb.txt +++ b/Documentation/devicetree/bindings/mmc/brcm,sdhci-brcmstb.txt @@ -11,28 +11,43 @@ - compatible: should be one of the following - "brcm,bcm7425-sdhci" - "brcm,bcm7445-sdhci" + - "brcm,bcm7216-sdhci" Refer to clocks/clock-bindings.txt for generic clock consumer properties. Example: - sdhci@f03e0100 { - compatible = "brcm,bcm7425-sdhci"; - reg = <0xf03e0000 0x100>; - interrupts = <0x0 0x26 0x0>; - sdhci,auto-cmd12; - clocks = <&sw_sdio>; + sdhci@84b0000 { sd-uhs-sdr50; sd-uhs-ddr50; + sd-uhs-sdr104; + sdhci,auto-cmd12; + compatible = "brcm,bcm7216-sdhci", + "brcm,bcm7445-sdhci", + "brcm,sdhci-brcmstb"; + reg = <0x84b0000 0x260 0x84b0300 0x200>; + reg-names = "host", "cfg"; + interrupts = <0x0 0x26 0x4>; + interrupt-names = "sdio0_0"; + clocks = <&scmi_clk 245>; + clock-names = "sw_sdio"; }; - sdhci@f03e0300 { + sdhci@84b1000 { + mmc-ddr-1_8v; + mmc-hs200-1_8v; + mmc-hs400-1_8v; + mmc-hs400-enhanced-strobe; + supports-cqe; non-removable; bus-width = <0x8>; - compatible = "brcm,bcm7425-sdhci"; - reg = <0xf03e0200 0x100>; - interrupts = <0x0 0x27 0x0>; - sdhci,auto-cmd12; - clocks = ; - mmc-hs200-1_8v; + compatible = "brcm,bcm7216-sdhci", + "brcm,bcm7445-sdhci", + "brcm,sdhci-brcmstb"; + reg = <0x84b1000 0x260 0x84b1300 0x200>; + reg-names = "host", "cfg"; + interrupts = <0x0 0x27 0x4>; + interrupt-names = "sdio1_0"; + clocks = <&scmi_clk 245>; + clock-names = "sw_sdio"; }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt b/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt --- a/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt +++ b/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt @@ -21,6 +21,7 @@ "fsl,imx8mq-usdhc" "fsl,imx8mm-usdhc" "fsl,imx8mn-usdhc" + "fsl,imx8mp-usdhc" "fsl,imx8qxp-usdhc" Optional properties: diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/mmc/renesas,sdhi.txt b/Documentation/devicetree/bindings/mmc/renesas,sdhi.txt --- a/Documentation/devicetree/bindings/mmc/renesas,sdhi.txt +++ b/Documentation/devicetree/bindings/mmc/renesas,sdhi.txt @@ -23,7 +23,8 @@ "renesas,sdhi-r8a7793" - SDHI IP on R8A7793 SoC "renesas,sdhi-r8a7794" - SDHI IP on R8A7794 SoC "renesas,sdhi-r8a7795" - SDHI IP on R8A7795 SoC - "renesas,sdhi-r8a7796" - SDHI IP on R8A7796 SoC + "renesas,sdhi-r8a7796" - SDHI IP on R8A77960 SoC + "renesas,sdhi-r8a77961" - SDHI IP on R8A77961 SoC "renesas,sdhi-r8a77965" - SDHI IP on R8A77965 SoC "renesas,sdhi-r8a77970" - SDHI IP on R8A77970 SoC "renesas,sdhi-r8a77980" - SDHI IP on R8A77980 SoC diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt b/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt --- a/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt +++ b/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt @@ -1,49 +0,0 @@ -* Rockchip specific extensions to the Synopsys Designware Mobile - Storage Host Controller - -The Synopsys designware mobile storage host controller is used to interface -a SoC with storage medium such as eMMC or SD/MMC cards. This file documents -differences between the core Synopsys dw mshc controller properties described -by synopsys-dw-mshc.txt and the properties used by the Rockchip specific -extensions to the Synopsys Designware Mobile Storage Host Controller. - -Required Properties: - -* compatible: should be - - "rockchip,rk2928-dw-mshc": for Rockchip RK2928 and following, - before RK3288 - - "rockchip,rk3288-dw-mshc": for Rockchip RK3288 - - "rockchip,rv1108-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip RV1108 - - "rockchip,px30-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip PX30 - - "rockchip,rk3036-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip RK3036 - - "rockchip,rk3228-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip RK322x - - "rockchip,rk3328-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip RK3328 - - "rockchip,rk3368-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip RK3368 - - "rockchip,rk3399-dw-mshc", "rockchip,rk3288-dw-mshc": for Rockchip RK3399 - -Optional Properties: -* clocks: from common clock binding: if ciu-drive and ciu-sample are - specified in clock-names, should contain handles to these clocks. - -* clock-names: Apart from the clock-names described in synopsys-dw-mshc.txt - two more clocks "ciu-drive" and "ciu-sample" are supported. They are used - to control the clock phases, "ciu-sample" is required for tuning high- - speed modes. - -* rockchip,default-sample-phase: The default phase to set ciu-sample at - probing, low speeds or in case where all phases work at tuning time. - If not specified 0 deg will be used. - -* rockchip,desired-num-phases: The desired number of times that the host - execute tuning when needed. If not specified, the host will do tuning - for 360 times, namely tuning for each degree. - -Example: - - rkdwmmc0@12200000 { - compatible = "rockchip,rk3288-dw-mshc"; - reg = <0x12200000 0x1000>; - interrupts = <0 75 0>; - #address-cells = <1>; - #size-cells = <0>; - }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.yaml b/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.yaml --- a/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.yaml +++ b/Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.yaml @@ -0,0 +1,125 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/mmc/rockchip-dw-mshc.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Rockchip designware mobile storage host controller device tree bindings + +description: + Rockchip uses the Synopsys designware mobile storage host controller + to interface a SoC with storage medium such as eMMC or SD/MMC cards. + This file documents the combined properties for the core Synopsys dw mshc + controller that are not already included in the synopsys-dw-mshc-common.yaml + file and the Rockchip specific extensions. + +allOf: + - $ref: "synopsys-dw-mshc-common.yaml#" + +maintainers: + - Heiko Stuebner + +# Everything else is described in the common file +properties: + compatible: + oneOf: + # for Rockchip RK2928 and before RK3288 + - const: rockchip,rk2928-dw-mshc + # for Rockchip RK3288 + - const: rockchip,rk3288-dw-mshc + - items: + - enum: + # for Rockchip PX30 + - rockchip,px30-dw-mshc + # for Rockchip RK3036 + - rockchip,rk3036-dw-mshc + # for Rockchip RK322x + - rockchip,rk3228-dw-mshc + # for Rockchip RK3308 + - rockchip,rk3308-dw-mshc + # for Rockchip RK3328 + - rockchip,rk3328-dw-mshc + # for Rockchip RK3368 + - rockchip,rk3368-dw-mshc + # for Rockchip RK3399 + - rockchip,rk3399-dw-mshc + # for Rockchip RV1108 + - rockchip,rv1108-dw-mshc + - const: rockchip,rk3288-dw-mshc + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + clocks: + minItems: 2 + maxItems: 4 + description: + Handle to "biu" and "ciu" clocks for the bus interface unit clock and + the card interface unit clock. If "ciu-drive" and "ciu-sample" are + specified in clock-names, it should also contain + handles to these clocks. + + clock-names: + minItems: 2 + items: + - const: biu + - const: ciu + - const: ciu-drive + - const: ciu-sample + description: + Apart from the clock-names "biu" and "ciu" two more clocks + "ciu-drive" and "ciu-sample" are supported. They are used + to control the clock phases, "ciu-sample" is required for tuning + high speed modes. + + rockchip,default-sample-phase: + allOf: + - $ref: /schemas/types.yaml#/definitions/uint32 + minimum: 0 + maximum: 360 + default: 0 + description: + The default phase to set "ciu-sample" at probing, + low speeds or in case where all phases work at tuning time. + If not specified 0 deg will be used. + + rockchip,desired-num-phases: + allOf: + - $ref: /schemas/types.yaml#/definitions/uint32 + minimum: 0 + maximum: 360 + default: 360 + description: + The desired number of times that the host execute tuning when needed. + If not specified, the host will do tuning for 360 times, + namely tuning for each degree. + +required: + - compatible + - reg + - interrupts + - clocks + - clock-names + +examples: + - | + #include + #include + #include + sdmmc: mmc@ff0c0000 { + compatible = "rockchip,rk3288-dw-mshc"; + reg = <0x0 0xff0c0000 0x0 0x4000>; + interrupts = ; + clocks = <&cru HCLK_SDMMC>, <&cru SCLK_SDMMC>, + <&cru SCLK_SDMMC_DRV>, <&cru SCLK_SDMMC_SAMPLE>; + clock-names = "biu", "ciu", "ciu-drive", "ciu-sample"; + resets = <&cru SRST_MMC0>; + reset-names = "reset"; + fifo-depth = <0x100>; + max-frequency = <150000000>; + }; + +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/mmc/sdhci-atmel.txt b/Documentation/devicetree/bindings/mmc/sdhci-atmel.txt --- a/Documentation/devicetree/bindings/mmc/sdhci-atmel.txt +++ b/Documentation/devicetree/bindings/mmc/sdhci-atmel.txt @@ -5,11 +5,16 @@ sdhci-of-at91 driver. Required properties: -- compatible: Must be "atmel,sama5d2-sdhci". +- compatible: Must be "atmel,sama5d2-sdhci" or "microchip,sam9x60-sdhci". - clocks: Phandlers to the clocks. -- clock-names: Must be "hclock", "multclk", "baseclk"; +- clock-names: Must be "hclock", "multclk", "baseclk" for + "atmel,sama5d2-sdhci". + Must be "hclock", "multclk" for "microchip,sam9x60-sdhci". Optional properties: +- assigned-clocks: The same with "multclk". +- assigned-clock-rates The rate of "multclk" in order to not rely on the + gck configuration set by previous components. - microchip,sdcal-inverted: when present, polarity on the SDCAL SoC pin is inverted. The default polarity for this signal is described in the datasheet. For instance on SAMA5D2, the pin is usually tied to the GND with a resistor @@ -17,10 +22,12 @@ Example: -sdmmc0: sdio-host@a0000000 { +mmc0: sdio-host@a0000000 { compatible = "atmel,sama5d2-sdhci"; reg = <0xa0000000 0x300>; interrupts = <31 IRQ_TYPE_LEVEL_HIGH 0>; clocks = <&sdmmc0_hclk>, <&sdmmc0_gclk>, <&main>; clock-names = "hclock", "multclk", "baseclk"; + assigned-clocks = <&sdmmc0_gclk>; + assigned-clock-rates = <480000000>; }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/mmc/sdhci-msm.txt b/Documentation/devicetree/bindings/mmc/sdhci-msm.txt --- a/Documentation/devicetree/bindings/mmc/sdhci-msm.txt +++ b/Documentation/devicetree/bindings/mmc/sdhci-msm.txt @@ -19,6 +19,7 @@ "qcom,msm8996-sdhci", "qcom,sdhci-msm-v4" "qcom,sdm845-sdhci", "qcom,sdhci-msm-v5" "qcom,qcs404-sdhci", "qcom,sdhci-msm-v5" + "qcom,sc7180-sdhci", "qcom,sdhci-msm-v5"; NOTE that some old device tree files may be floating around that only have the string "qcom,sdhci-msm-v4" without the SoC compatible string but doing that should be considered a deprecated practice. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/mmc/sdhci-omap.txt b/Documentation/devicetree/bindings/mmc/sdhci-omap.txt --- a/Documentation/devicetree/bindings/mmc/sdhci-omap.txt +++ b/Documentation/devicetree/bindings/mmc/sdhci-omap.txt @@ -7,6 +7,8 @@ Required properties: - compatible: Should be "ti,dra7-sdhci" for DRA7 and DRA72 controllers Should be "ti,k2g-sdhci" for K2G + Should be "ti,am335-sdhci" for am335x controllers + Should be "ti,am437-sdhci" for am437x controllers - ti,hwmods: Must be "mmc", is controller instance starting 1 (Not required for K2G). - pinctrl-names: Should be subset of "default", "hs", "sdr12", "sdr25", "sdr50", @@ -15,6 +17,13 @@ "hs200_1_8v", - pinctrl- : Pinctrl states as described in bindings/pinctrl/pinctrl-bindings.txt +Optional properties: +- dmas: List of DMA specifiers with the controller specific format as described + in the generic DMA client binding. A tx and rx specifier is required. +- dma-names: List of DMA request names. These strings correspond 1:1 with the + DMA specifiers listed in dmas. The string naming is to be "tx" + and "rx" for TX and RX DMA requests, respectively. + Example: mmc1: mmc@4809c000 { compatible = "ti,dra7-sdhci"; @@ -22,4 +31,6 @@ ti,hwmods = "mmc1"; bus-width = <4>; vmmc-supply = <&vmmc>; /* phandle to regulator node */ + dmas = <&sdma 61 &sdma 62>; + dma-names = "tx", "rx"; }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc-common.yaml b/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc-common.yaml --- a/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc-common.yaml +++ b/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc-common.yaml @@ -0,0 +1,68 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/mmc/synopsys-dw-mshc-common.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Synopsys Designware Mobile Storage Host Controller Common Properties + +allOf: + - $ref: "mmc-controller.yaml#" + +maintainers: + - Ulf Hansson + +# Everything else is described in the common file +properties: + resets: + maxItems: 1 + + reset-names: + const: reset + + clock-frequency: + description: + Should be the frequency (in Hz) of the ciu clock. If this + is specified and the ciu clock is specified then we'll try to set the ciu + clock to this at probe time. + + fifo-depth: + allOf: + - $ref: /schemas/types.yaml#/definitions/uint32 + description: + The maximum size of the tx/rx fifo's. If this property is not + specified, the default value of the fifo size is determined from the + controller registers. + + card-detect-delay: + allOf: + - $ref: /schemas/types.yaml#/definitions/uint32 + - default: 0 + description: + Delay in milli-seconds before detecting card after card + insert event. The default value is 0. + + data-addr: + allOf: + - $ref: /schemas/types.yaml#/definitions/uint32 + description: + Override fifo address with value provided by DT. The default FIFO reg + offset is assumed as 0x100 (version < 0x240A) and 0x200(version >= 0x240A) + by driver. If the controller does not follow this rule, please use + this property to set fifo address in device tree. + + fifo-watermark-aligned: + allOf: + - $ref: /schemas/types.yaml#/definitions/flag + description: + Data done irq is expected if data length is less than + watermark in PIO mode. But fifo watermark is requested to be aligned + with data length in some SoC so that TX/RX irq can be generated with + data done irq. Add this watermark quirk to mark this requirement and + force fifo watermark setting accordingly. + + dmas: + maxItems: 1 + + dma-names: + const: rx-tx diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt b/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt --- a/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt +++ b/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.txt @@ -1,141 +0,0 @@ -* Synopsys Designware Mobile Storage Host Controller - -The Synopsys designware mobile storage host controller is used to interface -a SoC with storage medium such as eMMC or SD/MMC cards. This file documents -differences between the core mmc properties described by mmc.txt and the -properties used by the Synopsys Designware Mobile Storage Host Controller. - -Required Properties: - -* compatible: should be - - snps,dw-mshc: for controllers compliant with synopsys dw-mshc. -* #address-cells: should be 1. -* #size-cells: should be 0. - -# Slots (DEPRECATED): The slot specific information are contained within - child-nodes with each child-node representing a supported slot. There should - be atleast one child node representing a card slot. The name of the child node - representing the slot is recommended to be slot@n where n is the unique number - of the slot connected to the controller. The following are optional properties - which can be included in the slot child node. - - * reg: specifies the physical slot number. The valid values of this - property is 0 to (num-slots -1), where num-slots is the value - specified by the num-slots property. - - * bus-width: as documented in mmc core bindings. - - * wp-gpios: specifies the write protect gpio line. The format of the - gpio specifier depends on the gpio controller. If a GPIO is not used - for write-protect, this property is optional. - - * disable-wp: If the wp-gpios property isn't present then (by default) - we'd assume that the write protect is hooked up directly to the - controller's special purpose write protect line (accessible via - the WRTPRT register). However, it's possible that we simply don't - want write protect. In that case specify 'disable-wp'. - NOTE: This property is not required for slots known to always - connect to eMMC or SDIO cards. - -Optional properties: - -* resets: phandle + reset specifier pair, intended to represent hardware - reset signal present internally in some host controller IC designs. - See Documentation/devicetree/bindings/reset/reset.txt for details. - -* reset-names: request name for using "resets" property. Must be "reset". - (It will be used together with "resets" property.) - -* clocks: from common clock binding: handle to biu and ciu clocks for the - bus interface unit clock and the card interface unit clock. - -* clock-names: from common clock binding: Shall be "biu" and "ciu". - If the biu clock is missing we'll simply skip enabling it. If the - ciu clock is missing we'll just assume that the clock is running at - clock-frequency. It is an error to omit both the ciu clock and the - clock-frequency. - -* clock-frequency: should be the frequency (in Hz) of the ciu clock. If this - is specified and the ciu clock is specified then we'll try to set the ciu - clock to this at probe time. - -* fifo-depth: The maximum size of the tx/rx fifo's. If this property is not - specified, the default value of the fifo size is determined from the - controller registers. - -* card-detect-delay: Delay in milli-seconds before detecting card after card - insert event. The default value is 0. - -* data-addr: Override fifo address with value provided by DT. The default FIFO reg - offset is assumed as 0x100 (version < 0x240A) and 0x200(version >= 0x240A) by - driver. If the controller does not follow this rule, please use this property - to set fifo address in device tree. - -* fifo-watermark-aligned: Data done irq is expected if data length is less than - watermark in PIO mode. But fifo watermark is requested to be aligned with data - length in some SoC so that TX/RX irq can be generated with data done irq. Add this - watermark quirk to mark this requirement and force fifo watermark setting - accordingly. - -* vmmc-supply: The phandle to the regulator to use for vmmc. If this is - specified we'll defer probe until we can find this regulator. - -* dmas: List of DMA specifiers with the controller specific format as described - in the generic DMA client binding. Refer to dma.txt for details. - -* dma-names: request names for generic DMA client binding. Must be "rx-tx". - Refer to dma.txt for details. - -Aliases: - -- All the MSHC controller nodes should be represented in the aliases node using - the following format 'mshc{n}' where n is a unique number for the alias. - -Example: - -The MSHC controller node can be split into two portions, SoC specific and -board specific portions as listed below. - - dwmmc0@12200000 { - compatible = "snps,dw-mshc"; - clocks = <&clock 351>, <&clock 132>; - clock-names = "biu", "ciu"; - reg = <0x12200000 0x1000>; - interrupts = <0 75 0>; - #address-cells = <1>; - #size-cells = <0>; - data-addr = <0x200>; - fifo-watermark-aligned; - resets = <&rst 20>; - reset-names = "reset"; - }; - -[board specific internal DMA resources] - - dwmmc0@12200000 { - clock-frequency = <400000000>; - clock-freq-min-max = <400000 200000000>; - broken-cd; - fifo-depth = <0x80>; - card-detect-delay = <200>; - vmmc-supply = <&buck8>; - bus-width = <8>; - cap-mmc-highspeed; - cap-sd-highspeed; - }; - -[board specific generic DMA request binding] - - dwmmc0@12200000 { - clock-frequency = <400000000>; - clock-freq-min-max = <400000 200000000>; - broken-cd; - fifo-depth = <0x80>; - card-detect-delay = <200>; - vmmc-supply = <&buck8>; - bus-width = <8>; - cap-mmc-highspeed; - cap-sd-highspeed; - dmas = <&pdma 12>; - dma-names = "rx-tx"; - }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.yaml b/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.yaml --- a/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.yaml +++ b/Documentation/devicetree/bindings/mmc/synopsys-dw-mshc.yaml @@ -0,0 +1,70 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/mmc/synopsys-dw-mshc.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Synopsys Designware Mobile Storage Host Controller Binding + +allOf: + - $ref: "synopsys-dw-mshc-common.yaml#" + +maintainers: + - Ulf Hansson + +# Everything else is described in the common file +properties: + compatible: + const: snps,dw-mshc + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + clocks: + minItems: 2 + maxItems: 2 + description: + Handle to "biu" and "ciu" clocks for the + bus interface unit clock and the card interface unit clock. + + clock-names: + items: + - const: biu + - const: ciu + +required: + - compatible + - reg + - interrupts + - clocks + - clock-names + +examples: + - | + mmc@12200000 { + compatible = "snps,dw-mshc"; + reg = <0x12200000 0x1000>; + interrupts = <0 75 0>; + clocks = <&clock 351>, <&clock 132>; + clock-names = "biu", "ciu"; + dmas = <&pdma 12>; + dma-names = "rx-tx"; + resets = <&rst 20>; + reset-names = "reset"; + vmmc-supply = <&buck8>; + #address-cells = <1>; + #size-cells = <0>; + broken-cd; + bus-width = <8>; + cap-mmc-highspeed; + cap-sd-highspeed; + card-detect-delay = <200>; + clock-freq-min-max = <400000 200000000>; + clock-frequency = <400000000>; + data-addr = <0x200>; + fifo-depth = <0x80>; + fifo-watermark-aligned; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt b/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt --- a/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt +++ b/Documentation/devicetree/bindings/net/broadcom-bluetooth.txt @@ -11,6 +11,7 @@ - compatible: should contain one of the following: * "brcm,bcm20702a1" + * "brcm,bcm4329-bt" * "brcm,bcm4330-bt" * "brcm,bcm43438-bt" * "brcm,bcm4345c5" @@ -22,7 +23,9 @@ - max-speed: see Documentation/devicetree/bindings/serial/slave-device.txt - shutdown-gpios: GPIO specifier, used to enable the BT module - device-wakeup-gpios: GPIO specifier, used to wakeup the controller - - host-wakeup-gpios: GPIO specifier, used to wakeup the host processor + - host-wakeup-gpios: GPIO specifier, used to wakeup the host processor. + deprecated, replaced by interrupts and + "host-wakeup" interrupt-names - clocks: 1 or 2 clocks as defined in clock-names below, in that order - clock-names: names for clock inputs, matching the clocks given - "extclk": deprecated, replaced by "txco" @@ -30,7 +33,14 @@ - "lpo": external low power 32.768 kHz clock - vbat-supply: phandle to regulator supply for VBAT - vddio-supply: phandle to regulator supply for VDDIO - + - brcm,bt-pcm-int-params: configure PCM parameters via a 5-byte array + - sco-routing: 0 = PCM, 1 = Transport, 2 = Codec, 3 = I2S + - pcm-interface-rate: 128KBps, 256KBps, 512KBps, 1024KBps, 2048KBps + - pcm-frame-type: short, long + - pcm-sync-mode: slave, master + - pcm-clock-mode: slave, master + - interrupts: must be one, used to wakeup the host processor + - interrupt-names: must be "host-wakeup" Example: @@ -41,5 +51,6 @@ bluetooth { compatible = "brcm,bcm43438-bt"; max-speed = <921600>; + brcm,bt-pcm-int-params = [01 02 00 01 01]; }; }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/net/dsa/ar9331.txt b/Documentation/devicetree/bindings/net/dsa/ar9331.txt --- a/Documentation/devicetree/bindings/net/dsa/ar9331.txt +++ b/Documentation/devicetree/bindings/net/dsa/ar9331.txt @@ -0,0 +1,148 @@ +Atheros AR9331 built-in switch +============================= + +It is a switch built-in to Atheros AR9331 WiSoC and addressable over internal +MDIO bus. All PHYs are built-in as well. + +Required properties: + + - compatible: should be: "qca,ar9331-switch" + - reg: Address on the MII bus for the switch. + - resets : Must contain an entry for each entry in reset-names. + - reset-names : Must include the following entries: "switch" + - interrupt-parent: Phandle to the parent interrupt controller + - interrupts: IRQ line for the switch + - interrupt-controller: Indicates the switch is itself an interrupt + controller. This is used for the PHY interrupts. + - #interrupt-cells: must be 1 + - mdio: Container of PHY and devices on the switches MDIO bus. + +See Documentation/devicetree/bindings/net/dsa/dsa.txt for a list of additional +required and optional properties. +Examples: + +eth0: ethernet@19000000 { + compatible = "qca,ar9330-eth"; + reg = <0x19000000 0x200>; + interrupts = <4>; + + resets = <&rst 9>, <&rst 22>; + reset-names = "mac", "mdio"; + clocks = <&pll ATH79_CLK_AHB>, <&pll ATH79_CLK_AHB>; + clock-names = "eth", "mdio"; + + phy-mode = "mii"; + phy-handle = <&phy_port4>; +}; + +eth1: ethernet@1a000000 { + compatible = "qca,ar9330-eth"; + reg = <0x1a000000 0x200>; + interrupts = <5>; + resets = <&rst 13>, <&rst 23>; + reset-names = "mac", "mdio"; + clocks = <&pll ATH79_CLK_AHB>, <&pll ATH79_CLK_AHB>; + clock-names = "eth", "mdio"; + + phy-mode = "gmii"; + + fixed-link { + speed = <1000>; + full-duplex; + }; + + mdio { + #address-cells = <1>; + #size-cells = <0>; + + switch10: switch@10 { + #address-cells = <1>; + #size-cells = <0>; + + compatible = "qca,ar9331-switch"; + reg = <0x10>; + resets = <&rst 8>; + reset-names = "switch"; + + interrupt-parent = <&miscintc>; + interrupts = <12>; + + interrupt-controller; + #interrupt-cells = <1>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + switch_port0: port@0 { + reg = <0x0>; + label = "cpu"; + ethernet = <ð1>; + + phy-mode = "gmii"; + + fixed-link { + speed = <1000>; + full-duplex; + }; + }; + + switch_port1: port@1 { + reg = <0x1>; + phy-handle = <&phy_port0>; + phy-mode = "internal"; + }; + + switch_port2: port@2 { + reg = <0x2>; + phy-handle = <&phy_port1>; + phy-mode = "internal"; + }; + + switch_port3: port@3 { + reg = <0x3>; + phy-handle = <&phy_port2>; + phy-mode = "internal"; + }; + + switch_port4: port@4 { + reg = <0x4>; + phy-handle = <&phy_port3>; + phy-mode = "internal"; + }; + }; + + mdio { + #address-cells = <1>; + #size-cells = <0>; + + interrupt-parent = <&switch10>; + + phy_port0: phy@0 { + reg = <0x0>; + interrupts = <0>; + }; + + phy_port1: phy@1 { + reg = <0x1>; + interrupts = <0>; + }; + + phy_port2: phy@2 { + reg = <0x2>; + interrupts = <0>; + }; + + phy_port3: phy@3 { + reg = <0x3>; + interrupts = <0>; + }; + + phy_port4: phy@4 { + reg = <0x4>; + interrupts = <0>; + }; + }; + }; + }; +}; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/net/mediatek-dwmac.txt b/Documentation/devicetree/bindings/net/mediatek-dwmac.txt --- a/Documentation/devicetree/bindings/net/mediatek-dwmac.txt +++ b/Documentation/devicetree/bindings/net/mediatek-dwmac.txt @@ -14,7 +14,7 @@ Should be "macirq" for the main MAC IRQ - clocks: Must contain a phandle for each entry in clock-names. - clock-names: The name of the clock listed in the clocks property. These are - "axi", "apb", "mac_main", "ptp_ref" for MT2712 SoC + "axi", "apb", "mac_main", "ptp_ref", "rmii_internal" for MT2712 SoC. - mac-address: See ethernet.txt in the same directory - phy-mode: See ethernet.txt in the same directory - mediatek,pericfg: A phandle to the syscon node that control ethernet @@ -23,8 +23,10 @@ Optional properties: - mediatek,tx-delay-ps: TX clock delay macro value. Default is 0. It should be defined for RGMII/MII interface. + It should be defined for RMII interface when the reference clock is from MT2712 SoC. - mediatek,rx-delay-ps: RX clock delay macro value. Default is 0. - It should be defined for RGMII/MII/RMII interface. + It should be defined for RGMII/MII interface. + It should be defined for RMII interface. Both delay properties need to be a multiple of 170 for RGMII interface, or will round down. Range 0~31*170. Both delay properties need to be a multiple of 550 for MII/RMII interface, @@ -34,13 +36,20 @@ reference clock, which is from external PHYs, is connected to RXC pin on MT2712 SoC. Otherwise, is connected to TXC pin. +- mediatek,rmii-clk-from-mac: boolean property, if present indicates that + MT2712 SoC provides the RMII reference clock, which outputs to TXC pin only. - mediatek,txc-inverse: boolean property, if present indicates that 1. tx clock will be inversed in MII/RGMII case, 2. tx clock inside MAC will be inversed relative to reference clock which is from external PHYs in RMII case, and it rarely happen. + 3. the reference clock, which outputs to TXC pin will be inversed in RMII case + when the reference clock is from MT2712 SoC. - mediatek,rxc-inverse: boolean property, if present indicates that 1. rx clock will be inversed in MII/RGMII case. - 2. reference clock will be inversed when arrived at MAC in RMII case. + 2. reference clock will be inversed when arrived at MAC in RMII case, when + the reference clock is from external PHYs. + 3. the inside clock, which be sent to MAC, will be inversed in RMII case when + the reference clock is from MT2712 SoC. - assigned-clocks: mac_main and ptp_ref clocks - assigned-clock-parents: parent clocks of the assigned clocks @@ -50,29 +59,33 @@ reg = <0 0x1101c000 0 0x1300>; interrupts = ; interrupt-names = "macirq"; - phy-mode ="rgmii"; + phy-mode ="rgmii-rxid"; mac-address = [00 55 7b b5 7d f7]; clock-names = "axi", "apb", "mac_main", "ptp_ref", - "ptp_top"; + "rmii_internal"; clocks = <&pericfg CLK_PERI_GMAC>, <&pericfg CLK_PERI_GMAC_PCLK>, <&topckgen CLK_TOP_ETHER_125M_SEL>, - <&topckgen CLK_TOP_ETHER_50M_SEL>; + <&topckgen CLK_TOP_ETHER_50M_SEL>, + <&topckgen CLK_TOP_ETHER_50M_RMII_SEL>; assigned-clocks = <&topckgen CLK_TOP_ETHER_125M_SEL>, - <&topckgen CLK_TOP_ETHER_50M_SEL>; + <&topckgen CLK_TOP_ETHER_50M_SEL>, + <&topckgen CLK_TOP_ETHER_50M_RMII_SEL>; assigned-clock-parents = <&topckgen CLK_TOP_ETHERPLL_125M>, - <&topckgen CLK_TOP_APLL1_D3>; + <&topckgen CLK_TOP_APLL1_D3>, + <&topckgen CLK_TOP_ETHERPLL_50M>; + power-domains = <&scpsys MT2712_POWER_DOMAIN_AUDIO>; mediatek,pericfg = <&pericfg>; mediatek,tx-delay-ps = <1530>; mediatek,rx-delay-ps = <1530>; mediatek,rmii-rxc; mediatek,txc-inverse; mediatek,rxc-inverse; - snps,txpbl = <32>; - snps,rxpbl = <32>; + snps,txpbl = <1>; + snps,rxpbl = <1>; snps,reset-gpio = <&pio 87 GPIO_ACTIVE_LOW>; snps,reset-active-low; }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/net/ti,dp83867.txt b/Documentation/devicetree/bindings/net/ti,dp83867.txt --- a/Documentation/devicetree/bindings/net/ti,dp83867.txt +++ b/Documentation/devicetree/bindings/net/ti,dp83867.txt @@ -8,8 +8,6 @@ - ti,tx-internal-delay - RGMII Transmit Clock Delay - see dt-bindings/net/ti-dp83867.h for applicable values. Required only if interface type is PHY_INTERFACE_MODE_RGMII_ID or PHY_INTERFACE_MODE_RGMII_TXID - - ti,fifo-depth - Transmitt FIFO depth- see dt-bindings/net/ti-dp83867.h - for applicable values Note: If the interface type is PHY_INTERFACE_MODE_RGMII the TX/RX clock delays will be left at their default values, as set by the PHY's pin strapping. @@ -42,6 +40,14 @@ Some MACs work with differential SGMII clock. See data manual for details. + - ti,fifo-depth - Transmitt FIFO depth- see dt-bindings/net/ti-dp83867.h + for applicable values (deprecated) + + -tx-fifo-depth - As defined in the ethernet-controller.yaml. Values for + the depth can be found in dt-bindings/net/ti-dp83867.h + -rx-fifo-depth - As defined in the ethernet-controller.yaml. Values for + the depth can be found in dt-bindings/net/ti-dp83867.h + Note: ti,min-output-impedance and ti,max-output-impedance are mutually exclusive. When both properties are present ti,max-output-impedance takes precedence. @@ -55,7 +61,7 @@ reg = <0>; ti,rx-internal-delay = ; ti,tx-internal-delay = ; - ti,fifo-depth = ; + tx-fifo-depth = ; }; Datasheet can be found: diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt --- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt @@ -50,7 +50,7 @@ entry in clock-names. - clock-names: Should contain the clock names "wifi_wcss_cmd", "wifi_wcss_ref", "wifi_wcss_rtc" for "qcom,ipq4019-wifi" compatible target and - "cxo_ref_clk_pin" for "qcom,wcn3990-wifi" + "cxo_ref_clk_pin" and optionally "qdss" for "qcom,wcn3990-wifi" compatible target. - qcom,msi_addr: MSI interrupt address. - qcom,msi_base: Base value to add before writing MSI data into @@ -88,6 +88,9 @@ of the host capability QMI request - qcom,xo-cal-data: xo cal offset to be configured in xo trim register. +- qcom,msa-fixed-perm: Boolean context flag to disable SCM call for statically + mapped msa region. + Example (to supply PCI based wifi block details): In this example, the node is defined as child node of the PCI controller. @@ -185,4 +188,5 @@ vdd-3.3-ch0-supply = <&vreg_l25a_3p3>; memory-region = <&wifi_msa_mem>; iommus = <&apps_smmu 0x0040 0x1>; + qcom,msa-fixed-perm; }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml b/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml --- a/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml @@ -0,0 +1,273 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +# Copyright (c) 2018-2019 The Linux Foundation. All rights reserved. + +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/wireless/qcom,ath11k.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm Technologies ath11k wireless devices Generic Binding + +maintainers: + - Kalle Valo + +description: | + These are dt entries for Qualcomm Technologies, Inc. IEEE 802.11ax + devices, for example like AHB based IPQ8074. + +properties: + compatible: + const: qcom,ipq8074-wifi + + reg: + maxItems: 1 + + interrupts: + items: + - description: misc-pulse1 interrupt events + - description: misc-latch interrupt events + - description: sw exception interrupt events + - description: watchdog interrupt events + - description: interrupt event for ring CE0 + - description: interrupt event for ring CE1 + - description: interrupt event for ring CE2 + - description: interrupt event for ring CE3 + - description: interrupt event for ring CE4 + - description: interrupt event for ring CE5 + - description: interrupt event for ring CE6 + - description: interrupt event for ring CE7 + - description: interrupt event for ring CE8 + - description: interrupt event for ring CE9 + - description: interrupt event for ring CE10 + - description: interrupt event for ring CE11 + - description: interrupt event for ring host2wbm-desc-feed + - description: interrupt event for ring host2reo-re-injection + - description: interrupt event for ring host2reo-command + - description: interrupt event for ring host2rxdma-monitor-ring3 + - description: interrupt event for ring host2rxdma-monitor-ring2 + - description: interrupt event for ring host2rxdma-monitor-ring1 + - description: interrupt event for ring reo2ost-exception + - description: interrupt event for ring wbm2host-rx-release + - description: interrupt event for ring reo2host-status + - description: interrupt event for ring reo2host-destination-ring4 + - description: interrupt event for ring reo2host-destination-ring3 + - description: interrupt event for ring reo2host-destination-ring2 + - description: interrupt event for ring reo2host-destination-ring1 + - description: interrupt event for ring rxdma2host-monitor-destination-mac3 + - description: interrupt event for ring rxdma2host-monitor-destination-mac2 + - description: interrupt event for ring rxdma2host-monitor-destination-mac1 + - description: interrupt event for ring ppdu-end-interrupts-mac3 + - description: interrupt event for ring ppdu-end-interrupts-mac2 + - description: interrupt event for ring ppdu-end-interrupts-mac1 + - description: interrupt event for ring rxdma2host-monitor-status-ring-mac3 + - description: interrupt event for ring rxdma2host-monitor-status-ring-mac2 + - description: interrupt event for ring rxdma2host-monitor-status-ring-mac1 + - description: interrupt event for ring host2rxdma-host-buf-ring-mac3 + - description: interrupt event for ring host2rxdma-host-buf-ring-mac2 + - description: interrupt event for ring host2rxdma-host-buf-ring-mac1 + - description: interrupt event for ring rxdma2host-destination-ring-mac3 + - description: interrupt event for ring rxdma2host-destination-ring-mac2 + - description: interrupt event for ring rxdma2host-destination-ring-mac1 + - description: interrupt event for ring host2tcl-input-ring4 + - description: interrupt event for ring host2tcl-input-ring3 + - description: interrupt event for ring host2tcl-input-ring2 + - description: interrupt event for ring host2tcl-input-ring1 + - description: interrupt event for ring wbm2host-tx-completions-ring3 + - description: interrupt event for ring wbm2host-tx-completions-ring2 + - description: interrupt event for ring wbm2host-tx-completions-ring1 + - description: interrupt event for ring tcl2host-status-ring + + + interrupt-names: + items: + - const: misc-pulse1 + - const: misc-latch + - const: sw-exception + - const: watchdog + - const: ce0 + - const: ce1 + - const: ce2 + - const: ce3 + - const: ce4 + - const: ce5 + - const: ce6 + - const: ce7 + - const: ce8 + - const: ce9 + - const: ce10 + - const: ce11 + - const: host2wbm-desc-feed + - const: host2reo-re-injection + - const: host2reo-command + - const: host2rxdma-monitor-ring3 + - const: host2rxdma-monitor-ring2 + - const: host2rxdma-monitor-ring1 + - const: reo2ost-exception + - const: wbm2host-rx-release + - const: reo2host-status + - const: reo2host-destination-ring4 + - const: reo2host-destination-ring3 + - const: reo2host-destination-ring2 + - const: reo2host-destination-ring1 + - const: rxdma2host-monitor-destination-mac3 + - const: rxdma2host-monitor-destination-mac2 + - const: rxdma2host-monitor-destination-mac1 + - const: ppdu-end-interrupts-mac3 + - const: ppdu-end-interrupts-mac2 + - const: ppdu-end-interrupts-mac1 + - const: rxdma2host-monitor-status-ring-mac3 + - const: rxdma2host-monitor-status-ring-mac2 + - const: rxdma2host-monitor-status-ring-mac1 + - const: host2rxdma-host-buf-ring-mac3 + - const: host2rxdma-host-buf-ring-mac2 + - const: host2rxdma-host-buf-ring-mac1 + - const: rxdma2host-destination-ring-mac3 + - const: rxdma2host-destination-ring-mac2 + - const: rxdma2host-destination-ring-mac1 + - const: host2tcl-input-ring4 + - const: host2tcl-input-ring3 + - const: host2tcl-input-ring2 + - const: host2tcl-input-ring1 + - const: wbm2host-tx-completions-ring3 + - const: wbm2host-tx-completions-ring2 + - const: wbm2host-tx-completions-ring1 + - const: tcl2host-status-ring + + qcom,rproc: + $ref: /schemas/types.yaml#definitions/phandle + description: + DT entry of q6v5-wcss remoteproc driver. + Phandle to a node that can contain the following properties + * compatible + * reg + * reg-names + +required: + - compatible + - reg + - interrupts + - interrupt-names + - qcom,rproc + +additionalProperties: false + +examples: + - | + + q6v5_wcss: q6v5_wcss@CD00000 { + compatible = "qcom,ipq8074-wcss-pil"; + reg = <0xCD00000 0x4040>, + <0x4AB000 0x20>; + reg-names = "qdsp6", + "rmb"; + }; + + wifi0: wifi@c000000 { + compatible = "qcom,ipq8074-wifi"; + reg = <0xc000000 0x2000000>; + interrupts = <0 320 1>, + <0 319 1>, + <0 318 1>, + <0 317 1>, + <0 316 1>, + <0 315 1>, + <0 314 1>, + <0 311 1>, + <0 310 1>, + <0 411 1>, + <0 410 1>, + <0 40 1>, + <0 39 1>, + <0 302 1>, + <0 301 1>, + <0 37 1>, + <0 36 1>, + <0 296 1>, + <0 295 1>, + <0 294 1>, + <0 293 1>, + <0 292 1>, + <0 291 1>, + <0 290 1>, + <0 289 1>, + <0 288 1>, + <0 239 1>, + <0 236 1>, + <0 235 1>, + <0 234 1>, + <0 233 1>, + <0 232 1>, + <0 231 1>, + <0 230 1>, + <0 229 1>, + <0 228 1>, + <0 224 1>, + <0 223 1>, + <0 203 1>, + <0 183 1>, + <0 180 1>, + <0 179 1>, + <0 178 1>, + <0 177 1>, + <0 176 1>, + <0 163 1>, + <0 162 1>, + <0 160 1>, + <0 159 1>, + <0 158 1>, + <0 157 1>, + <0 156 1>; + interrupt-names = "misc-pulse1", + "misc-latch", + "sw-exception", + "watchdog", + "ce0", + "ce1", + "ce2", + "ce3", + "ce4", + "ce5", + "ce6", + "ce7", + "ce8", + "ce9", + "ce10", + "ce11", + "host2wbm-desc-feed", + "host2reo-re-injection", + "host2reo-command", + "host2rxdma-monitor-ring3", + "host2rxdma-monitor-ring2", + "host2rxdma-monitor-ring1", + "reo2ost-exception", + "wbm2host-rx-release", + "reo2host-status", + "reo2host-destination-ring4", + "reo2host-destination-ring3", + "reo2host-destination-ring2", + "reo2host-destination-ring1", + "rxdma2host-monitor-destination-mac3", + "rxdma2host-monitor-destination-mac2", + "rxdma2host-monitor-destination-mac1", + "ppdu-end-interrupts-mac3", + "ppdu-end-interrupts-mac2", + "ppdu-end-interrupts-mac1", + "rxdma2host-monitor-status-ring-mac3", + "rxdma2host-monitor-status-ring-mac2", + "rxdma2host-monitor-status-ring-mac1", + "host2rxdma-host-buf-ring-mac3", + "host2rxdma-host-buf-ring-mac2", + "host2rxdma-host-buf-ring-mac1", + "rxdma2host-destination-ring-mac3", + "rxdma2host-destination-ring-mac2", + "rxdma2host-destination-ring-mac1", + "host2tcl-input-ring4", + "host2tcl-input-ring3", + "host2tcl-input-ring2", + "host2tcl-input-ring1", + "wbm2host-tx-completions-ring3", + "wbm2host-tx-completions-ring2", + "wbm2host-tx-completions-ring1", + "tcl2host-status-ring"; + qcom,rproc = <&q6v5_wcss>; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/nvmem/imx-ocotp.txt b/Documentation/devicetree/bindings/nvmem/imx-ocotp.txt --- a/Documentation/devicetree/bindings/nvmem/imx-ocotp.txt +++ b/Documentation/devicetree/bindings/nvmem/imx-ocotp.txt @@ -2,7 +2,7 @@ This binding represents the on-chip eFuse OTP controller found on i.MX6Q/D, i.MX6DL/S, i.MX6SL, i.MX6SX, i.MX6UL, i.MX6ULL/ULZ, i.MX6SLL, -i.MX7D/S, i.MX7ULP, i.MX8MQ, i.MX8MM and i.MX8MN SoCs. +i.MX7D/S, i.MX7ULP, i.MX8MQ, i.MX8MM, i.MX8MN and i.MX8MP SoCs. Required properties: - compatible: should be one of @@ -17,6 +17,7 @@ "fsl,imx8mq-ocotp" (i.MX8MQ), "fsl,imx8mm-ocotp" (i.MX8MM), "fsl,imx8mn-ocotp" (i.MX8MN), + "fsl,imx8mp-ocotp" (i.MX8MP), followed by "syscon". - #address-cells : Should be 1 - #size-cells : Should be 1 diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/nvmem/qcom,spmi-sdam.yaml b/Documentation/devicetree/bindings/nvmem/qcom,spmi-sdam.yaml --- a/Documentation/devicetree/bindings/nvmem/qcom,spmi-sdam.yaml +++ b/Documentation/devicetree/bindings/nvmem/qcom,spmi-sdam.yaml @@ -0,0 +1,84 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/nvmem/qcom,spmi-sdam.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm Technologies, Inc. SPMI SDAM DT bindings + +maintainers: + - Shyam Kumar Thella + +description: | + The SDAM provides scratch register space for the PMIC clients. This + memory can be used by software to store information or communicate + to/from the PBUS. + +allOf: + - $ref: "nvmem.yaml#" + +properties: + compatible: + enum: + - qcom,spmi-sdam + + reg: + maxItems: 1 + + "#address-cells": + const: 1 + + "#size-cells": + const: 1 + + ranges: true + +required: + - compatible + - reg + - ranges + +patternProperties: + "^.*@[0-9a-f]+$": + type: object + + properties: + reg: + maxItems: 1 + description: + Offset and size in bytes within the storage device. + + bits: + $ref: /schemas/types.yaml#/definitions/uint32-array + maxItems: 1 + items: + items: + - minimum: 0 + maximum: 7 + description: + Offset in bit within the address range specified by reg. + - minimum: 1 + description: + Size in bit within the address range specified by reg. + + required: + - reg + + additionalProperties: false + +examples: + - | + sdam_1: nvram@b000 { + #address-cells = <1>; + #size-cells = <1>; + compatible = "qcom,spmi-sdam"; + reg = <0xb000 0x100>; + ranges = <0 0xb000 0x100>; + + /* Data cells */ + restart_reason: restart@50 { + reg = <0x50 0x1>; + bits = <6 2>; + }; + }; +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/phy/allwinner,sun9i-a80-usb-phy.yaml b/Documentation/devicetree/bindings/phy/allwinner,sun9i-a80-usb-phy.yaml --- a/Documentation/devicetree/bindings/phy/allwinner,sun9i-a80-usb-phy.yaml +++ b/Documentation/devicetree/bindings/phy/allwinner,sun9i-a80-usb-phy.yaml @@ -0,0 +1,135 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/phy/allwinner,sun9i-a80-usb-phy.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Allwinner A80 USB PHY Device Tree Bindings + +maintainers: + - Chen-Yu Tsai + - Maxime Ripard + +properties: + "#phy-cells": + const: 0 + + compatible: + const: allwinner,sun9i-a80-usb-phy + + reg: + maxItems: 1 + + clocks: + anyOf: + - description: Main PHY Clock + + - items: + - description: Main PHY clock + - description: HSIC 12MHz clock + - description: HSIC 480MHz clock + + clock-names: + oneOf: + - const: phy + + - items: + - const: phy + - const: hsic_12M + - const: hsic_480M + + resets: + anyOf: + - description: Normal USB PHY reset + + - items: + - description: Normal USB PHY reset + - description: HSIC Reset + + reset-names: + oneOf: + - const: phy + + - items: + - const: phy + - const: hsic + + phy_type: + const: hsic + description: + When absent, the PHY type will be assumed to be normal USB. + + phy-supply: + description: + Regulator that powers VBUS + +required: + - "#phy-cells" + - compatible + - reg + - clocks + - clock-names + - resets + - reset-names + +additionalProperties: false + +if: + properties: + phy_type: + const: hsic + + required: + - phy_type + +then: + properties: + clocks: + maxItems: 3 + + clock-names: + maxItems: 3 + + resets: + maxItems: 2 + + reset-names: + maxItems: 2 + +examples: + - | + #include + #include + + usbphy1: phy@a00800 { + compatible = "allwinner,sun9i-a80-usb-phy"; + reg = <0x00a00800 0x4>; + clocks = <&usb_clocks CLK_USB0_PHY>; + clock-names = "phy"; + resets = <&usb_clocks RST_USB0_PHY>; + reset-names = "phy"; + phy-supply = <®_usb1_vbus>; + #phy-cells = <0>; + }; + + - | + #include + #include + + usbphy3: phy@a02800 { + compatible = "allwinner,sun9i-a80-usb-phy"; + reg = <0x00a02800 0x4>; + clocks = <&usb_clocks CLK_USB2_PHY>, + <&usb_clocks CLK_USB_HSIC>, + <&usb_clocks CLK_USB2_HSIC>; + clock-names = "phy", + "hsic_12M", + "hsic_480M"; + resets = <&usb_clocks RST_USB2_PHY>, + <&usb_clocks RST_USB2_HSIC>; + reset-names = "phy", + "hsic"; + phy_type = "hsic"; + phy-supply = <®_usb3_vbus>; + #phy-cells = <0>; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/phy/brcm,brcmstb-usb-phy.txt b/Documentation/devicetree/bindings/phy/brcm,brcmstb-usb-phy.txt --- a/Documentation/devicetree/bindings/phy/brcm,brcmstb-usb-phy.txt +++ b/Documentation/devicetree/bindings/phy/brcm,brcmstb-usb-phy.txt @@ -1,30 +1,49 @@ Broadcom STB USB PHY Required properties: - - compatible: brcm,brcmstb-usb-phy - - reg: two offset and length pairs. - The first pair specifies a manditory set of memory mapped - registers used for general control of the PHY. - The second pair specifies optional registers used by some of - the SoCs that support USB 3.x - - #phy-cells: Shall be 1 as it expects one argument for setting - the type of the PHY. Possible values are: - - PHY_TYPE_USB2 for USB1.1/2.0 PHY - - PHY_TYPE_USB3 for USB3.x PHY +- compatible: should be one of + "brcm,brcmstb-usb-phy" + "brcm,bcm7216-usb-phy" + "brcm,bcm7211-usb-phy" + +- reg and reg-names properties requirements are specific to the + compatible string. + "brcm,brcmstb-usb-phy": + - reg: 1 or 2 offset and length pairs. One for the base CTRL registers + and an optional pair for systems with USB 3.x support + - reg-names: not specified + "brcm,bcm7216-usb-phy": + - reg: 3 offset and length pairs for CTRL, XHCI_EC and XHCI_GBL + registers + - reg-names: "ctrl", "xhci_ec", "xhci_gbl" + "brcm,bcm7211-usb-phy": + - reg: 5 offset and length pairs for CTRL, XHCI_EC, XHCI_GBL, + USB_PHY and USB_MDIO registers and an optional pair + for the BDC registers + - reg-names: "ctrl", "xhci_ec", "xhci_gbl", "usb_phy", "usb_mdio", "bdc_ec" + +- #phy-cells: Shall be 1 as it expects one argument for setting + the type of the PHY. Possible values are: + - PHY_TYPE_USB2 for USB1.1/2.0 PHY + - PHY_TYPE_USB3 for USB3.x PHY Optional Properties: - clocks : clock phandles. - clock-names: String, clock name. +- interrupts: wakeup interrupt +- interrupt-names: "wakeup" - brcm,ipp: Boolean, Invert Port Power. Possible values are: 0 (Don't invert), 1 (Invert) - brcm,ioc: Boolean, Invert Over Current detection. Possible values are: 0 (Don't invert), 1 (Invert) -NOTE: one or both of the following two properties must be set -- brcm,has-xhci: Boolean indicating the phy has an XHCI phy. -- brcm,has-eohci: Boolean indicating the phy has an EHCI/OHCI phy. - dr_mode: String, PHY Device mode. Possible values are: "host", "peripheral ", "drd" or "typec-pd" If this property is not defined, the phy will default to "host" mode. +- brcm,syscon-piarbctl: phandle to syscon for handling config registers +NOTE: one or both of the following two properties must be set +- brcm,has-xhci: Boolean indicating the phy has an XHCI phy. +- brcm,has-eohci: Boolean indicating the phy has an EHCI/OHCI phy. + Example: @@ -41,3 +60,27 @@ clocks = <&usb20>, <&usb30>; clock-names = "sw_usb", "sw_usb3"; }; + +usb-phy@29f0200 { + reg = <0x29f0200 0x200>, + <0x29c0880 0x30>, + <0x29cc100 0x534>, + <0x2808000 0x24>, + <0x2980080 0x8>; + reg-names = "ctrl", + "xhci_ec", + "xhci_gbl", + "usb_phy", + "usb_mdio"; + brcm,ioc = <0x0>; + brcm,ipp = <0x0>; + compatible = "brcm,bcm7211-usb-phy"; + interrupts = <0x30>; + interrupt-parent = <&vpu_intr1_nosec_intc>; + interrupt-names = "wake"; + #phy-cells = <0x1>; + brcm,has-xhci; + syscon-piarbctl = <&syscon_piarbctl>; + clocks = <&scmi_clk 256>; + clock-names = "sw_usb"; +}; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/phy/brcm-sata-phy.txt b/Documentation/devicetree/bindings/phy/brcm-sata-phy.txt --- a/Documentation/devicetree/bindings/phy/brcm-sata-phy.txt +++ b/Documentation/devicetree/bindings/phy/brcm-sata-phy.txt @@ -2,6 +2,7 @@ Required properties: - compatible: should be one or more of + "brcm,bcm7216-sata-phy" "brcm,bcm7425-sata-phy" "brcm,bcm7445-sata-phy" "brcm,iproc-ns2-sata-phy" diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/phy/intel,lgm-emmc-phy.yaml b/Documentation/devicetree/bindings/phy/intel,lgm-emmc-phy.yaml --- a/Documentation/devicetree/bindings/phy/intel,lgm-emmc-phy.yaml +++ b/Documentation/devicetree/bindings/phy/intel,lgm-emmc-phy.yaml @@ -0,0 +1,56 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/phy/intel,lgm-emmc-phy.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Intel Lightning Mountain(LGM) eMMC PHY Device Tree Bindings + +maintainers: + - Ramuthevar Vadivel Murugan + +description: |+ + Bindings for eMMC PHY on Intel's Lightning Mountain SoC, syscon + node is used to reference the base address of eMMC phy registers. + + The eMMC PHY node should be the child of a syscon node with the + required property: + + - compatible: Should be one of the following: + "intel,lgm-syscon", "syscon" + - reg: + maxItems: 1 + +properties: + compatible: + const: intel,lgm-emmc-phy + + "#phy-cells": + const: 0 + + reg: + maxItems: 1 + + clocks: + maxItems: 1 + +required: + - "#phy-cells" + - compatible + - reg + - clocks + +examples: + - | + sysconf: chiptop@e0200000 { + compatible = "intel,lgm-syscon", "syscon"; + reg = <0xe0200000 0x100>; + + emmc-phy: emmc-phy@a8 { + compatible = "intel,lgm-emmc-phy"; + reg = <0x00a8 0x10>; + clocks = <&emmc>; + #phy-cells = <0>; + }; + }; +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/phy/phy-cadence-sierra.txt b/Documentation/devicetree/bindings/phy/phy-cadence-sierra.txt --- a/Documentation/devicetree/bindings/phy/phy-cadence-sierra.txt +++ b/Documentation/devicetree/bindings/phy/phy-cadence-sierra.txt @@ -2,21 +2,24 @@ ----------------------- Required properties: -- compatible: cdns,sierra-phy-t0 -- clocks: Must contain an entry in clock-names. - See ../clocks/clock-bindings.txt for details. -- clock-names: Must be "phy_clk" +- compatible: Must be "cdns,sierra-phy-t0" for Sierra in Cadence platform + Must be "ti,sierra-phy-t0" for Sierra in TI's J721E SoC. - resets: Must contain an entry for each in reset-names. See ../reset/reset.txt for details. - reset-names: Must include "sierra_reset" and "sierra_apb". "sierra_reset" must control the reset line to the PHY. "sierra_apb" must control the reset line to the APB PHY - interface. + interface ("sierra_apb" is optional). - reg: register range for the PHY. - #address-cells: Must be 1 - #size-cells: Must be 0 Optional properties: +- clocks: Must contain an entry in clock-names. + See ../clocks/clock-bindings.txt for details. +- clock-names: Must contain "cmn_refclk_dig_div" and + "cmn_refclk1_dig_div" for configuring the frequency of + the clock to the lanes. "phy_clk" is deprecated. - cdns,autoconf: A boolean property whose presence indicates that the PHY registers will be configured by hardware. If not present, all sub-node optional properties must be diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/phy/rockchip,px30-dsi-dphy.yaml b/Documentation/devicetree/bindings/phy/rockchip,px30-dsi-dphy.yaml --- a/Documentation/devicetree/bindings/phy/rockchip,px30-dsi-dphy.yaml +++ b/Documentation/devicetree/bindings/phy/rockchip,px30-dsi-dphy.yaml @@ -13,9 +13,6 @@ "#phy-cells": const: 0 - "#clock-cells": - const: 0 - compatible: enum: - rockchip,px30-dsi-dphy @@ -49,7 +46,6 @@ required: - "#phy-cells" - - "#clock-cells" - compatible - reg - clocks @@ -66,7 +62,6 @@ reg = <0x0 0xff2e0000 0x0 0x10000>; clocks = <&pmucru 13>, <&cru 12>; clock-names = "ref", "pclk"; - #clock-cells = <0>; resets = <&cru 12>; reset-names = "apb"; #phy-cells = <0>; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/phy/sun9i-usb-phy.txt b/Documentation/devicetree/bindings/phy/sun9i-usb-phy.txt --- a/Documentation/devicetree/bindings/phy/sun9i-usb-phy.txt +++ b/Documentation/devicetree/bindings/phy/sun9i-usb-phy.txt @@ -1,37 +0,0 @@ -Allwinner sun9i USB PHY ------------------------ - -Required properties: -- compatible : should be one of - * allwinner,sun9i-a80-usb-phy -- reg : a list of offset + length pairs -- #phy-cells : from the generic phy bindings, must be 0 -- phy_type : "hsic" for HSIC usage; - other values or absence of this property indicates normal USB -- clocks : phandle + clock specifier for the phy clocks -- clock-names : depending on the "phy_type" property, - * "phy" for normal USB - * "hsic_480M", "hsic_12M" for HSIC -- resets : a list of phandle + reset specifier pairs -- reset-names : depending on the "phy_type" property, - * "phy" for normal USB - * "hsic" for HSIC - -Optional Properties: -- phy-supply : from the generic phy bindings, a phandle to a regulator that - provides power to VBUS. - -It is recommended to list all clocks and resets available. -The driver will only use those matching the phy_type. - -Example: - usbphy1: phy@a01800 { - compatible = "allwinner,sun9i-a80-usb-phy"; - reg = <0x00a01800 0x4>; - clocks = <&usb_phy_clk 2>, <&usb_phy_clk 10>, - <&usb_phy_clk 3>; - clock-names = "hsic_480M", "hsic_12M", "phy"; - resets = <&usb_phy_clk 18>, <&usb_phy_clk 19>; - reset-names = "hsic", "phy"; - #phy-cells = <0>; - }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/phy/ti,phy-j721e-wiz.yaml b/Documentation/devicetree/bindings/phy/ti,phy-j721e-wiz.yaml --- a/Documentation/devicetree/bindings/phy/ti,phy-j721e-wiz.yaml +++ b/Documentation/devicetree/bindings/phy/ti,phy-j721e-wiz.yaml @@ -0,0 +1,221 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +# Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com/ +%YAML 1.2 +--- +$id: "http://devicetree.org/schemas/phy/ti,phy-j721e-wiz.yaml#" +$schema: "http://devicetree.org/meta-schemas/core.yaml#" + +title: TI J721E WIZ (SERDES Wrapper) + +maintainers: + - Kishon Vijay Abraham I + +properties: + compatible: + enum: + - ti,j721e-wiz-16g + - ti,j721e-wiz-10g + + power-domains: + maxItems: 1 + + clocks: + maxItems: 3 + description: clock-specifier to represent input to the WIZ + + clock-names: + items: + - const: fck + - const: core_ref_clk + - const: ext_ref_clk + + num-lanes: + minimum: 1 + maximum: 4 + + "#address-cells": + const: 1 + + "#size-cells": + const: 1 + + "#reset-cells": + const: 1 + + ranges: true + + assigned-clocks: + maxItems: 2 + + assigned-clock-parents: + maxItems: 2 + + typec-dir-gpios: + maxItems: 1 + description: + GPIO to signal Type-C cable orientation for lane swap. + If GPIO is active, lane 0 and lane 1 of SERDES will be swapped to + achieve the funtionality of an external type-C plug flip mux. + + typec-dir-debounce-ms: + minimum: 100 + maximum: 1000 + default: 100 + description: + Number of milliseconds to wait before sampling typec-dir-gpio. + If not specified, the default debounce of 100ms will be used. + Type-C spec states minimum CC pin debounce of 100 ms and maximum + of 200 ms. However, some solutions might need more than 200 ms. + +patternProperties: + "^pll[0|1]-refclk$": + type: object + description: | + WIZ node should have subnodes for each of the PLLs present in + the SERDES. + properties: + clocks: + maxItems: 2 + description: Phandle to clock nodes representing the two inputs to PLL. + + "#clock-cells": + const: 0 + + assigned-clocks: + maxItems: 1 + + assigned-clock-parents: + maxItems: 1 + + required: + - clocks + - "#clock-cells" + - assigned-clocks + - assigned-clock-parents + + "^cmn-refclk1?-dig-div$": + type: object + description: + WIZ node should have subnodes for each of the PMA common refclock + provided by the SERDES. + properties: + clocks: + maxItems: 1 + description: Phandle to the clock node representing the input to the + divider clock. + + "#clock-cells": + const: 0 + + required: + - clocks + - "#clock-cells" + + "^refclk-dig$": + type: object + description: | + WIZ node should have subnode for refclk_dig to select the reference + clock source for the reference clock used in the PHY and PMA digital + logic. + properties: + clocks: + maxItems: 4 + description: Phandle to four clock nodes representing the inputs to + refclk_dig + + "#clock-cells": + const: 0 + + assigned-clocks: + maxItems: 1 + + assigned-clock-parents: + maxItems: 1 + + required: + - clocks + - "#clock-cells" + - assigned-clocks + - assigned-clock-parents + + "^serdes@[0-9a-f]+$": + type: object + description: | + WIZ node should have '1' subnode for the SERDES. It could be either + Sierra SERDES or Torrent SERDES. Sierra SERDES should follow the + bindings specified in + Documentation/devicetree/bindings/phy/phy-cadence-sierra.txt + Torrent SERDES should follow the bindings specified in + Documentation/devicetree/bindings/phy/phy-cadence-dp.txt + +required: + - compatible + - power-domains + - clocks + - clock-names + - num-lanes + - "#address-cells" + - "#size-cells" + - "#reset-cells" + - ranges + +examples: + - | + #include + + wiz@5000000 { + compatible = "ti,j721e-wiz-16g"; + #address-cells = <1>; + #size-cells = <1>; + power-domains = <&k3_pds 292 TI_SCI_PD_EXCLUSIVE>; + clocks = <&k3_clks 292 5>, <&k3_clks 292 11>, <&dummy_cmn_refclk>; + clock-names = "fck", "core_ref_clk", "ext_ref_clk"; + assigned-clocks = <&k3_clks 292 11>, <&k3_clks 292 0>; + assigned-clock-parents = <&k3_clks 292 15>, <&k3_clks 292 4>; + num-lanes = <2>; + #reset-cells = <1>; + ranges = <0x5000000 0x5000000 0x10000>; + + pll0-refclk { + clocks = <&k3_clks 293 13>, <&dummy_cmn_refclk>; + #clock-cells = <0>; + assigned-clocks = <&wiz1_pll0_refclk>; + assigned-clock-parents = <&k3_clks 293 13>; + }; + + pll1-refclk { + clocks = <&k3_clks 293 0>, <&dummy_cmn_refclk1>; + #clock-cells = <0>; + assigned-clocks = <&wiz1_pll1_refclk>; + assigned-clock-parents = <&k3_clks 293 0>; + }; + + cmn-refclk-dig-div { + clocks = <&wiz1_refclk_dig>; + #clock-cells = <0>; + }; + + cmn-refclk1-dig-div { + clocks = <&wiz1_pll1_refclk>; + #clock-cells = <0>; + }; + + refclk-dig { + clocks = <&k3_clks 292 11>, <&k3_clks 292 0>, <&dummy_cmn_refclk>, <&dummy_cmn_refclk1>; + #clock-cells = <0>; + assigned-clocks = <&wiz0_refclk_dig>; + assigned-clock-parents = <&k3_clks 292 11>; + }; + + serdes@5000000 { + compatible = "cdns,ti,sierra-phy-t0"; + reg-names = "serdes"; + reg = <0x5000000 0x10000>; + #address-cells = <1>; + #size-cells = <0>; + resets = <&serdes_wiz0 0>; + reset-names = "sierra_reset"; + clocks = <&wiz0_cmn_refclk_dig_div>, <&wiz0_cmn_refclk1_dig_div>; + clock-names = "cmn_refclk_dig_div", "cmn_refclk1_dig_div"; + }; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/pinctrl/aspeed,ast2600-pinctrl.yaml b/Documentation/devicetree/bindings/pinctrl/aspeed,ast2600-pinctrl.yaml --- a/Documentation/devicetree/bindings/pinctrl/aspeed,ast2600-pinctrl.yaml +++ b/Documentation/devicetree/bindings/pinctrl/aspeed,ast2600-pinctrl.yaml @@ -54,8 +54,9 @@ TACH10, TACH11, TACH12, TACH13, TACH14, TACH15, TACH2, TACH3, TACH4, TACH5, TACH6, TACH7, TACH8, TACH9, THRU0, THRU1, THRU2, THRU3, TXD1, TXD2, TXD3, TXD4, UART10, UART11, UART12, UART13, - UART6, UART7, UART8, UART9, VB, VGAHS, VGAVS, WDTRST1, WDTRST2, - WDTRST3, WDTRST4, ] + UART6, UART7, UART8, UART9, USBAD, USBADP, USB2AH, USB2AHP, + USB2BD, USB2BH, VB, VGAHS, VGAVS, WDTRST1, WDTRST2, WDTRST3, + WDTRST4, ] groups: allOf: - $ref: "/schemas/types.yaml#/definitions/string" @@ -85,8 +86,8 @@ TACH10, TACH11, TACH12, TACH13, TACH14, TACH15, TACH2, TACH3, TACH4, TACH5, TACH6, TACH7, TACH8, TACH9, THRU0, THRU1, THRU2, THRU3, TXD1, TXD2, TXD3, TXD4, UART10, UART11, UART12G0, - UART12G1, UART13G0, UART13G1, UART6, UART7, UART8, UART9, VB, - VGAHS, VGAVS, WDTRST1, WDTRST2, WDTRST3, WDTRST4, ] + UART12G1, UART13G0, UART13G1, UART6, UART7, UART8, UART9, USBA, + USBB, VB, VGAHS, VGAVS, WDTRST1, WDTRST2, WDTRST3, WDTRST4, ] required: - compatible diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/pinctrl/fsl,imx8mp-pinctrl.yaml b/Documentation/devicetree/bindings/pinctrl/fsl,imx8mp-pinctrl.yaml --- a/Documentation/devicetree/bindings/pinctrl/fsl,imx8mp-pinctrl.yaml +++ b/Documentation/devicetree/bindings/pinctrl/fsl,imx8mp-pinctrl.yaml @@ -0,0 +1,69 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/pinctrl/fsl,imx8mp-pinctrl.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Freescale IMX8MP IOMUX Controller + +maintainers: + - Anson Huang + +description: + Please refer to fsl,imx-pinctrl.txt and pinctrl-bindings.txt in this directory + for common binding part and usage. + +properties: + compatible: + const: fsl,imx8mp-iomuxc + + reg: + maxItems: 1 + +# Client device subnode's properties +patternProperties: + 'grp$': + type: object + description: + Pinctrl node's client devices use subnodes for desired pin configuration. + Client device subnodes use below standard properties. + + properties: + fsl,pins: + allOf: + - $ref: /schemas/types.yaml#/definitions/uint32-array + description: + each entry consists of 6 integers and represents the mux and config + setting for one pin. The first 5 integers are specified using a PIN_FUNC_ID macro, which can + be found in . The last + integer CONFIG is the pad setting value like pull-up on this pin. Please + refer to i.MX8M Plus Reference Manual for detailed CONFIG settings. + + required: + - fsl,pins + + additionalProperties: false + +required: + - compatible + - reg + +additionalProperties: false + +examples: + # Pinmux controller node + - | + iomuxc: pinctrl@30330000 { + compatible = "fsl,imx8mp-iomuxc"; + reg = <0x30330000 0x10000>; + + pinctrl_uart2: uart2grp { + fsl,pins = < + 0x228 0x488 0x5F0 0x0 0x6 0x49 + 0x228 0x488 0x000 0x0 0x0 0x49 + >; + }; + }; + +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/pinctrl/ingenic,pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/ingenic,pinctrl.txt --- a/Documentation/devicetree/bindings/pinctrl/ingenic,pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/ingenic,pinctrl.txt @@ -10,9 +10,9 @@ naming scheme "PxN" where x is a character identifying the GPIO port with which the pin is associated and N is an integer from 0 to 31 identifying the pin within that GPIO port. For example PA0 is the first pin in GPIO port A, and -PB31 is the last pin in GPIO port B. The jz4740 and the x1000 contains 4 GPIO -ports, PA to PD, for a total of 128 pins. The jz4760, the jz4770 and the jz4780 -contains 6 GPIO ports, PA to PF, for a total of 192 pins. +PB31 is the last pin in GPIO port B. The jz4740, the x1000 and the x1830 +contains 4 GPIO ports, PA to PD, for a total of 128 pins. The jz4760, the +jz4770 and the jz4780 contains 6 GPIO ports, PA to PF, for a total of 192 pins. Required properties: @@ -28,6 +28,7 @@ - "ingenic,x1000-pinctrl" - "ingenic,x1000e-pinctrl" - "ingenic,x1500-pinctrl" + - "ingenic,x1830-pinctrl" - reg: Address range of the pinctrl registers. @@ -40,6 +41,7 @@ - "ingenic,jz4770-gpio" - "ingenic,jz4780-gpio" - "ingenic,x1000-gpio" + - "ingenic,x1830-gpio" - reg: The GPIO bank number. - interrupt-controller: Marks the device node as an interrupt controller. - interrupts: Interrupt specifier for the controllers interrupt. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/pinctrl/intel,lgm-io.yaml b/Documentation/devicetree/bindings/pinctrl/intel,lgm-io.yaml --- a/Documentation/devicetree/bindings/pinctrl/intel,lgm-io.yaml +++ b/Documentation/devicetree/bindings/pinctrl/intel,lgm-io.yaml @@ -0,0 +1,75 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/pinctrl/intel,lgm-io.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Intel Lightning Mountain SoC pinmux & GPIO controller binding + +maintainers: + - Rahul Tanwar + +description: | + Pinmux & GPIO controller controls pin multiplexing & configuration including + GPIO function selection & GPIO attributes configuration. + +properties: + compatible: + const: intel,lgm-io + + reg: + maxItems: 1 + +# Client device subnode's properties +patternProperties: + '-pins$': + type: object + allOf: + - $ref: pincfg-node.yaml# + - $ref: pinmux-node.yaml# + description: + Pinctrl node's client devices use subnodes for desired pin configuration. + Client device subnodes use below standard properties. + + properties: + function: true + groups: true + pins: true + pinmux: true + bias-pull-up: true + bias-pull-down: true + drive-strength: true + slew-rate: true + drive-open-drain: true + output-enable: true + + required: + - function + - groups + + additionalProperties: false + +required: + - compatible + - reg + +additionalProperties: false + +examples: + # Pinmux controller node + - | + pinctrl: pinctrl@e2880000 { + compatible = "intel,lgm-io"; + reg = <0xe2880000 0x100000>; + + uart0-pins { + pins = <64>, /* UART_RX0 */ + <65>; /* UART_TX0 */ + function = "CONSOLE_UART0"; + pinmux = <1>, + <1>; + groups = "CONSOLE_UART0"; + }; + }; + +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/pinctrl/intel,lgm-pinctrl.yaml b/Documentation/devicetree/bindings/pinctrl/intel,lgm-pinctrl.yaml --- a/Documentation/devicetree/bindings/pinctrl/intel,lgm-pinctrl.yaml +++ b/Documentation/devicetree/bindings/pinctrl/intel,lgm-pinctrl.yaml @@ -1,116 +0,0 @@ -# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) -%YAML 1.2 ---- -$id: http://devicetree.org/schemas/bindings/pinctrl/intel,lgm-pinctrl.yaml# -$schema: http://devicetree.org/meta-schemas/core.yaml# - -title: Intel Lightning Mountain SoC pinmux & GPIO controller binding - -maintainers: - - Rahul Tanwar - -description: | - Pinmux & GPIO controller controls pin multiplexing & configuration including - GPIO function selection & GPIO attributes configuration. - - Please refer to [1] for details of the common pinctrl bindings used by the - client devices. - - [1] Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt - -properties: - compatible: - const: intel,lgm-io - - reg: - maxItems: 1 - -# Client device subnode's properties -patternProperties: - '-pins$': - type: object - description: - Pinctrl node's client devices use subnodes for desired pin configuration. - Client device subnodes use below standard properties. - - properties: - function: - $ref: /schemas/types.yaml#/definitions/string - description: - A string containing the name of the function to mux to the group. - - groups: - $ref: /schemas/types.yaml#/definitions/string-array - description: - An array of strings identifying the list of groups. - - pins: - $ref: /schemas/types.yaml#/definitions/uint32-array - description: - List of pins to select with this function. - - pinmux: - description: The applicable mux group. - allOf: - - $ref: "/schemas/types.yaml#/definitions/uint32-array" - - bias-pull-up: - type: boolean - - bias-pull-down: - type: boolean - - drive-strength: - description: | - Selects the drive strength for the specified pins in mA. - 0: 2 mA - 1: 4 mA - 2: 8 mA - 3: 12 mA - allOf: - - $ref: /schemas/types.yaml#/definitions/uint32 - - enum: [0, 1, 2, 3] - - slew-rate: - type: boolean - description: | - Sets slew rate for specified pins. - 0: slow slew - 1: fast slew - - drive-open-drain: - type: boolean - - output-enable: - type: boolean - - required: - - function - - groups - - additionalProperties: false - -required: - - compatible - - reg - -additionalProperties: false - -examples: - # Pinmux controller node - - | - pinctrl: pinctrl@e2880000 { - compatible = "intel,lgm-pinctrl"; - reg = <0xe2880000 0x100000>; - - uart0-pins { - pins = <64>, /* UART_RX0 */ - <65>; /* UART_TX0 */ - function = "CONSOLE_UART0"; - pinmux = <1>, - <1>; - groups = "CONSOLE_UART0"; - }; - }; - -... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/pinctrl/pinmux-node.yaml b/Documentation/devicetree/bindings/pinctrl/pinmux-node.yaml --- a/Documentation/devicetree/bindings/pinctrl/pinmux-node.yaml +++ b/Documentation/devicetree/bindings/pinctrl/pinmux-node.yaml @@ -114,7 +114,7 @@ specific binding for the hardware defines whether the entries are integers or strings, and their meaning. - group: + groups: $ref: /schemas/types.yaml#/definitions/string-array description: the group to apply the properties to, if the driver supports diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/pinctrl/qcom,sc7180-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/qcom,sc7180-pinctrl.txt --- a/Documentation/devicetree/bindings/pinctrl/qcom,sc7180-pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/qcom,sc7180-pinctrl.txt @@ -125,8 +125,9 @@ mi2s_1, mi2s_2, mss_lte, m_voc, pa_indicator, phase_flag, PLL_BIST, pll_bypassnl, pll_reset, prng_rosc, qdss, qdss_cti, qlink_enable, qlink_request, qspi_clk, qspi_cs, - qspi_data, qup00, qup01, qup02, qup03, qup04, qup05, - qup10, qup11, qup12, qup13, qup14, qup15, sdc1_tb, + qspi_data, qup00, qup01, qup02_i2c, qup02_uart, qup03, + qup04_i2c, qup04_uart, qup05, qup10, qup11_i2c, qup11_uart, + qup12, qup13_i2c, qup13_uart, qup14, qup15, sdc1_tb, sdc2_tb, sd_write, sp_cmu, tgu_ch0, tgu_ch1, tgu_ch2, tgu_ch3, tsense_pwm1, tsense_pwm2, uim1, uim2, uim_batt, usb_phy, vfr_1, _V_GPIO, _V_PPS_IN, _V_PPS_OUT, diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/power/avs/qcom,cpr.txt b/Documentation/devicetree/bindings/power/avs/qcom,cpr.txt --- a/Documentation/devicetree/bindings/power/avs/qcom,cpr.txt +++ b/Documentation/devicetree/bindings/power/avs/qcom,cpr.txt @@ -0,0 +1,130 @@ +QCOM CPR (Core Power Reduction) + +CPR (Core Power Reduction) is a technology to reduce core power on a CPU +or other device. Each OPP of a device corresponds to a "corner" that has +a range of valid voltages for a particular frequency. While the device is +running at a particular frequency, CPR monitors dynamic factors such as +temperature, etc. and suggests adjustments to the voltage to save power +and meet silicon characteristic requirements. + +- compatible: + Usage: required + Value type: + Definition: should be "qcom,qcs404-cpr", "qcom,cpr" for qcs404 + +- reg: + Usage: required + Value type: + Definition: base address and size of the rbcpr register region + +- interrupts: + Usage: required + Value type: + Definition: should specify the CPR interrupt + +- clocks: + Usage: required + Value type: + Definition: phandle to the reference clock + +- clock-names: + Usage: required + Value type: + Definition: must be "ref" + +- vdd-apc-supply: + Usage: required + Value type: + Definition: phandle to the vdd-apc-supply regulator + +- #power-domain-cells: + Usage: required + Value type: + Definition: should be 0 + +- operating-points-v2: + Usage: required + Value type: + Definition: A phandle to the OPP table containing the + performance states supported by the CPR + power domain + +- acc-syscon: + Usage: optional + Value type: + Definition: phandle to syscon for writing ACC settings + +- nvmem-cells: + Usage: required + Value type: + Definition: phandle to nvmem cells containing the data + that makes up a fuse corner, for each fuse corner. + As well as the CPR fuse revision. + +- nvmem-cell-names: + Usage: required + Value type: + Definition: should be "cpr_quotient_offset1", "cpr_quotient_offset2", + "cpr_quotient_offset3", "cpr_init_voltage1", + "cpr_init_voltage2", "cpr_init_voltage3", "cpr_quotient1", + "cpr_quotient2", "cpr_quotient3", "cpr_ring_osc1", + "cpr_ring_osc2", "cpr_ring_osc3", "cpr_fuse_revision" + for qcs404. + +Example: + + cpr_opp_table: cpr-opp-table { + compatible = "operating-points-v2-qcom-level"; + + cpr_opp1: opp1 { + opp-level = <1>; + qcom,opp-fuse-level = <1>; + }; + cpr_opp2: opp2 { + opp-level = <2>; + qcom,opp-fuse-level = <2>; + }; + cpr_opp3: opp3 { + opp-level = <3>; + qcom,opp-fuse-level = <3>; + }; + }; + + power-controller@b018000 { + compatible = "qcom,qcs404-cpr", "qcom,cpr"; + reg = <0x0b018000 0x1000>; + interrupts = <0 15 IRQ_TYPE_EDGE_RISING>; + clocks = <&xo_board>; + clock-names = "ref"; + vdd-apc-supply = <&pms405_s3>; + #power-domain-cells = <0>; + operating-points-v2 = <&cpr_opp_table>; + acc-syscon = <&tcsr>; + + nvmem-cells = <&cpr_efuse_quot_offset1>, + <&cpr_efuse_quot_offset2>, + <&cpr_efuse_quot_offset3>, + <&cpr_efuse_init_voltage1>, + <&cpr_efuse_init_voltage2>, + <&cpr_efuse_init_voltage3>, + <&cpr_efuse_quot1>, + <&cpr_efuse_quot2>, + <&cpr_efuse_quot3>, + <&cpr_efuse_ring1>, + <&cpr_efuse_ring2>, + <&cpr_efuse_ring3>, + <&cpr_efuse_revision>; + nvmem-cell-names = "cpr_quotient_offset1", + "cpr_quotient_offset2", + "cpr_quotient_offset3", + "cpr_init_voltage1", + "cpr_init_voltage2", + "cpr_init_voltage3", + "cpr_quotient1", + "cpr_quotient2", + "cpr_quotient3", + "cpr_ring_osc1", + "cpr_ring_osc2", + "cpr_ring_osc3", + "cpr_fuse_revision"; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/power/supply/cw2015_battery.txt b/Documentation/devicetree/bindings/power/supply/cw2015_battery.txt --- a/Documentation/devicetree/bindings/power/supply/cw2015_battery.txt +++ b/Documentation/devicetree/bindings/power/supply/cw2015_battery.txt @@ -0,0 +1,37 @@ +cw2015_battery +~~~~~~~~~~~~~~~~ + +The cellwise CW2015 is a shuntless single/multi-cell battery fuel gauge. + +Required properties : + - compatible : "cellwise,cw2015" + - cellwise,bat-config-info : Binary battery info blob + +Optional properties : + - cellwise,monitor-interval : Measurement interval in seconds + - cellwise,voltage-divider : Voltage divider for multi-cell packs, + specified as two integer values , in ohms. + - cellwise,virtual-power : Default to disconnected battery state (gauge in pack mode) + - cellwise,design-capacity : Design capacity of the battery cell in milliampere hours + - cellwise,alert-level : Low battery alarm level in percent + +Example: + cw2015@62 { + status = "okay"; + compatible = "cellwise,cw201x"; + reg = <0x62>; + cellwise,bat-config-info = < + 0x17 0x67 0x80 0x73 0x6E 0x6C 0x6B 0x63 + 0x77 0x51 0x5C 0x58 0x50 0x4C 0x48 0x36 + 0x15 0x0C 0x0C 0x19 0x5B 0x7D 0x6F 0x69 + 0x69 0x5B 0x0C 0x29 0x20 0x40 0x52 0x59 + 0x57 0x56 0x54 0x4F 0x3B 0x1F 0x7F 0x17 + 0x06 0x1A 0x30 0x5A 0x85 0x93 0x96 0x2D + 0x48 0x77 0x9C 0xB3 0x80 0x52 0x94 0xCB + 0x2F 0x00 0x64 0xA5 0xB5 0x11 0xF0 0x11 + >; + cellwise,monitor-interval = <5>; + cellwise,virtual-power; + cellwise,design-capacity = <9800>; + power-supplies = <&mains_charger>, <&usb_charger>; + } diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/ptp/ptp-ines.txt b/Documentation/devicetree/bindings/ptp/ptp-ines.txt --- a/Documentation/devicetree/bindings/ptp/ptp-ines.txt +++ b/Documentation/devicetree/bindings/ptp/ptp-ines.txt @@ -0,0 +1,35 @@ +ZHAW InES PTP time stamping IP core + +The IP core needs two different kinds of nodes. The control node +lives somewhere in the memory map and specifies the address of the +control registers. There can be up to three port handles placed as +attributes of PHY nodes. These associate a particular MII bus with a +port index within the IP core. + +Required properties of the control node: + +- compatible: "ines,ptp-ctrl" +- reg: physical address and size of the register bank + +Required format of the port handle within the PHY node: + +- timestamper: provides control node reference and + the port channel within the IP core + +Example: + + tstamper: timestamper@60000000 { + compatible = "ines,ptp-ctrl"; + reg = <0x60000000 0x80>; + }; + + ethernet@80000000 { + ... + mdio { + ... + ethernet-phy@3 { + ... + timestamper = <&tstamper 0>; + }; + }; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/ptp/timestamper.txt b/Documentation/devicetree/bindings/ptp/timestamper.txt --- a/Documentation/devicetree/bindings/ptp/timestamper.txt +++ b/Documentation/devicetree/bindings/ptp/timestamper.txt @@ -0,0 +1,42 @@ +Time stamps from MII bus snooping devices + +This binding supports non-PHY devices that snoop the MII bus and +provide time stamps. In contrast to PHY time stamping drivers (which +can simply attach their interface directly to the PHY instance), stand +alone MII time stamping drivers use this binding to specify the +connection between the snooping device and a given network interface. + +Non-PHY MII time stamping drivers typically talk to the control +interface over another bus like I2C, SPI, UART, or via a memory mapped +peripheral. This controller device is associated with one or more +time stamping channels, each of which snoops on a MII bus. + +The "timestamper" property lives in a phy node and links a time +stamping channel from the controller device to that phy's MII bus. + +Example: + + tstamper: timestamper@10000000 { + compatible = "ines,ptp-ctrl"; + reg = <0x10000000 0x80>; + }; + + ethernet@20000000 { + mdio { + ethernet-phy@1 { + timestamper = <&tstamper 0>; + }; + }; + }; + + ethernet@30000000 { + mdio { + ethernet-phy@2 { + timestamper = <&tstamper 1>; + }; + }; + }; + +In this example, time stamps from the MII bus attached to phy@1 will +appear on time stamp channel 0 (zero), and those from phy@2 appear on +channel 1. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/regulator/mp8859.txt b/Documentation/devicetree/bindings/regulator/mp8859.txt --- a/Documentation/devicetree/bindings/regulator/mp8859.txt +++ b/Documentation/devicetree/bindings/regulator/mp8859.txt @@ -0,0 +1,22 @@ +Monolithic Power Systems MP8859 voltage regulator + +Required properties: +- compatible: "mps,mp8859"; +- reg: I2C slave address. + +Optional subnode for regulator: "mp8859_dcdc", using common regulator +bindings given in . + +Example: + + mp8859: regulator@66 { + compatible = "mps,mp8859"; + reg = <0x66>; + dc_12v: mp8859_dcdc { + regulator-name = "dc_12v"; + regulator-min-microvolt = <12000000>; + regulator-max-microvolt = <12000000>; + regulator-boot-on; + regulator-always-on; + }; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/regulator/mps,mpq7920.yaml b/Documentation/devicetree/bindings/regulator/mps,mpq7920.yaml --- a/Documentation/devicetree/bindings/regulator/mps,mpq7920.yaml +++ b/Documentation/devicetree/bindings/regulator/mps,mpq7920.yaml @@ -0,0 +1,121 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/regulator/mps,mpq7920.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Monolithic Power System MPQ7920 PMIC + +maintainers: + - Saravanan Sekar + +properties: + $nodename: + pattern: "pmic@[0-9a-f]{1,2}" + compatible: + enum: + - mps,mpq7920 + + reg: + maxItems: 1 + + regulators: + type: object + allOf: + - $ref: regulator.yaml# + description: | + list of regulators provided by this controller, must be named + after their hardware counterparts BUCK[1-4], one LDORTC, and LDO[2-5] + + properties: + mps,switch-freq: + allOf: + - $ref: "/schemas/types.yaml#/definitions/uint8" + enum: [ 0, 1, 2, 3 ] + default: 2 + description: | + switching frequency must be one of following corresponding value + 1.1MHz, 1.65MHz, 2.2MHz, 2.75MHz + + patternProperties: + "^ldo[1-4]$": + type: object + allOf: + - $ref: regulator.yaml# + + "^ldortc$": + type: object + allOf: + - $ref: regulator.yaml# + + "^buck[1-4]$": + type: object + allOf: + - $ref: regulator.yaml# + + properties: + mps,buck-softstart: + allOf: + - $ref: "/schemas/types.yaml#/definitions/uint8" + enum: [ 0, 1, 2, 3 ] + description: | + defines the soft start time of this buck, must be one of the following + corresponding values 150us, 300us, 610us, 920us + + mps,buck-phase-delay: + allOf: + - $ref: "/schemas/types.yaml#/definitions/uint8" + enum: [ 0, 1, 2, 3 ] + description: | + defines the phase delay of this buck, must be one of the following + corresponding values 0deg, 90deg, 180deg, 270deg + + mps,buck-ovp-disable: + type: boolean + description: | + disables over voltage protection of this buck + + additionalProperties: false + additionalProperties: false + +required: + - compatible + - reg + - regulators + +additionalProperties: false + +examples: + - | + i2c { + #address-cells = <1>; + #size-cells = <0>; + + pmic@69 { + compatible = "mps,mpq7920"; + reg = <0x69>; + + regulators { + mps,switch-freq = /bits/ 8 <1>; + + buck1 { + regulator-name = "buck1"; + regulator-min-microvolt = <400000>; + regulator-max-microvolt = <3587500>; + regulator-min-microamp = <460000>; + regulator-max-microamp = <7600000>; + regulator-boot-on; + mps,buck-ovp-disable; + mps,buck-phase-delay = /bits/ 8 <2>; + mps,buck-softstart = /bits/ 8 <1>; + }; + + ldo2 { + regulator-name = "ldo2"; + regulator-min-microvolt = <650000>; + regulator-max-microvolt = <3587500>; + }; + }; + }; + }; +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/regulator/rohm,bd71828-regulator.yaml b/Documentation/devicetree/bindings/regulator/rohm,bd71828-regulator.yaml --- a/Documentation/devicetree/bindings/regulator/rohm,bd71828-regulator.yaml +++ b/Documentation/devicetree/bindings/regulator/rohm,bd71828-regulator.yaml @@ -0,0 +1,107 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/regulator/rohm,bd71828-regulator.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: ROHM BD71828 Power Management Integrated Circuit regulators + +maintainers: + - Matti Vaittinen + +description: | + This module is part of the ROHM BD71828 MFD device. For more details + see Documentation/devicetree/bindings/mfd/rohm,bd71828-pmic.yaml. + + The regulator controller is represented as a sub-node of the PMIC node + on the device tree. + + Regulator nodes should be named to BUCK_ and LDO_. + The valid names for BD71828 regulator nodes are + BUCK1, BUCK2, BUCK3, BUCK4, BUCK5, BUCK6, BUCK7 + LDO1, LDO2, LDO3, LDO4, LDO5, LDO6, LDO7 + +patternProperties: + "^LDO[1-7]$": + type: object + allOf: + - $ref: regulator.yaml# + description: + Properties for single LDO regulator. + + properties: + regulator-name: + pattern: "^ldo[1-7]$" + description: + should be "ldo1", ..., "ldo7" + + "^BUCK[1-7]$": + type: object + allOf: + - $ref: regulator.yaml# + description: + Properties for single BUCK regulator. + + properties: + regulator-name: + pattern: "^buck[1-7]$" + description: + should be "buck1", ..., "buck7" + + rohm,dvs-run-voltage: + allOf: + - $ref: "/schemas/types.yaml#/definitions/uint32" + - minimum: 0 + maximum: 3300000 + description: + PMIC default "RUN" state voltage in uV. See below table for + bucks which support this. 0 means disabled. + + rohm,dvs-idle-voltage: + allOf: + - $ref: "/schemas/types.yaml#/definitions/uint32" + - minimum: 0 + maximum: 3300000 + description: + PMIC default "IDLE" state voltage in uV. See below table for + bucks which support this. 0 means disabled. + + rohm,dvs-suspend-voltage: + allOf: + - $ref: "/schemas/types.yaml#/definitions/uint32" + - minimum: 0 + maximum: 3300000 + description: + PMIC default "SUSPEND" state voltage in uV. See below table for + bucks which support this. 0 means disabled. + + rohm,dvs-lpsr-voltage: + allOf: + - $ref: "/schemas/types.yaml#/definitions/uint32" + - minimum: 0 + maximum: 3300000 + description: + PMIC default "LPSR" state voltage in uV. See below table for + bucks which support this. 0 means disabled. + + # Supported default DVS states: + # buck | run | idle | suspend | lpsr + #-------------------------------------------------------------- + # 1, 2, 6, and 7 | supported | supported | supported (*) + #-------------------------------------------------------------- + # 3, 4, and 5 | supported (**) + #-------------------------------------------------------------- + # + #(*) LPSR and SUSPEND states use same voltage but both states have own + # enable / + # disable settings. Voltage 0 can be specified for a state to make + # regulator disabled on that state. + # + #(**) All states use same voltage but have own enable / disable + # settings. Voltage 0 can be specified for a state to make + # regulator disabled on that state. + + required: + - regulator-name + additionalProperties: false +additionalProperties: false diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/regulator/st,stm32-booster.txt b/Documentation/devicetree/bindings/regulator/st,stm32-booster.txt --- a/Documentation/devicetree/bindings/regulator/st,stm32-booster.txt +++ b/Documentation/devicetree/bindings/regulator/st,stm32-booster.txt @@ -1,18 +0,0 @@ -STM32 BOOSTER - Booster for ADC analog input switches - -Some STM32 devices embed a 3.3V booster supplied by Vdda, that can be used -to supply ADC analog input switches. - -Required properties: -- compatible: Should be one of: - "st,stm32h7-booster" - "st,stm32mp1-booster" -- st,syscfg: Phandle to system configuration controller. -- vdda-supply: Phandle to the vdda input analog voltage. - -Example: - booster: regulator-booster { - compatible = "st,stm32mp1-booster"; - st,syscfg = <&syscfg>; - vdda-supply = <&vdda>; - }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/regulator/st,stm32-booster.yaml b/Documentation/devicetree/bindings/regulator/st,stm32-booster.yaml --- a/Documentation/devicetree/bindings/regulator/st,stm32-booster.yaml +++ b/Documentation/devicetree/bindings/regulator/st,stm32-booster.yaml @@ -0,0 +1,46 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/regulator/st,stm32-booster.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: STMicroelectronics STM32 booster for ADC analog input switches bindings + +maintainers: + - Fabrice Gasnier + +description: | + Some STM32 devices embed a 3.3V booster supplied by Vdda, that can be used + to supply ADC analog input switches. + +allOf: + - $ref: "regulator.yaml#" + +properties: + compatible: + enum: + - st,stm32h7-booster + - st,stm32mp1-booster + + st,syscfg: + allOf: + - $ref: "/schemas/types.yaml#/definitions/phandle-array" + description: phandle to system configuration controller. + + vdda-supply: + description: phandle to the vdda input analog voltage. + +required: + - compatible + - st,syscfg + - vdda-supply + +examples: + - | + regulator-booster { + compatible = "st,stm32mp1-booster"; + st,syscfg = <&syscfg>; + vdda-supply = <&vdda>; + }; + +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/regulator/st,stm32-vrefbuf.txt b/Documentation/devicetree/bindings/regulator/st,stm32-vrefbuf.txt --- a/Documentation/devicetree/bindings/regulator/st,stm32-vrefbuf.txt +++ b/Documentation/devicetree/bindings/regulator/st,stm32-vrefbuf.txt @@ -1,20 +0,0 @@ -STM32 VREFBUF - Voltage reference buffer - -Some STM32 devices embed a voltage reference buffer which can be used as -voltage reference for ADCs, DACs and also as voltage reference for external -components through the dedicated VREF+ pin. - -Required properties: -- compatible: Must be "st,stm32-vrefbuf". -- reg: Offset and length of VREFBUF register set. -- clocks: Must contain an entry for peripheral clock. - -Example: - vrefbuf: regulator@58003c00 { - compatible = "st,stm32-vrefbuf"; - reg = <0x58003C00 0x8>; - clocks = <&rcc VREF_CK>; - regulator-min-microvolt = <1500000>; - regulator-max-microvolt = <2500000>; - vdda-supply = <&vdda>; - }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/regulator/st,stm32-vrefbuf.yaml b/Documentation/devicetree/bindings/regulator/st,stm32-vrefbuf.yaml --- a/Documentation/devicetree/bindings/regulator/st,stm32-vrefbuf.yaml +++ b/Documentation/devicetree/bindings/regulator/st,stm32-vrefbuf.yaml @@ -0,0 +1,52 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/regulator/st,stm32-vrefbuf.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: STMicroelectronics STM32 Voltage reference buffer bindings + +description: | + Some STM32 devices embed a voltage reference buffer which can be used as + voltage reference for ADCs, DACs and also as voltage reference for external + components through the dedicated VREF+ pin. + +maintainers: + - Fabrice Gasnier + +allOf: + - $ref: "regulator.yaml#" + +properties: + compatible: + const: st,stm32-vrefbuf + + reg: + maxItems: 1 + + clocks: + maxItems: 1 + + vdda-supply: + description: phandle to the vdda input analog voltage. + +required: + - compatible + - reg + - clocks + - vdda-supply + +examples: + - | + #include + vrefbuf@50025000 { + compatible = "st,stm32-vrefbuf"; + reg = <0x50025000 0x8>; + regulator-min-microvolt = <1500000>; + regulator-max-microvolt = <2500000>; + clocks = <&rcc VREF>; + vdda-supply = <&vdda>; + }; + +... + diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/regulator/st,stm32mp1-pwr-reg.txt b/Documentation/devicetree/bindings/regulator/st,stm32mp1-pwr-reg.txt --- a/Documentation/devicetree/bindings/regulator/st,stm32mp1-pwr-reg.txt +++ b/Documentation/devicetree/bindings/regulator/st,stm32mp1-pwr-reg.txt @@ -1,43 +0,0 @@ -STM32MP1 PWR Regulators ------------------------ - -Available Regulators in STM32MP1 PWR block are: - - reg11 for regulator 1V1 - - reg18 for regulator 1V8 - - usb33 for the swtich USB3V3 - -Required properties: -- compatible: Must be "st,stm32mp1,pwr-reg" -- list of child nodes that specify the regulator reg11, reg18 or usb33 - initialization data for defined regulators. The definition for each of - these nodes is defined using the standard binding for regulators found at - Documentation/devicetree/bindings/regulator/regulator.txt. -- vdd-supply: phandle to the parent supply/regulator node for vdd input -- vdd_3v3_usbfs-supply: phandle to the parent supply/regulator node for usb33 - -Example: - -pwr_regulators: pwr@50001000 { - compatible = "st,stm32mp1,pwr-reg"; - reg = <0x50001000 0x10>; - vdd-supply = <&vdd>; - vdd_3v3_usbfs-supply = <&vdd_usb>; - - reg11: reg11 { - regulator-name = "reg11"; - regulator-min-microvolt = <1100000>; - regulator-max-microvolt = <1100000>; - }; - - reg18: reg18 { - regulator-name = "reg18"; - regulator-min-microvolt = <1800000>; - regulator-max-microvolt = <1800000>; - }; - - usb33: usb33 { - regulator-name = "usb33"; - regulator-min-microvolt = <3300000>; - regulator-max-microvolt = <3300000>; - }; -}; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/regulator/st,stm32mp1-pwr-reg.yaml b/Documentation/devicetree/bindings/regulator/st,stm32mp1-pwr-reg.yaml --- a/Documentation/devicetree/bindings/regulator/st,stm32mp1-pwr-reg.yaml +++ b/Documentation/devicetree/bindings/regulator/st,stm32mp1-pwr-reg.yaml @@ -0,0 +1,64 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/regulator/st,stm32mp1-pwr-reg.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: STM32MP1 PWR voltage regulators + +maintainers: + - Pascal Paillet + +properties: + compatible: + const: st,stm32mp1,pwr-reg + + reg: + maxItems: 1 + + vdd-supply: + description: Input supply phandle(s) for vdd input + + vdd_3v3_usbfs-supply: + description: Input supply phandle(s) for vdd_3v3_usbfs input + +patternProperties: + "^(reg11|reg18|usb33)$": + type: object + + allOf: + - $ref: "regulator.yaml#" + +required: + - compatible + - reg + +additionalProperties: false + +examples: + - | + pwr@50001000 { + compatible = "st,stm32mp1,pwr-reg"; + reg = <0x50001000 0x10>; + vdd-supply = <&vdd>; + vdd_3v3_usbfs-supply = <&vdd_usb>; + + reg11 { + regulator-name = "reg11"; + regulator-min-microvolt = <1100000>; + regulator-max-microvolt = <1100000>; + }; + + reg18 { + regulator-name = "reg18"; + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <1800000>; + }; + + usb33 { + regulator-name = "usb33"; + regulator-min-microvolt = <3300000>; + regulator-max-microvolt = <3300000>; + }; + }; +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/rng/brcm,iproc-rng200.txt b/Documentation/devicetree/bindings/rng/brcm,iproc-rng200.txt --- a/Documentation/devicetree/bindings/rng/brcm,iproc-rng200.txt +++ b/Documentation/devicetree/bindings/rng/brcm,iproc-rng200.txt @@ -2,6 +2,7 @@ Required properties: - compatible : Must be one of: + "brcm,bcm2711-rng200" "brcm,bcm7211-rng200" "brcm,bcm7278-rng200" "brcm,iproc-rng200" diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/slimbus/bus.txt b/Documentation/devicetree/bindings/slimbus/bus.txt --- a/Documentation/devicetree/bindings/slimbus/bus.txt +++ b/Documentation/devicetree/bindings/slimbus/bus.txt @@ -32,6 +32,10 @@ Product Code, shall be in lower case hexadecimal with leading zeroes suppressed +Optional property for SLIMbus child node if it is present: +- slim-ifc-dev - Should be phandle to SLIMBus Interface device. + Required for devices which deal with streams. + SLIMbus example for Qualcomm's slimbus manager component: slim@28080000 { @@ -43,8 +47,14 @@ #address-cells = <2>; #size-cell = <0>; + codec_ifd: ifd@0,0{ + compatible = "slim217,60"; + reg = <0 0>; + }; + codec: wcd9310@1,0{ compatible = "slim217,60"; reg = <1 0>; + slim-ifc-dev = <&codec_ifd>; }; }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/soc/rockchip/rockchip-pm-config.txt b/Documentation/devicetree/bindings/soc/rockchip/rockchip-pm-config.txt --- a/Documentation/devicetree/bindings/soc/rockchip/rockchip-pm-config.txt +++ b/Documentation/devicetree/bindings/soc/rockchip/rockchip-pm-config.txt @@ -0,0 +1,39 @@ +* the suspend mode config + +- compatible: "rockchip,pm-config" + Compatibility with rk3399 + +- rockchip,sleep-mode-config : the sleep mode config, + ARMOFF, OSC disabled ... + +- rockchip,wakeup-config: the wake up sourece enable. + GPIO, USB, SD... + +- rockchip,pwm-regulator-config: the pwm regulator name. + +Example: + rockchip_suspend: rockchip_suspend { + compatible = "rockchip,pm-rk3399"; + status = "okay"; + rockchip,sleep-mode-config = < + (0 + | RKPM_SLP_ARMPD + | RKPM_SLP_PERILPPD + | RKPM_SLP_DDR_RET + | RKPM_SLP_PLLPD + | RKPM_SLP_OSC_DIS + | RKPM_SLP_CENTER_PD + | RKPM_SLP_AP_PWROFF + ) + >; + rockchip,wakeup-config = < + (0 | + RKPM_GPIO_WKUP_EN | + RKPM_PWM_WKUP_EN) + >; + rockchip,pwm-regulator-config = < + (0 | + PWM2_REGULATOR_EN + ) + >; + }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt b/Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt --- a/Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt +++ b/Documentation/devicetree/bindings/soc/ti/k3-ringacc.txt @@ -0,0 +1,59 @@ +* Texas Instruments K3 NavigatorSS Ring Accelerator + +The Ring Accelerator (RA) is a machine which converts read/write accesses +from/to a constant address into corresponding read/write accesses from/to a +circular data structure in memory. The RA eliminates the need for each DMA +controller which needs to access ring elements from having to know the current +state of the ring (base address, current offset). The DMA controller +performs a read or write access to a specific address range (which maps to the +source interface on the RA) and the RA replaces the address for the transaction +with a new address which corresponds to the head or tail element of the ring +(head for reads, tail for writes). + +The Ring Accelerator is a hardware module that is responsible for accelerating +management of the packet queues. The K3 SoCs can have more than one RA instances + +Required properties: +- compatible : Must be "ti,am654-navss-ringacc"; +- reg : Should contain register location and length of the following + named register regions. +- reg-names : should be + "rt" - The RA Ring Real-time Control/Status Registers + "fifos" - The RA Queues Registers + "proxy_gcfg" - The RA Proxy Global Config Registers + "proxy_target" - The RA Proxy Datapath Registers +- ti,num-rings : Number of rings supported by RA +- ti,sci-rm-range-gp-rings : TI-SCI RM subtype for GP ring range +- ti,sci : phandle on TI-SCI compatible System controller node +- ti,sci-dev-id : TI-SCI device id of the ring accelerator +- msi-parent : phandle for "ti,sci-inta" interrupt controller + +Optional properties: + -- ti,dma-ring-reset-quirk : enable ringacc / udma ring state interoperability + issue software w/a + +Example: + +ringacc: ringacc@3c000000 { + compatible = "ti,am654-navss-ringacc"; + reg = <0x0 0x3c000000 0x0 0x400000>, + <0x0 0x38000000 0x0 0x400000>, + <0x0 0x31120000 0x0 0x100>, + <0x0 0x33000000 0x0 0x40000>; + reg-names = "rt", "fifos", + "proxy_gcfg", "proxy_target"; + ti,num-rings = <818>; + ti,sci-rm-range-gp-rings = <0x2>; /* GP ring range */ + ti,dma-ring-reset-quirk; + ti,sci = <&dmsc>; + ti,sci-dev-id = <187>; + msi-parent = <&inta_main_udmass>; +}; + +client: + +dma_ipx: dma_ipx@ { + ... + ti,ringacc = <&ringacc>; + ... +} diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/sound/amlogic,axg-fifo.txt b/Documentation/devicetree/bindings/sound/amlogic,axg-fifo.txt --- a/Documentation/devicetree/bindings/sound/amlogic,axg-fifo.txt +++ b/Documentation/devicetree/bindings/sound/amlogic,axg-fifo.txt @@ -17,6 +17,9 @@ * "arb" : memory ARB line (required) * "rst" : dedicated device reset line (optional) - #sound-dai-cells: must be 0. +- amlogic,fifo-depth: The size of the controller's fifo in bytes. This + is useful for determining certain configuration such + as the flush threshold of the fifo Example of FRDDR A on the A113 SoC: @@ -27,4 +30,5 @@ interrupts = ; clocks = <&clkc_audio AUD_CLKID_FRDDR_A>; resets = <&arb AXG_ARB_FRDDR_A>; + fifo-depth = <512>; }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/sound/fsl,asrc.txt b/Documentation/devicetree/bindings/sound/fsl,asrc.txt --- a/Documentation/devicetree/bindings/sound/fsl,asrc.txt +++ b/Documentation/devicetree/bindings/sound/fsl,asrc.txt @@ -8,7 +8,12 @@ Required properties: - - compatible : Contains "fsl,imx35-asrc" or "fsl,imx53-asrc". + - compatible : Compatible list, should contain one of the following + compatibles: + "fsl,imx35-asrc", + "fsl,imx53-asrc", + "fsl,imx8qm-asrc", + "fsl,imx8qxp-asrc", - reg : Offset and length of the register set for the device. @@ -35,6 +40,11 @@ - fsl,asrc-width : Defines a mutual sample width used by DPCM Back Ends. + - fsl,asrc-clk-map : Defines clock map used in driver. which is required + by imx8qm/imx8qxp platform + <0> - select the map for asrc0 in imx8qm/imx8qxp + <1> - select the map for asrc1 in imx8qm/imx8qxp + Optional properties: - big-endian : If this property is absent, the little endian mode diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/sound/gtm601.txt b/Documentation/devicetree/bindings/sound/gtm601.txt --- a/Documentation/devicetree/bindings/sound/gtm601.txt +++ b/Documentation/devicetree/bindings/sound/gtm601.txt @@ -1,10 +1,16 @@ GTM601 UMTS modem audio interface CODEC -This device has no configuration interface. Sample rate is fixed - 8kHz. +This device has no configuration interface. The sample rate and channels are +based on the compatible string + "option,gtm601" = 8kHz mono + "broadmobi,bm818" = 48KHz stereo Required properties: - - compatible : "option,gtm601" + - compatible : one of + "option,gtm601" + "broadmobi,bm818" + Example: diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/sound/ingenic,codec.yaml b/Documentation/devicetree/bindings/sound/ingenic,codec.yaml --- a/Documentation/devicetree/bindings/sound/ingenic,codec.yaml +++ b/Documentation/devicetree/bindings/sound/ingenic,codec.yaml @@ -0,0 +1,55 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/sound/ingenic,codec.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Ingenic JZ47xx internal codec DT bindings + +maintainers: + - Paul Cercueil + +properties: + $nodename: + pattern: '^audio-codec@.*' + + compatible: + oneOf: + - const: ingenic,jz4770-codec + - const: ingenic,jz4725b-codec + - const: ingenic,jz4740-codec + + reg: + maxItems: 1 + + clocks: + maxItems: 1 + + clock-names: + items: + - const: aic + + '#sound-dai-cells': + const: 0 + +additionalProperties: false + +required: + - compatible + - reg + - clocks + - clock-names + - '#sound-dai-cells' + +examples: + - | + #include + codec: audio-codec@10020080 { + compatible = "ingenic,jz4740-codec"; + reg = <0x10020080 0x8>; + #sound-dai-cells = <0>; + clocks = <&cgu JZ4740_CLK_AIC>; + clock-names = "aic"; + }; + +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/sound/ingenic,jz4725b-codec.txt b/Documentation/devicetree/bindings/sound/ingenic,jz4725b-codec.txt --- a/Documentation/devicetree/bindings/sound/ingenic,jz4725b-codec.txt +++ b/Documentation/devicetree/bindings/sound/ingenic,jz4725b-codec.txt @@ -1,20 +0,0 @@ -Ingenic JZ4725B codec controller - -Required properties: -- compatible : "ingenic,jz4725b-codec" -- reg : codec registers location and length -- clocks : phandle to the AIC clock. -- clock-names: must be set to "aic". -- #sound-dai-cells: Must be set to 0. - -Example: - -codec: audio-codec@100200a4 { - compatible = "ingenic,jz4725b-codec"; - reg = <0x100200a4 0x8>; - - #sound-dai-cells = <0>; - - clocks = <&cgu JZ4725B_CLK_AIC>; - clock-names = "aic"; -}; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/sound/ingenic,jz4740-codec.txt b/Documentation/devicetree/bindings/sound/ingenic,jz4740-codec.txt --- a/Documentation/devicetree/bindings/sound/ingenic,jz4740-codec.txt +++ b/Documentation/devicetree/bindings/sound/ingenic,jz4740-codec.txt @@ -1,20 +0,0 @@ -Ingenic JZ4740 codec controller - -Required properties: -- compatible : "ingenic,jz4740-codec" -- reg : codec registers location and length -- clocks : phandle to the AIC clock. -- clock-names: must be set to "aic". -- #sound-dai-cells: Must be set to 0. - -Example: - -codec: audio-codec@10020080 { - compatible = "ingenic,jz4740-codec"; - reg = <0x10020080 0x8>; - - #sound-dai-cells = <0>; - - clocks = <&cgu JZ4740_CLK_AIC>; - clock-names = "aic"; -}; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/sound/qcom,sdm845.txt b/Documentation/devicetree/bindings/sound/qcom,sdm845.txt --- a/Documentation/devicetree/bindings/sound/qcom,sdm845.txt +++ b/Documentation/devicetree/bindings/sound/qcom,sdm845.txt @@ -5,7 +5,10 @@ - compatible: Usage: required Value type: - Definition: must be "qcom,sdm845-sndcard" + Definition: must be one of this + "qcom,sdm845-sndcard" + "qcom,db845c-sndcard" + "lenovo,yoga-c630-sndcard" - audio-routing: Usage: Optional diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/sound/qcom,wcd934x.yaml b/Documentation/devicetree/bindings/sound/qcom,wcd934x.yaml --- a/Documentation/devicetree/bindings/sound/qcom,wcd934x.yaml +++ b/Documentation/devicetree/bindings/sound/qcom,wcd934x.yaml @@ -0,0 +1,175 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/sound/qcom,wcd934x.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Bindings for Qualcomm WCD9340/WCD9341 Audio Codec + +maintainers: + - Srinivas Kandagatla + +description: | + Qualcomm WCD9340/WCD9341 Codec is a standalone Hi-Fi audio codec IC. + It has in-built Soundwire controller, pin controller, interrupt mux and + supports both I2S/I2C and SLIMbus audio interfaces. + +properties: + compatible: + const: slim217,250 + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + reset-gpios: + description: GPIO spec for reset line to use + maxItems: 1 + + slim-ifc-dev: true + + clocks: + maxItems: 1 + + clock-names: + const: extclk + + vdd-buck-supply: + description: A reference to the 1.8V buck supply + + vdd-buck-sido-supply: + description: A reference to the 1.8V SIDO buck supply + + vdd-rx-supply: + description: A reference to the 1.8V rx supply + + vdd-tx-supply: + description: A reference to the 1.8V tx supply + + vdd-vbat-supply: + description: A reference to the vbat supply + + vdd-io-supply: + description: A reference to the 1.8V I/O supply + + vdd-micbias-supply: + description: A reference to the micbias supply + + qcom,micbias1-microvolt: + description: micbias1 voltage + minimum: 1800000 + maximum: 2850000 + + qcom,micbias2-microvolt: + description: micbias2 voltage + minimum: 1800000 + maximum: 2850000 + + qcom,micbias3-microvolt: + description: micbias3 voltage + minimum: 1800000 + maximum: 2850000 + + qcom,micbias4-microvolt: + description: micbias4 voltage + minimum: 1800000 + maximum: 2850000 + + clock-output-names: + const: mclk + + clock-frequency: + description: Clock frequency of output clk in Hz + + interrupt-controller: true + + '#interrupt-cells': + const: 1 + + '#clock-cells': + const: 0 + + '#sound-dai-cells': + const: 1 + + "#address-cells": + const: 1 + + "#size-cells": + const: 1 + + gpio@42: + type: object + allOf: + - $ref: ../gpio/qcom,wcd934x-gpio.yaml# + +patternProperties: + "^.*@[0-9a-f]+$": + type: object + description: | + WCD934x subnode for each slave devices. Bindings of each subnodes + depends on the specific driver providing the functionality and + documented in their respective bindings. + + properties: + reg: + maxItems: 1 + + required: + - reg + +required: + - compatible + - reg + - reset-gpios + - slim-ifc-dev + - interrupts + - interrupt-controller + - clock-frequency + - clock-output-names + - qcom,micbias1-microvolt + - qcom,micbias2-microvolt + - qcom,micbias3-microvolt + - qcom,micbias4-microvolt + - "#interrupt-cells" + - "#clock-cells" + - "#sound-dai-cells" + - "#address-cells" + - "#size-cells" + +examples: + - | + codec@1,0{ + compatible = "slim217,250"; + reg = <1 0>; + reset-gpios = <&tlmm 64 0>; + slim-ifc-dev = <&wcd9340_ifd>; + #sound-dai-cells = <1>; + interrupt-parent = <&tlmm>; + interrupts = <54 4>; + interrupt-controller; + #interrupt-cells = <1>; + #clock-cells = <0>; + clock-frequency = <9600000>; + clock-output-names = "mclk"; + qcom,micbias1-microvolt = <1800000>; + qcom,micbias2-microvolt = <1800000>; + qcom,micbias3-microvolt = <1800000>; + qcom,micbias4-microvolt = <1800000>; + clock-names = "extclk"; + clocks = <&rpmhcc 2>; + + #address-cells = <1>; + #size-cells = <1>; + + gpio@42 { + compatible = "qcom,wcd9340-gpio"; + reg = <0x42 0x2>; + gpio-controller; + #gpio-cells = <2>; + }; + }; + +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/sound/qcom,wsa881x.yaml b/Documentation/devicetree/bindings/sound/qcom,wsa881x.yaml --- a/Documentation/devicetree/bindings/sound/qcom,wsa881x.yaml +++ b/Documentation/devicetree/bindings/sound/qcom,wsa881x.yaml @@ -0,0 +1,68 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/sound/qcom,wsa881x.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Bindings for Qualcomm WSA8810/WSA8815 Class-D Smart Speaker Amplifier + +maintainers: + - Srinivas Kandagatla + +description: | + WSA8810 is a class-D smart speaker amplifier and WSA8815 + is a high-output power class-D smart speaker amplifier. + Their primary operating mode uses a SoundWire digital audio + interface. This binding is for SoundWire interface. + +properties: + compatible: + const: sdw10217201000 + + reg: + maxItems: 1 + + powerdown-gpios: + description: GPIO spec for Powerdown/Shutdown line to use + maxItems: 1 + + '#thermal-sensor-cells': + const: 0 + + '#sound-dai-cells': + const: 0 + +required: + - compatible + - reg + - powerdown-gpios + - "#thermal-sensor-cells" + - "#sound-dai-cells" + +additionalProperties: false + +examples: + - | + soundwire@c2d0000 { + #address-cells = <2>; + #size-cells = <0>; + reg = <0x0c2d0000 0x2000>; + + speaker@0,1 { + compatible = "sdw10217201000"; + reg = <0 1>; + powerdown-gpios = <&wcdpinctrl 2 0>; + #thermal-sensor-cells = <0>; + #sound-dai-cells = <0>; + }; + + speaker@0,2 { + compatible = "sdw10217201000"; + reg = <0 2>; + powerdown-gpios = <&wcdpinctrl 2 0>; + #thermal-sensor-cells = <0>; + #sound-dai-cells = <0>; + }; + }; + +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/sound/rt1015.txt b/Documentation/devicetree/bindings/sound/rt1015.txt --- a/Documentation/devicetree/bindings/sound/rt1015.txt +++ b/Documentation/devicetree/bindings/sound/rt1015.txt @@ -0,0 +1,17 @@ +RT1015 Mono Class D Audio Amplifier + +This device supports I2C only. + +Required properties: + +- compatible : "realtek,rt1015". + +- reg : The I2C address of the device. + + +Example: + +rt1015: codec@28 { + compatible = "realtek,rt1015"; + reg = <0x28>; +}; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/sound/rt5645.txt b/Documentation/devicetree/bindings/sound/rt5645.txt --- a/Documentation/devicetree/bindings/sound/rt5645.txt +++ b/Documentation/devicetree/bindings/sound/rt5645.txt @@ -10,6 +10,10 @@ - interrupts : The CODEC's interrupt output. +- avdd-supply: Power supply for AVDD, providing 1.8V. + +- cpvdd-supply: Power supply for CPVDD, providing 3.5V. + Optional properties: - hp-detect-gpios: diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/soundwire/qcom,sdw.txt b/Documentation/devicetree/bindings/soundwire/qcom,sdw.txt --- a/Documentation/devicetree/bindings/soundwire/qcom,sdw.txt +++ b/Documentation/devicetree/bindings/soundwire/qcom,sdw.txt @@ -0,0 +1,167 @@ +Qualcomm SoundWire Controller Bindings + + +This binding describes the Qualcomm SoundWire Controller along with its +board specific bus parameters. + +- compatible: + Usage: required + Value type: + Definition: must be "qcom,soundwire-v..", + Example: + "qcom,soundwire-v1.3.0" + "qcom,soundwire-v1.5.0" + "qcom,soundwire-v1.6.0" +- reg: + Usage: required + Value type: + Definition: the base address and size of SoundWire controller + address space. + +- interrupts: + Usage: required + Value type: + Definition: should specify the SoundWire Controller IRQ + +- clock-names: + Usage: required + Value type: + Definition: should be "iface" for SoundWire Controller interface clock + +- clocks: + Usage: required + Value type: + Definition: should specify the SoundWire Controller interface clock + +- #sound-dai-cells: + Usage: required + Value type: + Definition: must be 1 for digital audio interfaces on the controller. + +- qcom,dout-ports: + Usage: required + Value type: + Definition: must be count of data out ports + +- qcom,din-ports: + Usage: required + Value type: + Definition: must be count of data in ports + +- qcom,ports-offset1: + Usage: required + Value type: + Definition: should specify payload transport window offset1 of each + data port. Out ports followed by In ports. + More info in MIPI Alliance SoundWire 1.0 Specifications. + +- qcom,ports-offset2: + Usage: required + Value type: + Definition: should specify payload transport window offset2 of each + data port. Out ports followed by In ports. + More info in MIPI Alliance SoundWire 1.0 Specifications. + +- qcom,ports-sinterval-low: + Usage: required + Value type: + Definition: should be sample interval low of each data port. + Out ports followed by In ports. Used for Sample Interval + calculation. + More info in MIPI Alliance SoundWire 1.0 Specifications. + +- qcom,ports-word-length: + Usage: optional + Value type: + Definition: should be size of payload channel sample. + More info in MIPI Alliance SoundWire 1.0 Specifications. + +- qcom,ports-block-pack-mode: + Usage: optional + Value type: + Definition: should be 0 or 1 to indicate the block packing mode. + 0 to indicate Blocks are per Channel + 1 to indicate Blocks are per Port. + Out ports followed by In ports. + More info in MIPI Alliance SoundWire 1.0 Specifications. + +- qcom,ports-block-group-count: + Usage: optional + Value type: + Definition: should be in range 1 to 4 to indicate how many sample + intervals are combined into a payload. + Out ports followed by In ports. + More info in MIPI Alliance SoundWire 1.0 Specifications. + +- qcom,ports-lane-control: + Usage: optional + Value type: + Definition: should be in range 0 to 7 to identify which data lane + the data port uses. + Out ports followed by In ports. + More info in MIPI Alliance SoundWire 1.0 Specifications. + +- qcom,ports-hstart: + Usage: optional + Value type: + Definition: should be number identifying lowerst numbered coloum in + SoundWire Frame, i.e. left edge of the Transport sub-frame + for each port. Values between 0 and 15 are valid. + Out ports followed by In ports. + More info in MIPI Alliance SoundWire 1.0 Specifications. + +- qcom,ports-hstop: + Usage: optional + Value type: + Definition: should be number identifying highest numbered coloum in + SoundWire Frame, i.e. the right edge of the Transport + sub-frame for each port. Values between 0 and 15 are valid. + Out ports followed by In ports. + More info in MIPI Alliance SoundWire 1.0 Specifications. + +- qcom,dports-type: + Usage: optional + Value type: + Definition: should be one of the following types + 0 for reduced port + 1 for simple ports + 2 for full port + Out ports followed by In ports. + More info in MIPI Alliance SoundWire 1.0 Specifications. + +Note: + More Information on detail of encoding of these fields can be +found in MIPI Alliance SoundWire 1.0 Specifications. + += SoundWire devices +Each subnode of the bus represents SoundWire device attached to it. +The properties of these nodes are defined by the individual bindings. + += EXAMPLE +The following example represents a SoundWire controller on DB845c board +which has controller integrated inside WCD934x codec on SDM845 SoC. + +soundwire: soundwire@c85 { + compatible = "qcom,soundwire-v1.3.0"; + reg = <0xc85 0x20>; + interrupts = <20 IRQ_TYPE_EDGE_RISING>; + clocks = <&wcc>; + clock-names = "iface"; + #sound-dai-cells = <1>; + qcom,dports-type = <0>; + qcom,dout-ports = <6>; + qcom,din-ports = <2>; + qcom,ports-sinterval-low = /bits/ 8 <0x07 0x1F 0x3F 0x7 0x1F 0x3F 0x0F 0x0F>; + qcom,ports-offset1 = /bits/ 8 <0x01 0x02 0x0C 0x6 0x12 0x0D 0x07 0x0A >; + qcom,ports-offset2 = /bits/ 8 <0x00 0x00 0x1F 0x00 0x00 0x1F 0x00 0x00>; + + /* Left Speaker */ + left{ + .... + }; + + /* Right Speaker */ + right{ + .... + }; +}; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/soundwire/soundwire-controller.yaml b/Documentation/devicetree/bindings/soundwire/soundwire-controller.yaml --- a/Documentation/devicetree/bindings/soundwire/soundwire-controller.yaml +++ b/Documentation/devicetree/bindings/soundwire/soundwire-controller.yaml @@ -69,6 +69,7 @@ reg = <0 1>; powerdown-gpios = <&wcdpinctrl 2 0>; #thermal-sensor-cells = <0>; + #sound-dai-cells = <0>; }; speaker@0,2 { @@ -76,6 +77,7 @@ reg = <0 2>; powerdown-gpios = <&wcdpinctrl 2 0>; #thermal-sensor-cells = <0>; + #sound-dai-cells = <0>; }; }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/spi/nuvoton,npcm-pspi.txt b/Documentation/devicetree/bindings/spi/nuvoton,npcm-pspi.txt --- a/Documentation/devicetree/bindings/spi/nuvoton,npcm-pspi.txt +++ b/Documentation/devicetree/bindings/spi/nuvoton,npcm-pspi.txt @@ -12,6 +12,7 @@ - clock-names: Should be "clk_apb5". - pinctrl-names : a pinctrl state named "default" must be defined. - pinctrl-0 : phandle referencing pin configuration of the device. + - resets : phandle to the reset control for this device. - cs-gpios: Specifies the gpio pins to be used for chipselects. See: Documentation/devicetree/bindings/spi/spi-bus.txt @@ -19,16 +20,6 @@ - clock-frequency : Input clock frequency to the PSPI block in Hz. Default is 25000000 Hz. -Aliases: -- All the SPI controller nodes should be represented in the aliases node using - the following format 'spi{n}' withe the correct numbered in "aliases" node. - -Example: - -aliases { - spi0 = &spi0; -}; - spi0: spi@f0200000 { compatible = "nuvoton,npcm750-pspi"; reg = <0xf0200000 0x1000>; @@ -39,5 +30,6 @@ interrupts = ; clocks = <&clk NPCM7XX_CLK_APB5>; clock-names = "clk_apb5"; + resets = <&rstc NPCM7XX_RESET_IPSRST2 NPCM7XX_RESET_PSPI1> cs-gpios = <&gpio6 11 GPIO_ACTIVE_LOW>; }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/spi/spi-stm32.txt b/Documentation/devicetree/bindings/spi/spi-stm32.txt --- a/Documentation/devicetree/bindings/spi/spi-stm32.txt +++ b/Documentation/devicetree/bindings/spi/spi-stm32.txt @@ -1,62 +0,0 @@ -STMicroelectronics STM32 SPI Controller - -The STM32 SPI controller is used to communicate with external devices using -the Serial Peripheral Interface. It supports full-duplex, half-duplex and -simplex synchronous serial communication with external devices. It supports -from 4 to 32-bit data size. Although it can be configured as master or slave, -only master is supported by the driver. - -Required properties: -- compatible: Should be one of: - "st,stm32h7-spi" - "st,stm32f4-spi" -- reg: Offset and length of the device's register set. -- interrupts: Must contain the interrupt id. -- clocks: Must contain an entry for spiclk (which feeds the internal clock - generator). -- #address-cells: Number of cells required to define a chip select address. -- #size-cells: Should be zero. - -Optional properties: -- resets: Must contain the phandle to the reset controller. -- A pinctrl state named "default" may be defined to set pins in mode of - operation for SPI transfer. -- dmas: DMA specifiers for tx and rx dma. DMA fifo mode must be used. See the - STM32 DMA bindings, Documentation/devicetree/bindings/dma/stm32-dma.txt. -- dma-names: DMA request names should include "tx" and "rx" if present. -- cs-gpios: list of GPIO chip selects. See the SPI bus bindings, - Documentation/devicetree/bindings/spi/spi-bus.txt - - -Child nodes represent devices on the SPI bus - See ../spi/spi-bus.txt - -Optional properties: -- st,spi-midi-ns: Only for STM32H7, (Master Inter-Data Idleness) minimum time - delay in nanoseconds inserted between two consecutive data - frames. - - -Example: - spi2: spi@40003800 { - #address-cells = <1>; - #size-cells = <0>; - compatible = "st,stm32h7-spi"; - reg = <0x40003800 0x400>; - interrupts = <36>; - clocks = <&rcc SPI2_CK>; - resets = <&rcc 1166>; - dmas = <&dmamux1 0 39 0x400 0x01>, - <&dmamux1 1 40 0x400 0x01>; - dma-names = "rx", "tx"; - pinctrl-0 = <&spi2_pins_b>; - pinctrl-names = "default"; - cs-gpios = <&gpioa 11 0>; - - aardvark@0 { - compatible = "totalphase,aardvark"; - reg = <0>; - spi-max-frequency = <4000000>; - st,spi-midi-ns = <4000>; - }; - }; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/spi/spi_atmel.txt b/Documentation/devicetree/bindings/spi/spi_atmel.txt --- a/Documentation/devicetree/bindings/spi/spi_atmel.txt +++ b/Documentation/devicetree/bindings/spi/spi_atmel.txt @@ -1,7 +1,7 @@ Atmel SPI device Required properties: -- compatible : should be "atmel,at91rm9200-spi". +- compatible : should be "atmel,at91rm9200-spi" or "microchip,sam9x60-spi". - reg: Address and length of the register set for the device - interrupts: Should contain spi interrupt - cs-gpios: chipselects (optional for SPI controller version >= 2 with the diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/spi/st,stm32-spi.yaml b/Documentation/devicetree/bindings/spi/st,stm32-spi.yaml --- a/Documentation/devicetree/bindings/spi/st,stm32-spi.yaml +++ b/Documentation/devicetree/bindings/spi/st,stm32-spi.yaml @@ -0,0 +1,105 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/spi/st,stm32-spi.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: STMicroelectronics STM32 SPI Controller bindings + +description: | + The STM32 SPI controller is used to communicate with external devices using + the Serial Peripheral Interface. It supports full-duplex, half-duplex and + simplex synchronous serial communication with external devices. It supports + from 4 to 32-bit data size. + +maintainers: + - Erwan Leray + - Fabrice Gasnier + +allOf: + - $ref: "spi-controller.yaml#" + - if: + properties: + compatible: + contains: + const: st,stm32f4-spi + + then: + properties: + st,spi-midi-ns: false + +properties: + compatible: + enum: + - st,stm32f4-spi + - st,stm32h7-spi + + reg: + maxItems: 1 + + clocks: + maxItems: 1 + + interrupts: + maxItems: 1 + + resets: + maxItems: 1 + + dmas: + description: | + DMA specifiers for tx and rx dma. DMA fifo mode must be used. See + the STM32 DMA bindings Documentation/devicetree/bindings/dma/stm32-dma.txt. + items: + - description: rx DMA channel + - description: tx DMA channel + + dma-names: + items: + - const: rx + - const: tx + +patternProperties: + "^[a-zA-Z][a-zA-Z0-9,+\\-._]{0,63}@[0-9a-f]+$": + type: object + # SPI slave nodes must be children of the SPI master node and can + # contain the following properties. + properties: + st,spi-midi-ns: + description: | + Only for STM32H7, (Master Inter-Data Idleness) minimum time + delay in nanoseconds inserted between two consecutive data frames. + +required: + - compatible + - reg + - clocks + - interrupts + +examples: + - | + #include + #include + #include + spi@4000b000 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "st,stm32h7-spi"; + reg = <0x4000b000 0x400>; + interrupts = ; + clocks = <&rcc SPI2_K>; + resets = <&rcc SPI2_R>; + dmas = <&dmamux1 0 39 0x400 0x05>, + <&dmamux1 1 40 0x400 0x05>; + dma-names = "rx", "tx"; + cs-gpios = <&gpioa 11 0>; + + aardvark@0 { + compatible = "totalphase,aardvark"; + reg = <0>; + spi-max-frequency = <4000000>; + st,spi-midi-ns = <4000>; + }; + }; + +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/thermal/allwinner,sun8i-a83t-ths.yaml b/Documentation/devicetree/bindings/thermal/allwinner,sun8i-a83t-ths.yaml --- a/Documentation/devicetree/bindings/thermal/allwinner,sun8i-a83t-ths.yaml +++ b/Documentation/devicetree/bindings/thermal/allwinner,sun8i-a83t-ths.yaml @@ -0,0 +1,160 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/thermal/allwinner,sun8i-a83t-ths.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Allwinner SUN8I Thermal Controller Device Tree Bindings + +maintainers: + - Vasily Khoruzhick + - Yangtao Li + +properties: + compatible: + enum: + - allwinner,sun8i-a83t-ths + - allwinner,sun8i-h3-ths + - allwinner,sun8i-r40-ths + - allwinner,sun50i-a64-ths + - allwinner,sun50i-h5-ths + - allwinner,sun50i-h6-ths + + clocks: + minItems: 1 + maxItems: 2 + items: + - description: Bus Clock + - description: Module Clock + + clock-names: + minItems: 1 + maxItems: 2 + items: + - const: bus + - const: mod + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + resets: + maxItems: 1 + + nvmem-cells: + maxItems: 1 + description: Calibration data for thermal sensors + + nvmem-cell-names: + const: calibration + + # See ./thermal.txt for details + "#thermal-sensor-cells": + enum: + - 0 + - 1 + +allOf: + - if: + properties: + compatible: + contains: + const: allwinner,sun50i-h6-ths + + then: + properties: + clocks: + maxItems: 1 + + clock-names: + maxItems: 1 + + else: + properties: + clocks: + minItems: 2 + + clock-names: + minItems: 2 + + - if: + properties: + compatible: + contains: + const: allwinner,sun8i-h3-ths + + then: + properties: + "#thermal-sensor-cells": + const: 0 + + else: + properties: + "#thermal-sensor-cells": + const: 1 + + - if: + properties: + compatible: + contains: + enum: + - const: allwinner,sun8i-h3-ths + - const: allwinner,sun8i-r40-ths + - const: allwinner,sun50i-a64-ths + - const: allwinner,sun50i-h5-ths + - const: allwinner,sun50i-h6-ths + + then: + required: + - clocks + - clock-names + - resets + +required: + - compatible + - reg + - interrupts + - '#thermal-sensor-cells' + +additionalProperties: false + +examples: + - | + thermal-sensor@1f04000 { + compatible = "allwinner,sun8i-a83t-ths"; + reg = <0x01f04000 0x100>; + interrupts = <0 31 0>; + nvmem-cells = <&ths_calibration>; + nvmem-cell-names = "calibration"; + #thermal-sensor-cells = <1>; + }; + + - | + thermal-sensor@1c25000 { + compatible = "allwinner,sun8i-h3-ths"; + reg = <0x01c25000 0x400>; + clocks = <&ccu 0>, <&ccu 1>; + clock-names = "bus", "mod"; + resets = <&ccu 2>; + interrupts = <0 31 0>; + nvmem-cells = <&ths_calibration>; + nvmem-cell-names = "calibration"; + #thermal-sensor-cells = <0>; + }; + + - | + thermal-sensor@5070400 { + compatible = "allwinner,sun50i-h6-ths"; + reg = <0x05070400 0x100>; + clocks = <&ccu 0>; + clock-names = "bus"; + resets = <&ccu 2>; + interrupts = <0 15 0>; + nvmem-cells = <&ths_calibration>; + nvmem-cell-names = "calibration"; + #thermal-sensor-cells = <1>; + }; + +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/thermal/brcm,avs-ro-thermal.yaml b/Documentation/devicetree/bindings/thermal/brcm,avs-ro-thermal.yaml --- a/Documentation/devicetree/bindings/thermal/brcm,avs-ro-thermal.yaml +++ b/Documentation/devicetree/bindings/thermal/brcm,avs-ro-thermal.yaml @@ -0,0 +1,48 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/thermal/brcm,avs-ro-thermal.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Broadcom AVS ring oscillator thermal + +maintainers: + - Stefan Wahren + +description: |+ + The thermal node should be the child of a syscon node with the + required property: + + - compatible: Should be one of the following: + "brcm,bcm2711-avs-monitor", "syscon", "simple-mfd" + + Refer to the the bindings described in + Documentation/devicetree/bindings/mfd/syscon.txt + +properties: + compatible: + const: brcm,bcm2711-thermal + + # See ./thermal.txt for details + "#thermal-sensor-cells": + const: 0 + +required: + - compatible + - '#thermal-sensor-cells' + +additionalProperties: false + +examples: + - | + avs-monitor@7d5d2000 { + compatible = "brcm,bcm2711-avs-monitor", + "syscon", "simple-mfd"; + reg = <0x7d5d2000 0xf00>; + + thermal: thermal { + compatible = "brcm,bcm2711-thermal"; + #thermal-sensor-cells = <0>; + }; + }; +... diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/thermal/brcm,avs-tmon.txt b/Documentation/devicetree/bindings/thermal/brcm,avs-tmon.txt --- a/Documentation/devicetree/bindings/thermal/brcm,avs-tmon.txt +++ b/Documentation/devicetree/bindings/thermal/brcm,avs-tmon.txt @@ -3,9 +3,13 @@ Thermal management core, provided by the AVS TMON hardware block. Required properties: -- compatible: must be "brcm,avs-tmon" and/or "brcm,avs-tmon-bcm7445" +- compatible: must be one of: + "brcm,avs-tmon-bcm7216" + "brcm,avs-tmon-bcm7445" + "brcm,avs-tmon" - reg: address range for the AVS TMON registers -- interrupts: temperature monitor interrupt, for high/low threshold triggers +- interrupts: temperature monitor interrupt, for high/low threshold triggers, + required except for "brcm,avs-tmon-bcm7216" - interrupt-names: should be "tmon" Example: diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/timer/renesas,cmt.txt b/Documentation/devicetree/bindings/timer/renesas,cmt.txt --- a/Documentation/devicetree/bindings/timer/renesas,cmt.txt +++ b/Documentation/devicetree/bindings/timer/renesas,cmt.txt @@ -29,6 +29,8 @@ - "renesas,r8a77470-cmt1" for the 48-bit CMT1 device included in r8a77470. - "renesas,r8a774a1-cmt0" for the 32-bit CMT0 device included in r8a774a1. - "renesas,r8a774a1-cmt1" for the 48-bit CMT devices included in r8a774a1. + - "renesas,r8a774b1-cmt0" for the 32-bit CMT0 device included in r8a774b1. + - "renesas,r8a774b1-cmt1" for the 48-bit CMT devices included in r8a774b1. - "renesas,r8a774c0-cmt0" for the 32-bit CMT0 device included in r8a774c0. - "renesas,r8a774c0-cmt1" for the 48-bit CMT devices included in r8a774c0. - "renesas,r8a7790-cmt0" for the 32-bit CMT0 device included in r8a7790. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt --- a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt +++ b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt @@ -15,6 +15,10 @@ "qcom,ci-hdrc" "chipidea,usb2" "xlnx,zynq-usb-2.20a" + "nvidia,tegra20-udc" + "nvidia,tegra30-udc" + "nvidia,tegra114-udc" + "nvidia,tegra124-udc" - reg: base address and length of the registers - interrupts: interrupt for the USB controller diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/usb/mediatek,musb.txt b/Documentation/devicetree/bindings/usb/mediatek,musb.txt --- a/Documentation/devicetree/bindings/usb/mediatek,musb.txt +++ b/Documentation/devicetree/bindings/usb/mediatek,musb.txt @@ -0,0 +1,57 @@ +MediaTek musb DRD/OTG controller +------------------------------------------- + +Required properties: + - compatible : should be one of: + "mediatek,mt2701-musb" + ... + followed by "mediatek,mtk-musb" + - reg : specifies physical base address and size of + the registers + - interrupts : interrupt used by musb controller + - interrupt-names : must be "mc" + - phys : PHY specifier for the OTG phy + - dr_mode : should be one of "host", "peripheral" or "otg", + refer to usb/generic.txt + - clocks : a list of phandle + clock-specifier pairs, one for + each entry in clock-names + - clock-names : must contain "main", "mcu", "univpll" + for clocks of controller + +Optional properties: + - power-domains : a phandle to USB power domain node to control USB's + MTCMOS + +Required child nodes: + usb connector node as defined in bindings/connector/usb-connector.txt +Optional properties: + - id-gpios : input GPIO for USB ID pin. + - vbus-gpios : input GPIO for USB VBUS pin. + - vbus-supply : reference to the VBUS regulator, needed when supports + dual-role mode + - usb-role-switch : use USB Role Switch to support dual-role switch, see + usb/generic.txt. + +Example: + +usb2: usb@11200000 { + compatible = "mediatek,mt2701-musb", + "mediatek,mtk-musb"; + reg = <0 0x11200000 0 0x1000>; + interrupts = ; + interrupt-names = "mc"; + phys = <&u2port2 PHY_TYPE_USB2>; + dr_mode = "otg"; + clocks = <&pericfg CLK_PERI_USB0>, + <&pericfg CLK_PERI_USB0_MCU>, + <&pericfg CLK_PERI_USB_SLV>; + clock-names = "main","mcu","univpll"; + power-domains = <&scpsys MT2701_POWER_DOMAIN_IFR_MSC>; + usb-role-switch; + connector{ + compatible = "gpio-usb-b-connector", "usb-b-connector"; + type = "micro"; + id-gpios = <&pio 44 GPIO_ACTIVE_HIGH>; + vbus-supply = <&usb_vbus>; + }; +}; diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/devicetree/bindings/vendor-prefixes.yaml b/Documentation/devicetree/bindings/vendor-prefixes.yaml --- a/Documentation/devicetree/bindings/vendor-prefixes.yaml +++ b/Documentation/devicetree/bindings/vendor-prefixes.yaml @@ -109,6 +109,8 @@ description: Artesyn Embedded Technologies Inc. "^asahi-kasei,.*": description: Asahi Kasei Corp. + "^asc,.*": + description: All Sensors Corporation "^aspeed,.*": description: ASPEED Technology Inc. "^asus,.*": @@ -717,6 +719,8 @@ description: Panasonic Corporation "^parade,.*": description: Parade Technologies Inc. + "^parallax,.*": + description: Parallax Inc. "^pda,.*": description: Precision Design Associates, Inc. "^pericom,.*": @@ -1060,6 +1064,8 @@ description: Xilinx "^xunlong,.*": description: Shenzhen Xunlong Software CO.,Limited + "^xylon,.*": + description: Xylon "^yones-toptech,.*": description: Yones Toptech Co., Ltd. "^ysoft,.*": diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/doc-guide/contributing.rst b/Documentation/doc-guide/contributing.rst --- a/Documentation/doc-guide/contributing.rst +++ b/Documentation/doc-guide/contributing.rst @@ -0,0 +1,294 @@ +.. SPDX-License-Identifier: GPL-2.0 +How to help improve kernel documentation +======================================== + +Documentation is an important part of any software-development project. +Good documentation helps to bring new developers in and helps established +developers work more effectively. Without top-quality documentation, a lot +of time is wasted in reverse-engineering the code and making avoidable +mistakes. + +Unfortunately, the kernel's documentation currently falls far short of what +it needs to be to support a project of this size and importance. + +This guide is for contributors who would like to improve that situation. +Kernel documentation improvements can be made by developers at a variety of +skill levels; they are a relatively easy way to learn the kernel process in +general and find a place in the community. The bulk of what follows is the +documentation maintainer's list of tasks that most urgently need to be +done. + +The documentation TODO list +--------------------------- + +There is an endless list of tasks that need to be carried out to get our +documentation to where it should be. This list contains a number of +important items, but is far from exhaustive; if you see a different way to +improve the documentation, please do not hold back! + +Addressing warnings +~~~~~~~~~~~~~~~~~~~ + +The documentation build currently spews out an unbelievable number of +warnings. When you have that many, you might as well have none at all; +people ignore them, and they will never notice when their work adds new +ones. For this reason, eliminating warnings is one of the highest-priority +tasks on the documentation TODO list. The task itself is reasonably +straightforward, but it must be approached in the right way to be +successful. + +Warnings issued by a compiler for C code can often be dismissed as false +positives, leading to patches aimed at simply shutting the compiler up. +Warnings from the documentation build almost always point at a real +problem; making those warnings go away requires understanding the problem +and fixing it at its source. For this reason, patches fixing documentation +warnings should probably not say "fix a warning" in the changelog title; +they should indicate the real problem that has been fixed. + +Another important point is that documentation warnings are often created by +problems in kerneldoc comments in C code. While the documentation +maintainer appreciates being copied on fixes for these warnings, the +documentation tree is often not the right one to actually carry those +fixes; they should go to the maintainer of the subsystem in question. + +For example, in a documentation build I grabbed a pair of warnings nearly +at random:: + + ./drivers/devfreq/devfreq.c:1818: warning: bad line: + - Resource-managed devfreq_register_notifier() + ./drivers/devfreq/devfreq.c:1854: warning: bad line: + - Resource-managed devfreq_unregister_notifier() + +(The lines were split for readability). + +A quick look at the source file named above turned up a couple of kerneldoc +comments that look like this:: + + /** + * devm_devfreq_register_notifier() + - Resource-managed devfreq_register_notifier() + * @dev: The devfreq user device. (parent of devfreq) + * @devfreq: The devfreq object. + * @nb: The notifier block to be unregistered. + * @list: DEVFREQ_TRANSITION_NOTIFIER. + */ + +The problem is the missing "*", which confuses the build system's +simplistic idea of what C comment blocks look like. This problem had been +present since that comment was added in 2016 — a full four years. Fixing +it was a matter of adding the missing asterisks. A quick look at the +history for that file showed what the normal format for subject lines is, +and ``scripts/get_maintainer.pl`` told me who should receive it. The +resulting patch looked like this:: + + [PATCH] PM / devfreq: Fix two malformed kerneldoc comments + + Two kerneldoc comments in devfreq.c fail to adhere to the required format, + resulting in these doc-build warnings: + + ./drivers/devfreq/devfreq.c:1818: warning: bad line: + - Resource-managed devfreq_register_notifier() + ./drivers/devfreq/devfreq.c:1854: warning: bad line: + - Resource-managed devfreq_unregister_notifier() + + Add a couple of missing asterisks and make kerneldoc a little happier. + + Signed-off-by: Jonathan Corbet + --- + drivers/devfreq/devfreq.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + + diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c + index 57f6944d65a6..00c9b80b3d33 100644 + --- a/drivers/devfreq/devfreq.c + +++ b/drivers/devfreq/devfreq.c + @@ -1814,7 +1814,7 @@ static void devm_devfreq_notifier_release(struct device *dev, void *res) + + /** + * devm_devfreq_register_notifier() + - - Resource-managed devfreq_register_notifier() + + * - Resource-managed devfreq_register_notifier() + * @dev: The devfreq user device. (parent of devfreq) + * @devfreq: The devfreq object. + * @nb: The notifier block to be unregistered. + @@ -1850,7 +1850,7 @@ EXPORT_SYMBOL(devm_devfreq_register_notifier); + + /** + * devm_devfreq_unregister_notifier() + - - Resource-managed devfreq_unregister_notifier() + + * - Resource-managed devfreq_unregister_notifier() + * @dev: The devfreq user device. (parent of devfreq) + * @devfreq: The devfreq object. + * @nb: The notifier block to be unregistered. + -- + 2.24.1 + +The entire process only took a few minutes. Of course, I then found that +somebody else had fixed it in a separate tree, highlighting another lesson: +always check linux-next to see if a problem has been fixed before you dig +into it. + +Other fixes will take longer, especially those relating to structure +members or function parameters that lack documentation. In such cases, it +is necessary to work out what the role of those members or parameters is +and describe them correctly. Overall, this task gets a little tedious at +times, but it's highly important. If we can actually eliminate warnings +from the documentation build, then we can start expecting developers to +avoid adding new ones. + +Languishing kerneldoc comments +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Developers are encouraged to write kerneldoc comments for their code, but +many of those comments are never pulled into the docs build. That makes +this information harder to find and, for example, makes Sphinx unable to +generate links to that documentation. Adding ``kernel-doc`` directives to +the documentation to bring those comments in can help the community derive +the full value of the work that has gone into creating them. + +The ``scripts/find-unused-docs.sh`` tool can be used to find these +overlooked comments. + +Note that the most value comes from pulling in the documentation for +exported functions and data structures. Many subsystems also have +kerneldoc comments for internal use; those should not be pulled into the +documentation build unless they are placed in a document that is +specifically aimed at developers working within the relevant subsystem. + + +Typo fixes +~~~~~~~~~~ + +Fixing typographical or formatting errors in the documentation is a quick +way to figure out how to create and send patches, and it is a useful +service. I am always willing to accept such patches. That said, once you +have fixed a few, please consider moving on to more advanced tasks, leaving +some typos for the next beginner to address. + +Please note that some things are *not* typos and should not be "fixed": + + - Both American and British English spellings are allowed within the + kernel documentation. There is no need to fix one by replacing it with + the other. + + - The question of whether a period should be followed by one or two spaces + is not to be debated in the context of kernel documentation. Other + areas of rational disagreement, such as the "Oxford comma", are also + off-topic here. + +As with any patch to any project, please consider whether your change is +really making things better. + +Ancient documentation +~~~~~~~~~~~~~~~~~~~~~ + +Some kernel documentation is current, maintained, and useful. Some +documentation is ... not. Dusty, old, and inaccurate documentation can +mislead readers and casts doubt on our documentation as a whole. Anything +that can be done to address such problems is more than welcome. + +Whenever you are working with a document, please consider whether it is +current, whether it needs updating, or whether it should perhaps be removed +altogether. There are a number of warning signs that you can pay attention +to here: + + - References to 2.x kernels + - Pointers to SourceForge repositories + - Nothing but typo fixes in the history for several years + - Discussion of pre-Git workflows + +The best thing to do, of course, would be to bring the documentation +current, adding whatever information is needed. Such work often requires +the cooperation of developers familiar with the subsystem in question, of +course. Developers are often more than willing to cooperate with people +working to improve the documentation when asked nicely, and when their +answers are listened to and acted upon. + +Some documentation is beyond hope; we occasionally find documents that +refer to code that was removed from the kernel long ago, for example. +There is surprising resistance to removing obsolete documentation, but we +should do that anyway. Extra cruft in our documentation helps nobody. + +In cases where there is perhaps some useful information in a badly outdated +document, and you are unable to update it, the best thing to do may be to +add a warning at the beginning. The following text is recommended:: + + .. warning :: + This document is outdated and in need of attention. Please use + this information with caution, and please consider sending patches + to update it. + +That way, at least our long-suffering readers have been warned that the +document may lead them astray. + +Documentation coherency +~~~~~~~~~~~~~~~~~~~~~~~ + +The old-timers around here will remember the Linux books that showed up on +the shelves in the 1990s. They were simply collections of documentation +files scrounged from various locations on the net. The books have (mostly) +improved since then, but the kernel's documentation is still mostly built +on that model. It is thousands of files, almost each of which was written +in isolation from all of the others. We don't have a coherent body of +kernel documentation; we have thousands of individual documents. + +We have been trying to improve the situation through the creation of +a set of "books" that group documentation for specific readers. These +include: + + - :doc:`../admin-guide/index` + - :doc:`../core-api/index` + - :doc:`../driver-api/index` + - :doc:`../userspace-api/index` + +As well as this book on documentation itself. + +Moving documents into the appropriate books is an important task and needs +to continue. There are a couple of challenges associated with this work, +though. Moving documentation files creates short-term pain for the people +who work with those files; they are understandably unenthusiastic about +such changes. Usually the case can be made to move a document once; we +really don't want to keep shifting them around, though. + +Even when all documents are in the right place, though, we have only +managed to turn a big pile into a group of smaller piles. The work of +trying to knit all of those documents together into a single whole has not +yet begun. If you have bright ideas on how we could proceed on that front, +we would be more than happy to hear them. + +Stylesheet improvements +~~~~~~~~~~~~~~~~~~~~~~~ + +With the adoption of Sphinx we have much nicer-looking HTML output than we +once did. But it could still use a lot of improvement; Donald Knuth and +Edward Tufte would be unimpressed. That requires tweaking our stylesheets +to create more typographically sound, accessible, and readable output. + +Be warned: if you take on this task you are heading into classic bikeshed +territory. Expect a lot of opinions and discussion for even relatively +obvious changes. That is, alas, the nature of the world we live in. + +Non-LaTeX PDF build +~~~~~~~~~~~~~~~~~~~ + +This is a decidedly nontrivial task for somebody with a lot of time and +Python skills. The Sphinx toolchain is relatively small and well +contained; it is easy to add to a development system. But building PDF or +EPUB output requires installing LaTeX, which is anything but small or well +contained. That would be a nice thing to eliminate. + +The original hope had been to use the rst2pdf tool (https://rst2pdf.org/) +for PDF generation, but it turned out to not be up to the task. +Development work on rst2pdf seems to have picked up again in recent times, +though, which is a hopeful sign. If a suitably motivated developer were to +work with that project to make rst2pdf work with the kernel documentation +build, the world would be eternally grateful. + +Write more documentation +~~~~~~~~~~~~~~~~~~~~~~~~ + +Naturally, there are massive parts of the kernel that are severely +underdocumented. If you have the knowledge to document a specific kernel +subsystem and the desire to do so, please do not hesitate to do some +writing and contribute the result to the kernel. Untold numbers of kernel +developers and users will thank you. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/doc-guide/index.rst b/Documentation/doc-guide/index.rst --- a/Documentation/doc-guide/index.rst +++ b/Documentation/doc-guide/index.rst @@ -10,6 +10,8 @@ sphinx kernel-doc parse-headers + contributing + maintainer-profile .. only:: subproject and html diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/doc-guide/maintainer-profile.rst b/Documentation/doc-guide/maintainer-profile.rst --- a/Documentation/doc-guide/maintainer-profile.rst +++ b/Documentation/doc-guide/maintainer-profile.rst @@ -0,0 +1,44 @@ +.. SPDX-License-Identifier: GPL-2.0 +Documentation subsystem maintainer entry profile +================================================ + +The documentation "subsystem" is the central coordinating point for the +kernel's documentation and associated infrastructure. It covers the +hierarchy under Documentation/ (with the exception of +Documentation/device-tree), various utilities under scripts/ and, at least +some of the time, LICENSES/. + +It's worth noting, though, that the boundaries of this subsystem are rather +fuzzier than normal. Many other subsystem maintainers like to keep control +of portions of Documentation/, and many more freely apply changes there +when it is convenient. Beyond that, much of the kernel's documentation is +found in the source as kerneldoc comments; those are usually (but not +always) maintained by the relevant subsystem maintainer. + +The mailing list for documentation is linux-doc@vger.kernel.org. Patches +should be made against the docs-next tree whenever possible. + +Submit checklist addendum +------------------------- + +When making documentation changes, you should actually build the +documentation and ensure that no new errors or warnings have been +introduced. Generating HTML documents and looking at the result will help +to avoid unsightly misunderstandings about how things will be rendered. + +Key cycle dates +--------------- + +Patches can be sent anytime, but response will be slower than usual during +the merge window. The docs tree tends to close late before the merge +window opens, since the risk of regressions from documentation patches is +low. + +Review cadence +-------------- + +I am the sole maintainer for the documentation subsystem, and I am doing +the work on my own time, so the response to patches will occasionally be +slow. I try to always send out a notification when a patch is merged (or +when I decide that one cannot be). Do not hesitate to send a ping if you +have not heard back within a week of sending a patch. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/driver-api/dmaengine/client.rst b/Documentation/driver-api/dmaengine/client.rst --- a/Documentation/driver-api/dmaengine/client.rst +++ b/Documentation/driver-api/dmaengine/client.rst @@ -151,6 +151,93 @@ Note that callbacks will always be invoked from the DMA engines tasklet, never from interrupt context. + Optional: per descriptor metadata + --------------------------------- + DMAengine provides two ways for metadata support. + + DESC_METADATA_CLIENT + + The metadata buffer is allocated/provided by the client driver and it is + attached to the descriptor. + + .. code-block:: c + + int dmaengine_desc_attach_metadata(struct dma_async_tx_descriptor *desc, + void *data, size_t len); + + DESC_METADATA_ENGINE + + The metadata buffer is allocated/managed by the DMA driver. The client + driver can ask for the pointer, maximum size and the currently used size of + the metadata and can directly update or read it. + + Becasue the DMA driver manages the memory area containing the metadata, + clients must make sure that they do not try to access or get the pointer + after their transfer completion callback has run for the descriptor. + If no completion callback has been defined for the transfer, then the + metadata must not be accessed after issue_pending. + In other words: if the aim is to read back metadata after the transfer is + completed, then the client must use completion callback. + + .. code-block:: c + + void *dmaengine_desc_get_metadata_ptr(struct dma_async_tx_descriptor *desc, + size_t *payload_len, size_t *max_len); + + int dmaengine_desc_set_metadata_len(struct dma_async_tx_descriptor *desc, + size_t payload_len); + + Client drivers can query if a given mode is supported with: + + .. code-block:: c + + bool dmaengine_is_metadata_mode_supported(struct dma_chan *chan, + enum dma_desc_metadata_mode mode); + + Depending on the used mode client drivers must follow different flow. + + DESC_METADATA_CLIENT + + - DMA_MEM_TO_DEV / DEV_MEM_TO_MEM: + 1. prepare the descriptor (dmaengine_prep_*) + construct the metadata in the client's buffer + 2. use dmaengine_desc_attach_metadata() to attach the buffer to the + descriptor + 3. submit the transfer + - DMA_DEV_TO_MEM: + 1. prepare the descriptor (dmaengine_prep_*) + 2. use dmaengine_desc_attach_metadata() to attach the buffer to the + descriptor + 3. submit the transfer + 4. when the transfer is completed, the metadata should be available in the + attached buffer + + DESC_METADATA_ENGINE + + - DMA_MEM_TO_DEV / DEV_MEM_TO_MEM: + 1. prepare the descriptor (dmaengine_prep_*) + 2. use dmaengine_desc_get_metadata_ptr() to get the pointer to the + engine's metadata area + 3. update the metadata at the pointer + 4. use dmaengine_desc_set_metadata_len() to tell the DMA engine the + amount of data the client has placed into the metadata buffer + 5. submit the transfer + - DMA_DEV_TO_MEM: + 1. prepare the descriptor (dmaengine_prep_*) + 2. submit the transfer + 3. on transfer completion, use dmaengine_desc_get_metadata_ptr() to get + the pointer to the engine's metadata area + 4. read out the metadata from the pointer + + .. note:: + + When DESC_METADATA_ENGINE mode is used the metadata area for the descriptor + is no longer valid after the transfer has been completed (valid up to the + point when the completion callback returns if used). + + Mixed use of DESC_METADATA_CLIENT / DESC_METADATA_ENGINE is not allowed, + client drivers must use either of the modes per descriptor. + 4. Submit the transaction Once the descriptor has been prepared and the callback information diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/driver-api/dmaengine/provider.rst b/Documentation/driver-api/dmaengine/provider.rst --- a/Documentation/driver-api/dmaengine/provider.rst +++ b/Documentation/driver-api/dmaengine/provider.rst @@ -247,6 +247,54 @@ (DMA_CYCLIC). Addresses pointing to a device's register (e.g. a FIFO) are typically fixed. +Per descriptor metadata support +------------------------------- +Some data movement architecture (DMA controller and peripherals) uses metadata +associated with a transaction. The DMA controller role is to transfer the +payload and the metadata alongside. +The metadata itself is not used by the DMA engine itself, but it contains +parameters, keys, vectors, etc for peripheral or from the peripheral. + +The DMAengine framework provides a generic ways to facilitate the metadata for +descriptors. Depending on the architecture the DMA driver can implement either +or both of the methods and it is up to the client driver to choose which one +to use. + +- DESC_METADATA_CLIENT + + The metadata buffer is allocated/provided by the client driver and it is + attached (via the dmaengine_desc_attach_metadata() helper to the descriptor. + + From the DMA driver the following is expected for this mode: + - DMA_MEM_TO_DEV / DEV_MEM_TO_MEM + The data from the provided metadata buffer should be prepared for the DMA + controller to be sent alongside of the payload data. Either by copying to a + hardware descriptor, or highly coupled packet. + - DMA_DEV_TO_MEM + On transfer completion the DMA driver must copy the metadata to the client + provided metadata buffer before notifying the client about the completion. + After the transfer completion, DMA drivers must not touch the metadata + buffer provided by the client. + +- DESC_METADATA_ENGINE + + The metadata buffer is allocated/managed by the DMA driver. The client driver + can ask for the pointer, maximum size and the currently used size of the + metadata and can directly update or read it. dmaengine_desc_get_metadata_ptr() + and dmaengine_desc_set_metadata_len() is provided as helper functions. + + From the DMA driver the following is expected for this mode: + - get_metadata_ptr + Should return a pointer for the metadata buffer, the maximum size of the + metadata buffer and the currently used / valid (if any) bytes in the buffer. + - set_metadata_len + It is called by the clients after it have placed the metadata to the buffer + to let the DMA driver know the number of valid bytes provided. + + Note: since the client will ask for the metadata pointer in the completion + callback (in DMA_DEV_TO_MEM case) the DMA driver must ensure that the + descriptor is not freed up prior the callback is called. + Device operations ----------------- diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/driver-api/driver-model/devres.rst b/Documentation/driver-api/driver-model/devres.rst --- a/Documentation/driver-api/driver-model/devres.rst +++ b/Documentation/driver-api/driver-model/devres.rst @@ -267,6 +267,8 @@ GPIO devm_gpiod_get() + devm_gpiod_get_array() + devm_gpiod_get_array_optional() devm_gpiod_get_index() devm_gpiod_get_index_optional() devm_gpiod_get_optional() @@ -313,7 +315,6 @@ devm_ioport_map() devm_ioport_unmap() devm_ioremap() - devm_ioremap_nocache() devm_ioremap_uc() devm_ioremap_wc() devm_ioremap_resource() : checks resource, requests memory region, ioremaps diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/driver-api/gpio/driver.rst b/Documentation/driver-api/gpio/driver.rst --- a/Documentation/driver-api/gpio/driver.rst +++ b/Documentation/driver-api/gpio/driver.rst @@ -507,11 +507,6 @@ cascaded irq has to be handled by a threaded interrupt handler. Apart from that it works exactly like the chained irqchip. -- DEPRECATED: gpiochip_set_chained_irqchip(): sets up a chained cascaded irq - handler for a gpio_chip from a parent IRQ and passes the struct gpio_chip* - as handler data. Notice that we pass is as the handler data, since the - irqchip data is likely used by the parent irqchip. - - gpiochip_set_nested_irqchip(): sets up a nested cascaded irq handler for a gpio_chip from a parent IRQ. As the parent IRQ has usually been explicitly requested by the driver, this does very little more than diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/driver-api/gpio/drivers-on-gpio.rst b/Documentation/driver-api/gpio/drivers-on-gpio.rst --- a/Documentation/driver-api/gpio/drivers-on-gpio.rst +++ b/Documentation/driver-api/gpio/drivers-on-gpio.rst @@ -95,7 +95,7 @@ MTD NOR flash has add-ons for extra GPIO lines too, though the address bus is usually connected directly to the flash. -Use those instead of talking directly to the GPIOs using sysfs; they integrate -with kernel frameworks better than your userspace code could. Needless to say, -just using the appropriate kernel drivers will simplify and speed up your -embedded hacking in particular by providing ready-made components. +Use those instead of talking directly to the GPIOs from userspace; they +integrate with kernel frameworks better than your userspace code could. +Needless to say, just using the appropriate kernel drivers will simplify and +speed up your embedded hacking in particular by providing ready-made components. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/driver-api/gpio/index.rst b/Documentation/driver-api/gpio/index.rst --- a/Documentation/driver-api/gpio/index.rst +++ b/Documentation/driver-api/gpio/index.rst @@ -8,6 +8,7 @@ :maxdepth: 2 intro + using-gpio driver consumer board diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/driver-api/gpio/using-gpio.rst b/Documentation/driver-api/gpio/using-gpio.rst --- a/Documentation/driver-api/gpio/using-gpio.rst +++ b/Documentation/driver-api/gpio/using-gpio.rst @@ -0,0 +1,50 @@ +========================= +Using GPIO Lines in Linux +========================= + +The Linux kernel exists to abstract and present hardware to users. GPIO lines +as such are normally not user facing abstractions. The most obvious, natural +and preferred way to use GPIO lines is to let kernel hardware drivers deal +with them. + +For examples of already existing generic drivers that will also be good +examples for any other kernel drivers you want to author, refer to +:doc:`drivers-on-gpio` + +For any kind of mass produced system you want to support, such as servers, +laptops, phones, tablets, routers, and any consumer or office or business goods +using appropriate kernel drivers is paramount. Submit your code for inclusion +in the upstream Linux kernel when you feel it is mature enough and you will get +help to refine it, see :doc:`../../process/submitting-patches`. + +In Linux GPIO lines also have a userspace ABI. + +The userspace ABI is intended for one-off deployments. Examples are prototypes, +factory lines, maker community projects, workshop specimen, production tools, +industrial automation, PLC-type use cases, door controllers, in short a piece +of specialized equipment that is not produced by the numbers, requiring +operators to have a deep knowledge of the equipment and knows about the +software-hardware interface to be set up. They should not have a natural fit +to any existing kernel subsystem and not be a good fit for an operating system, +because of not being reusable or abstract enough, or involving a lot of non +computer hardware related policy. + +Applications that have a good reason to use the industrial I/O (IIO) subsystem +from userspace will likely be a good fit for using GPIO lines from userspace as +well. + +Do not under any circumstances abuse the GPIO userspace ABI to cut corners in +any product development projects. If you use it for prototyping, then do not +productify the prototype: rewrite it using proper kernel drivers. Do not under +any circumstances deploy any uniform products using GPIO from userspace. + +The userspace ABI is a character device for each GPIO hardware unit (GPIO chip). +These devices will appear on the system as ``/dev/gpiochip0`` thru +``/dev/gpiochipN``. Examples of how to directly use the userspace ABI can be +found in the kernel tree ``tools/gpio`` subdirectory. + +For structured and managed applications, we recommend that you make use of the +libgpiod_ library. This provides helper abstractions, command line utlities +and arbitration for multiple simultaneous consumers on the same GPIO chip. + +.. _libgpiod: https://git.kernel.org/pub/scm/libs/libgpiod/libgpiod.git/ diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/driver-api/interconnect.rst b/Documentation/driver-api/interconnect.rst --- a/Documentation/driver-api/interconnect.rst +++ b/Documentation/driver-api/interconnect.rst @@ -91,3 +91,25 @@ get paths between endpoints and set their bandwidth/latency/QoS requirements for these interconnect paths. These interfaces are not currently documented. + +Interconnect debugfs interfaces +------------------------------- + +Like several other subsystems interconnect will create some files for debugging +and introspection. Files in debugfs are not considered ABI so application +software shouldn't rely on format details change between kernel versions. + +``/sys/kernel/debug/interconnect/interconnect_summary``: + +Show all interconnect nodes in the system with their aggregated bandwidth +request. Indented under each node show bandwidth requests from each device. + +``/sys/kernel/debug/interconnect/interconnect_graph``: + +Show the interconnect graph in the graphviz dot format. It shows all +interconnect nodes and links in the system and groups together nodes from the +same provider as subgraphs. The format is human-readable and can also be piped +through dot to generate diagrams in many graphical formats:: + + $ cat /sys/kernel/debug/interconnect/interconnect_graph | \ + dot -Tsvg > interconnect_graph.svg diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/driver-api/thermal/cpu-idle-cooling.rst b/Documentation/driver-api/thermal/cpu-idle-cooling.rst --- a/Documentation/driver-api/thermal/cpu-idle-cooling.rst +++ b/Documentation/driver-api/thermal/cpu-idle-cooling.rst @@ -0,0 +1,189 @@ + +Situation: +---------- + +Under certain circumstances a SoC can reach a critical temperature +limit and is unable to stabilize the temperature around a temperature +control. When the SoC has to stabilize the temperature, the kernel can +act on a cooling device to mitigate the dissipated power. When the +critical temperature is reached, a decision must be taken to reduce +the temperature, that, in turn impacts performance. + +Another situation is when the silicon temperature continues to +increase even after the dynamic leakage is reduced to its minimum by +clock gating the component. This runaway phenomenon can continue due +to the static leakage. The only solution is to power down the +component, thus dropping the dynamic and static leakage that will +allow the component to cool down. + +Last but not least, the system can ask for a specific power budget but +because of the OPP density, we can only choose an OPP with a power +budget lower than the requested one and under-utilize the CPU, thus +losing performance. In other words, one OPP under-utilizes the CPU +with a power less than the requested power budget and the next OPP +exceeds the power budget. An intermediate OPP could have been used if +it were present. + +Solutions: +---------- + +If we can remove the static and the dynamic leakage for a specific +duration in a controlled period, the SoC temperature will +decrease. Acting on the idle state duration or the idle cycle +injection period, we can mitigate the temperature by modulating the +power budget. + +The Operating Performance Point (OPP) density has a great influence on +the control precision of cpufreq, however different vendors have a +plethora of OPP density, and some have large power gap between OPPs, +that will result in loss of performance during thermal control and +loss of power in other scenarios. + +At a specific OPP, we can assume that injecting idle cycle on all CPUs +belong to the same cluster, with a duration greater than the cluster +idle state target residency, we lead to dropping the static and the +dynamic leakage for this period (modulo the energy needed to enter +this state). So the sustainable power with idle cycles has a linear +relation with the OPP’s sustainable power and can be computed with a +coefficient similar to: + + Power(IdleCycle) = Coef x Power(OPP) + +Idle Injection: +--------------- + +The base concept of the idle injection is to force the CPU to go to an +idle state for a specified time each control cycle, it provides +another way to control CPU power and heat in addition to +cpufreq. Ideally, if all CPUs belonging to the same cluster, inject +their idle cycles synchronously, the cluster can reach its power down +state with a minimum power consumption and reduce the static leakage +to almost zero. However, these idle cycles injection will add extra +latencies as the CPUs will have to wakeup from a deep sleep state. + +We use a fixed duration of idle injection that gives an acceptable +performance penalty and a fixed latency. Mitigation can be increased +or decreased by modulating the duty cycle of the idle injection. + + ^ + | + | + |------- ------- + |_______|_______________________|_______|___________ + + <------> + idle <----------------------> + running + + <-----------------------------> + duty cycle 25% + + +The implementation of the cooling device bases the number of states on +the duty cycle percentage. When no mitigation is happening the cooling +device state is zero, meaning the duty cycle is 0%. + +When the mitigation begins, depending on the governor's policy, a +starting state is selected. With a fixed idle duration and the duty +cycle (aka the cooling device state), the running duration can be +computed. + +The governor will change the cooling device state thus the duty cycle +and this variation will modulate the cooling effect. + + ^ + | + | + |------- ------- + |_______|_______________|_______|___________ + + <------> + idle <--------------> + running + + <-----------------------------> + duty cycle 33% + + + ^ + | + | + |------- ------- + |_______|_______|_______|___________ + + <------> + idle <------> + running + + <-------------> + duty cycle 50% + +The idle injection duration value must comply with the constraints: + +- It is less than or equal to the latency we tolerate when the + mitigation begins. It is platform dependent and will depend on the + user experience, reactivity vs performance trade off we want. This + value should be specified. + +- It is greater than the idle state’s target residency we want to go + for thermal mitigation, otherwise we end up consuming more energy. + +Power considerations +-------------------- + +When we reach the thermal trip point, we have to sustain a specified +power for a specific temperature but at this time we consume: + + Power = Capacitance x Voltage^2 x Frequency x Utilisation + +... which is more than the sustainable power (or there is something +wrong in the system setup). The ‘Capacitance’ and ‘Utilisation’ are a +fixed value, ‘Voltage’ and the ‘Frequency’ are fixed artificially +because we don’t want to change the OPP. We can group the +‘Capacitance’ and the ‘Utilisation’ into a single term which is the +‘Dynamic Power Coefficient (Cdyn)’ Simplifying the above, we have: + + Pdyn = Cdyn x Voltage^2 x Frequency + +The power allocator governor will ask us somehow to reduce our power +in order to target the sustainable power defined in the device +tree. So with the idle injection mechanism, we want an average power +(Ptarget) resulting in an amount of time running at full power on a +specific OPP and idle another amount of time. That could be put in a +equation: + + P(opp)target = ((Trunning x (P(opp)running) + (Tidle x P(opp)idle)) / + (Trunning + Tidle) + ... + + Tidle = Trunning x ((P(opp)running / P(opp)target) - 1) + +At this point if we know the running period for the CPU, that gives us +the idle injection we need. Alternatively if we have the idle +injection duration, we can compute the running duration with: + + Trunning = Tidle / ((P(opp)running / P(opp)target) - 1) + +Practically, if the running power is less than the targeted power, we +end up with a negative time value, so obviously the equation usage is +bound to a power reduction, hence a higher OPP is needed to have the +running power greater than the targeted power. + +However, in this demonstration we ignore three aspects: + + * The static leakage is not defined here, we can introduce it in the + equation but assuming it will be zero most of the time as it is + difficult to get the values from the SoC vendors + + * The idle state wake up latency (or entry + exit latency) is not + taken into account, it must be added in the equation in order to + rigorously compute the idle injection + + * The injected idle duration must be greater than the idle state + target residency, otherwise we end up consuming more energy and + potentially invert the mitigation effect + +So the final equation is: + + Trunning = (Tidle - Twakeup ) x + (((P(opp)dyn + P(opp)static ) - P(opp)target) / P(opp)target ) diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/driver-api/thermal/exynos_thermal.rst b/Documentation/driver-api/thermal/exynos_thermal.rst --- a/Documentation/driver-api/thermal/exynos_thermal.rst +++ b/Documentation/driver-api/thermal/exynos_thermal.rst @@ -4,7 +4,7 @@ Supported chips: -* ARM SAMSUNG EXYNOS4, EXYNOS5 series of SoC +* ARM Samsung Exynos4, Exynos5 series of SoC Datasheet: Not publicly available @@ -14,7 +14,7 @@ TMU controller Description: --------------------------- -This driver allows to read temperature inside SAMSUNG EXYNOS4/5 series of SoC. +This driver allows to read temperature inside Samsung Exynos4/5 series of SoC. The chip only exposes the measured 8-bit temperature code value through a register. @@ -43,7 +43,7 @@ Trimming info for 85 degree Celsius (stored at TRIMINFO register) Temperature code measured at 85 degree Celsius which is unchanged -TMU(Thermal Management Unit) in EXYNOS4/5 generates interrupt +TMU(Thermal Management Unit) in Exynos4/5 generates interrupt when temperature exceeds pre-defined levels. The maximum number of configurable threshold is five. The threshold levels are defined as follows:: @@ -67,7 +67,7 @@ The exynos thermal driver is structured as:: Kernel Core thermal framework - (thermal_core.c, step_wise.c, cpu_cooling.c) + (thermal_core.c, step_wise.c, cpufreq_cooling.c) ^ | | diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/adfs.txt b/Documentation/filesystems/adfs.txt --- a/Documentation/filesystems/adfs.txt +++ b/Documentation/filesystems/adfs.txt @@ -1,3 +1,27 @@ +Filesystems supported by ADFS +----------------------------- + +The ADFS module supports the following Filecore formats which have: + +- new maps +- new directories or big directories + +In terms of the named formats, this means we support: + +- E and E+, with or without boot block +- F and F+ + +We fully support reading files from these filesystems, and writing to +existing files within their existing allocation. Essentially, we do +not support changing any of the filesystem metadata. + +This is intended to support loopback mounted Linux native filesystems +on a RISC OS Filecore filesystem, but will allow the data within files +to be changed. + +If write support (ADFS_FS_RW) is configured, we allow rudimentary +directory updates, specifically updating the access mode and timestamp. + Mount options for ADFS ---------------------- diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/automount-support.txt b/Documentation/filesystems/automount-support.txt --- a/Documentation/filesystems/automount-support.txt +++ b/Documentation/filesystems/automount-support.txt @@ -9,7 +9,7 @@ IN-KERNEL AUTOMOUNTING ====================== -See section "Mount Traps" of Documentation/filesystems/autofs.txt +See section "Mount Traps" of Documentation/filesystems/autofs.rst Then from userspace, you can just do something like: diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst --- a/Documentation/filesystems/fscrypt.rst +++ b/Documentation/filesystems/fscrypt.rst @@ -234,8 +234,8 @@ entropy from the master key. HKDF is also standardized and widely used by other software, whereas the AES-128-ECB based KDF is ad-hoc. -Per-file keys -------------- +Per-file encryption keys +------------------------ Since each master key can protect many files, it is necessary to "tweak" the encryption of each file so that the same plaintext in two @@ -268,9 +268,9 @@ Therefore, to improve performance and save memory, for Adiantum a "direct key" configuration is supported. When the user has enabled this by setting FSCRYPT_POLICY_FLAG_DIRECT_KEY in the fscrypt policy, -per-file keys are not used. Instead, whenever any data (contents or -filenames) is encrypted, the file's 16-byte nonce is included in the -IV. Moreover: +per-file encryption keys are not used. Instead, whenever any data +(contents or filenames) is encrypted, the file's 16-byte nonce is +included in the IV. Moreover: - For v1 encryption policies, the encryption is done directly with the master key. Because of this, users **must not** use the same master @@ -302,6 +302,16 @@ identifier" is also derived using the KDF. This value is stored in the clear, since it is needed to reliably identify the key itself. +Dirhash keys +------------ + +For directories that are indexed using a secret-keyed dirhash over the +plaintext filenames, the KDF is also used to derive a 128-bit +SipHash-2-4 key per directory in order to hash filenames. This works +just like deriving a per-file encryption key, except that a different +KDF context is used. Currently, only casefolded ("case-insensitive") +encrypted directories use this style of hashing. + Encryption modes and usage ========================== @@ -325,11 +335,11 @@ Adiantum is a (primarily) stream cipher-based mode that is fast even on CPUs without dedicated crypto instructions. It's also a true wide-block mode, unlike XTS. It can also eliminate the need to derive -per-file keys. However, it depends on the security of two primitives, -XChaCha12 and AES-256, rather than just one. See the paper -"Adiantum: length-preserving encryption for entry-level processors" -(https://eprint.iacr.org/2018/720.pdf) for more details. To use -Adiantum, CONFIG_CRYPTO_ADIANTUM must be enabled. Also, fast +per-file encryption keys. However, it depends on the security of two +primitives, XChaCha12 and AES-256, rather than just one. See the +paper "Adiantum: length-preserving encryption for entry-level +processors" (https://eprint.iacr.org/2018/720.pdf) for more details. +To use Adiantum, CONFIG_CRYPTO_ADIANTUM must be enabled. Also, fast implementations of ChaCha and NHPoly1305 should be enabled, e.g. CONFIG_CRYPTO_CHACHA20_NEON and CONFIG_CRYPTO_NHPOLY1305_NEON for ARM. @@ -513,7 +523,9 @@ - ``EEXIST``: the file is already encrypted with an encryption policy different from the one specified - ``EINVAL``: an invalid encryption policy was specified (invalid - version, mode(s), or flags; or reserved bits were set) + version, mode(s), or flags; or reserved bits were set); or a v1 + encryption policy was specified but the directory has the casefold + flag enabled (casefolding is incompatible with v1 policies). - ``ENOKEY``: a v2 encryption policy was specified, but the key with the specified ``master_key_identifier`` has not been added, nor does the process have the CAP_FOWNER capability in the initial user @@ -638,7 +650,8 @@ struct fscrypt_add_key_arg { struct fscrypt_key_specifier key_spec; __u32 raw_size; - __u32 __reserved[9]; + __u32 key_id; + __u32 __reserved[8]; __u8 raw[]; }; @@ -655,6 +668,12 @@ } u; }; + struct fscrypt_provisioning_key_payload { + __u32 type; + __u32 __reserved; + __u8 raw[]; + }; + :c:type:`struct fscrypt_add_key_arg` must be zeroed, then initialized as follows: @@ -677,9 +696,26 @@ ``Documentation/security/keys/core.rst``). - ``raw_size`` must be the size of the ``raw`` key provided, in bytes. + Alternatively, if ``key_id`` is nonzero, this field must be 0, since + in that case the size is implied by the specified Linux keyring key. + +- ``key_id`` is 0 if the raw key is given directly in the ``raw`` + field. Otherwise ``key_id`` is the ID of a Linux keyring key of + type "fscrypt-provisioning" whose payload is a :c:type:`struct + fscrypt_provisioning_key_payload` whose ``raw`` field contains the + raw key and whose ``type`` field matches ``key_spec.type``. Since + ``raw`` is variable-length, the total size of this key's payload + must be ``sizeof(struct fscrypt_provisioning_key_payload)`` plus the + raw key size. The process must have Search permission on this key. + + Most users should leave this 0 and specify the raw key directly. + The support for specifying a Linux keyring key is intended mainly to + allow re-adding keys after a filesystem is unmounted and re-mounted, + without having to store the raw keys in userspace memory. - ``raw`` is a variable-length field which must contain the actual - key, ``raw_size`` bytes long. + key, ``raw_size`` bytes long. Alternatively, if ``key_id`` is + nonzero, then this field is unused. For v2 policy keys, the kernel keeps track of which user (identified by effective user ID) added the key, and only allows the key to be @@ -701,11 +737,16 @@ - ``EACCES``: FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR was specified, but the caller does not have the CAP_SYS_ADMIN capability in the initial - user namespace + user namespace; or the raw key was specified by Linux key ID but the + process lacks Search permission on the key. - ``EDQUOT``: the key quota for this user would be exceeded by adding the key - ``EINVAL``: invalid key size or key specifier type, or reserved bits were set +- ``EKEYREJECTED``: the raw key was specified by Linux key ID, but the + key has the wrong type +- ``ENOKEY``: the raw key was specified by Linux key ID, but no key + exists with that ID - ``ENOTTY``: this type of filesystem does not implement encryption - ``EOPNOTSUPP``: the kernel was not configured with encryption support for this filesystem, or the filesystem superblock has not @@ -1108,8 +1149,8 @@ policy structs (see `Setting an encryption policy`_), except that the context structs also contain a nonce. The nonce is randomly generated by the kernel and is used as KDF input or as a tweak to cause -different files to be encrypted differently; see `Per-file keys`_ and -`DIRECT_KEY policies`_. +different files to be encrypted differently; see `Per-file encryption +keys`_ and `DIRECT_KEY policies`_. Data path changes ----------------- @@ -1161,7 +1202,7 @@ allows the filesystem to still, with a high degree of confidence, map the filename given in ->lookup() back to a particular directory entry that was previously listed by readdir(). See :c:type:`struct -fscrypt_digested_name` in the source for more details. +fscrypt_nokey_name` in the source for more details. Note that the precise way that filenames are presented to userspace without the key is subject to change in the future. It is only meant diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -47,4 +47,6 @@ :maxdepth: 2 autofs + overlayfs virtiofs + vfat diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/nfs/fault_injection.txt b/Documentation/filesystems/nfs/fault_injection.txt --- a/Documentation/filesystems/nfs/fault_injection.txt +++ b/Documentation/filesystems/nfs/fault_injection.txt @@ -1,69 +0,0 @@ - -Fault Injection -=============== -Fault injection is a method for forcing errors that may not normally occur, or -may be difficult to reproduce. Forcing these errors in a controlled environment -can help the developer find and fix bugs before their code is shipped in a -production system. Injecting an error on the Linux NFS server will allow us to -observe how the client reacts and if it manages to recover its state correctly. - -NFSD_FAULT_INJECTION must be selected when configuring the kernel to use this -feature. - - -Using Fault Injection -===================== -On the client, mount the fault injection server through NFS v4.0+ and do some -work over NFS (open files, take locks, ...). - -On the server, mount the debugfs filesystem to and ls -/nfsd. This will show a list of files that will be used for -injecting faults on the NFS server. As root, write a number n to the file -corresponding to the action you want the server to take. The server will then -process the first n items it finds. So if you want to forget 5 locks, echo '5' -to /nfsd/forget_locks. A value of 0 will tell the server to forget -all corresponding items. A log message will be created containing the number -of items forgotten (check dmesg). - -Go back to work on the client and check if the client recovered from the error -correctly. - - -Available Faults -================ -forget_clients: - The NFS server keeps a list of clients that have placed a mount call. If - this list is cleared, the server will have no knowledge of who the client - is, forcing the client to reauthenticate with the server. - -forget_openowners: - The NFS server keeps a list of what files are currently opened and who - they were opened by. Clearing this list will force the client to reopen - its files. - -forget_locks: - The NFS server keeps a list of what files are currently locked in the VFS. - Clearing this list will force the client to reclaim its locks (files are - unlocked through the VFS as they are cleared from this list). - -forget_delegations: - A delegation is used to assure the client that a file, or part of a file, - has not changed since the delegation was awarded. Clearing this list will - force the client to reacquire its delegation before accessing the file - again. - -recall_delegations: - Delegations can be recalled by the server when another client attempts to - access a file. This test will notify the client that its delegation has - been revoked, forcing the client to reacquire the delegation before using - the file again. - - -tools/nfs/inject_faults.sh script -================================= -This script has been created to ease the fault injection process. This script -will detect the mounted debugfs directory and write to the files located there -based on the arguments passed by the user. For example, running -`inject_faults.sh forget_locks 1` as root will instruct the server to forget -one lock. Running `inject_faults forget_locks` will instruct the server to -forgetall locks. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/nfs/idmapper.txt b/Documentation/filesystems/nfs/idmapper.txt --- a/Documentation/filesystems/nfs/idmapper.txt +++ b/Documentation/filesystems/nfs/idmapper.txt @@ -1,75 +0,0 @@ - -========= -ID Mapper -========= -Id mapper is used by NFS to translate user and group ids into names, and to -translate user and group names into ids. Part of this translation involves -performing an upcall to userspace to request the information. There are two -ways NFS could obtain this information: placing a call to /sbin/request-key -or by placing a call to the rpc.idmap daemon. - -NFS will attempt to call /sbin/request-key first. If this succeeds, the -result will be cached using the generic request-key cache. This call should -only fail if /etc/request-key.conf is not configured for the id_resolver key -type, see the "Configuring" section below if you wish to use the request-key -method. - -If the call to /sbin/request-key fails (if /etc/request-key.conf is not -configured with the id_resolver key type), then the idmapper will ask the -legacy rpc.idmap daemon for the id mapping. This result will be stored -in a custom NFS idmap cache. - - -=========== -Configuring -=========== -The file /etc/request-key.conf will need to be modified so /sbin/request-key can -direct the upcall. The following line should be added: - -#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ... -#====== ======= =============== =============== =============================== -create id_resolver * * /usr/sbin/nfs.idmap %k %d 600 - -This will direct all id_resolver requests to the program /usr/sbin/nfs.idmap. -The last parameter, 600, defines how many seconds into the future the key will -expire. This parameter is optional for /usr/sbin/nfs.idmap. When the timeout -is not specified, nfs.idmap will default to 600 seconds. - -id mapper uses for key descriptions: - uid: Find the UID for the given user - gid: Find the GID for the given group - user: Find the user name for the given UID - group: Find the group name for the given GID - -You can handle any of these individually, rather than using the generic upcall -program. If you would like to use your own program for a uid lookup then you -would edit your request-key.conf so it look similar to this: - -#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ... -#====== ======= =============== =============== =============================== -create id_resolver uid:* * /some/other/program %k %d 600 -create id_resolver * * /usr/sbin/nfs.idmap %k %d 600 - -Notice that the new line was added above the line for the generic program. -request-key will find the first matching line and corresponding program. In -this case, /some/other/program will handle all uid lookups and -/usr/sbin/nfs.idmap will handle gid, user, and group lookups. - -See for more information -about the request-key function. - - -========= -nfs.idmap -========= -nfs.idmap is designed to be called by request-key, and should not be run "by -hand". This program takes two arguments, a serialized key and a key -description. The serialized key is first converted into a key_serial_t, and -then passed as an argument to keyctl_instantiate (both are part of keyutils.h). - -The actual lookups are performed by functions found in nfsidmap.h. nfs.idmap -determines the correct function to call by looking at the first part of the -description string. For example, a uid lookup description will appear as -"uid:user@domain". - -nfs.idmap will return 0 if the key was instantiated, and non-zero otherwise. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/nfs/nfs-rdma.txt b/Documentation/filesystems/nfs/nfs-rdma.txt --- a/Documentation/filesystems/nfs/nfs-rdma.txt +++ b/Documentation/filesystems/nfs/nfs-rdma.txt @@ -1,274 +0,0 @@ -################################################################################ -# # -# NFS/RDMA README # -# # -################################################################################ - - Author: NetApp and Open Grid Computing - Date: May 29, 2008 - -Table of Contents -~~~~~~~~~~~~~~~~~ - - Overview - - Getting Help - - Installation - - Check RDMA and NFS Setup - - NFS/RDMA Setup - -Overview -~~~~~~~~ - - This document describes how to install and setup the Linux NFS/RDMA client - and server software. - - The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server - was first included in the following release, Linux 2.6.25. - - In our testing, we have obtained excellent performance results (full 10Gbit - wire bandwidth at minimal client CPU) under many workloads. The code passes - the full Connectathon test suite and operates over both Infiniband and iWARP - RDMA adapters. - -Getting Help -~~~~~~~~~~~~ - - If you get stuck, you can ask questions on the - - nfs-rdma-devel@lists.sourceforge.net - - mailing list. - -Installation -~~~~~~~~~~~~ - - These instructions are a step by step guide to building a machine for - use with NFS/RDMA. - - - Install an RDMA device - - Any device supported by the drivers in drivers/infiniband/hw is acceptable. - - Testing has been performed using several Mellanox-based IB cards, the - Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter. - - - Install a Linux distribution and tools - - The first kernel release to contain both the NFS/RDMA client and server was - Linux 2.6.25 Therefore, a distribution compatible with this and subsequent - Linux kernel release should be installed. - - The procedures described in this document have been tested with - distributions from Red Hat's Fedora Project (http://fedora.redhat.com/). - - - Install nfs-utils-1.1.2 or greater on the client - - An NFS/RDMA mount point can be obtained by using the mount.nfs command in - nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils - version with support for NFS/RDMA mounts, but for various reasons we - recommend using nfs-utils-1.1.2 or greater). To see which version of - mount.nfs you are using, type: - - $ /sbin/mount.nfs -V - - If the version is less than 1.1.2 or the command does not exist, - you should install the latest version of nfs-utils. - - Download the latest package from: - - http://www.kernel.org/pub/linux/utils/nfs - - Uncompress the package and follow the installation instructions. - - If you will not need the idmapper and gssd executables (you do not need - these to create an NFS/RDMA enabled mount command), the installation - process can be simplified by disabling these features when running - configure: - - $ ./configure --disable-gss --disable-nfsv4 - - To build nfs-utils you will need the tcp_wrappers package installed. For - more information on this see the package's README and INSTALL files. - - After building the nfs-utils package, there will be a mount.nfs binary in - the utils/mount directory. This binary can be used to initiate NFS v2, v3, - or v4 mounts. To initiate a v4 mount, the binary must be called - mount.nfs4. The standard technique is to create a symlink called - mount.nfs4 to mount.nfs. - - This mount.nfs binary should be installed at /sbin/mount.nfs as follows: - - $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs - - In this location, mount.nfs will be invoked automatically for NFS mounts - by the system mount command. - - NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed - on the NFS client machine. You do not need this specific version of - nfs-utils on the server. Furthermore, only the mount.nfs command from - nfs-utils-1.1.2 is needed on the client. - - - Install a Linux kernel with NFS/RDMA - - The NFS/RDMA client and server are both included in the mainline Linux - kernel version 2.6.25 and later. This and other versions of the Linux - kernel can be found at: - - https://www.kernel.org/pub/linux/kernel/ - - Download the sources and place them in an appropriate location. - - - Configure the RDMA stack - - Make sure your kernel configuration has RDMA support enabled. Under - Device Drivers -> InfiniBand support, update the kernel configuration - to enable InfiniBand support [NOTE: the option name is misleading. Enabling - InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)]. - - Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or - iWARP adapter support (amso, cxgb3, etc.). - - If you are using InfiniBand, be sure to enable IP-over-InfiniBand support. - - - Configure the NFS client and server - - Your kernel configuration must also have NFS file system support and/or - NFS server support enabled. These and other NFS related configuration - options can be found under File Systems -> Network File Systems. - - - Build, install, reboot - - The NFS/RDMA code will be enabled automatically if NFS and RDMA - are turned on. The NFS/RDMA client and server are configured via the hidden - SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The - value of SUNRPC_XPRT_RDMA will be: - - - N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client - and server will not be built - - M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M, - in this case the NFS/RDMA client and server will be built as modules - - Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client - and server will be built into the kernel - - Therefore, if you have followed the steps above and turned no NFS and RDMA, - the NFS/RDMA client and server will be built. - - Build a new kernel, install it, boot it. - -Check RDMA and NFS Setup -~~~~~~~~~~~~~~~~~~~~~~~~ - - Before configuring the NFS/RDMA software, it is a good idea to test - your new kernel to ensure that the kernel is working correctly. - In particular, it is a good idea to verify that the RDMA stack - is functioning as expected and standard NFS over TCP/IP and/or UDP/IP - is working properly. - - - Check RDMA Setup - - If you built the RDMA components as modules, load them at - this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel - card: - - $ modprobe ib_mthca - $ modprobe ib_ipoib - - If you are using InfiniBand, make sure there is a Subnet Manager (SM) - running on the network. If your IB switch has an embedded SM, you can - use it. Otherwise, you will need to run an SM, such as OpenSM, on one - of your end nodes. - - If an SM is running on your network, you should see the following: - - $ cat /sys/class/infiniband/driverX/ports/1/state - 4: ACTIVE - - where driverX is mthca0, ipath5, ehca3, etc. - - To further test the InfiniBand software stack, use IPoIB (this - assumes you have two IB hosts named host1 and host2): - - host1$ ip link set dev ib0 up - host1$ ip address add dev ib0 a.b.c.x - host2$ ip link set dev ib0 up - host2$ ip address add dev ib0 a.b.c.y - host1$ ping a.b.c.y - host2$ ping a.b.c.x - - For other device types, follow the appropriate procedures. - - - Check NFS Setup - - For the NFS components enabled above (client and/or server), - test their functionality over standard Ethernet using TCP/IP or UDP/IP. - -NFS/RDMA Setup -~~~~~~~~~~~~~~ - - We recommend that you use two machines, one to act as the client and - one to act as the server. - - One time configuration: - - - On the server system, configure the /etc/exports file and - start the NFS/RDMA server. - - Exports entries with the following formats have been tested: - - /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash) - /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash) - - The IP address(es) is(are) the client's IPoIB address for an InfiniBand - HCA or the client's iWARP address(es) for an RNIC. - - NOTE: The "insecure" option must be used because the NFS/RDMA client does - not use a reserved port. - - Each time a machine boots: - - - Load and configure the RDMA drivers - - For InfiniBand using a Mellanox adapter: - - $ modprobe ib_mthca - $ modprobe ib_ipoib - $ ip li set dev ib0 up - $ ip addr add dev ib0 a.b.c.d - - NOTE: use unique addresses for the client and server - - - Start the NFS server - - If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in - kernel config), load the RDMA transport module: - - $ modprobe svcrdma - - Regardless of how the server was built (module or built-in), start the - server: - - $ /etc/init.d/nfs start - - or - - $ service nfs start - - Instruct the server to listen on the RDMA transport: - - $ echo rdma 20049 > /proc/fs/nfsd/portlist - - - On the client system - - If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in - kernel config), load the RDMA client module: - - $ modprobe xprtrdma.ko - - Regardless of how the client was built (module or built-in), use this - command to mount the NFS/RDMA server: - - $ mount -o rdma,port=20049 :/ /mnt - - To verify that the mount is using RDMA, run "cat /proc/mounts" and check - the "proto" field for the given mount. - - Congratulations! You're using NFS/RDMA! diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/nfs/nfs.txt b/Documentation/filesystems/nfs/nfs.txt --- a/Documentation/filesystems/nfs/nfs.txt +++ b/Documentation/filesystems/nfs/nfs.txt @@ -1,136 +0,0 @@ - -The NFS client -============== - -The NFS version 2 protocol was first documented in RFC1094 (March 1989). -Since then two more major releases of NFS have been published, with NFSv3 -being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April -2003). - -The Linux NFS client currently supports all the above published versions, -and work is in progress on adding support for minor version 1 of the NFSv4 -protocol. - -The purpose of this document is to provide information on some of the -special features of the NFS client that can be configured by system -administrators. - - -The nfs4_unique_id parameter -============================ - -NFSv4 requires clients to identify themselves to servers with a unique -string. File open and lock state shared between one client and one server -is associated with this identity. To support robust NFSv4 state recovery -and transparent state migration, this identity string must not change -across client reboots. - -Without any other intervention, the Linux client uses a string that contains -the local system's node name. System administrators, however, often do not -take care to ensure that node names are fully qualified and do not change -over the lifetime of a client system. Node names can have other -administrative requirements that require particular behavior that does not -work well as part of an nfs_client_id4 string. - -The nfs.nfs4_unique_id boot parameter specifies a unique string that can be -used instead of a system's node name when an NFS client identifies itself to -a server. Thus, if the system's node name is not unique, or it changes, its -nfs.nfs4_unique_id stays the same, preventing collision with other clients -or loss of state during NFS reboot recovery or transparent state migration. - -The nfs.nfs4_unique_id string is typically a UUID, though it can contain -anything that is believed to be unique across all NFS clients. An -nfs4_unique_id string should be chosen when a client system is installed, -just as a system's root file system gets a fresh UUID in its label at -install time. - -The string should remain fixed for the lifetime of the client. It can be -changed safely if care is taken that the client shuts down cleanly and all -outstanding NFSv4 state has expired, to prevent loss of NFSv4 state. - -This string can be stored in an NFS client's grub.conf, or it can be provided -via a net boot facility such as PXE. It may also be specified as an nfs.ko -module parameter. Specifying a uniquifier string is not support for NFS -clients running in containers. - - -The DNS resolver -================ - -NFSv4 allows for one server to refer the NFS client to data that has been -migrated onto another server by means of the special "fs_locations" -attribute. See - http://tools.ietf.org/html/rfc3530#section-6 -and - http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00 - -The fs_locations information can take the form of either an ip address and -a path, or a DNS hostname and a path. The latter requires the NFS client to -do a DNS lookup in order to mount the new volume, and hence the need for an -upcall to allow userland to provide this service. - -Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual -/var/lib/nfs/rpc_pipefs, the upcall consists of the following steps: - - (1) The process checks the dns_resolve cache to see if it contains a - valid entry. If so, it returns that entry and exits. - - (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent' - (may be changed using the 'nfs.cache_getent' kernel boot parameter) - is run, with two arguments: - - the cache name, "dns_resolve" - - the hostname to resolve - - (3) After looking up the corresponding ip address, the helper script - writes the result into the rpc_pipefs pseudo-file - '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel' - in the following (text) format: - - " \n" - - Where is in the usual IPv4 (123.456.78.90) or IPv6 - (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format. - is identical to the second argument of the helper - script, and is the 'time to live' of this cache entry (in - units of seconds). - - Note: If is invalid, say the string "0", then a negative - entry is created, which will cause the kernel to treat the hostname - as having no valid DNS translation. - - - - -A basic sample /sbin/nfs_cache_getent -===================================== - -#!/bin/bash -# -ttl=600 -# -cut=/usr/bin/cut -getent=/usr/bin/getent -rpc_pipefs=/var/lib/nfs/rpc_pipefs -# -die() -{ - echo "Usage: $0 cache_name entry_name" - exit 1 -} - -[ $# -lt 2 ] && die -cachename="$1" -cache_path=${rpc_pipefs}/cache/${cachename}/channel - -case "${cachename}" in - dns_resolve) - name="$2" - result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )" - [ -z "${result}" ] && result="0" - ;; - *) - die - ;; -esac -echo "${result} ${name} ${ttl}" >${cache_path} - diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/nfs/nfsd-admin-interfaces.txt b/Documentation/filesystems/nfs/nfsd-admin-interfaces.txt --- a/Documentation/filesystems/nfs/nfsd-admin-interfaces.txt +++ b/Documentation/filesystems/nfs/nfsd-admin-interfaces.txt @@ -1,41 +0,0 @@ -Administrative interfaces for nfsd -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Note that normally these interfaces are used only by the utilities in -nfs-utils. - -nfsd is controlled mainly by pseudofiles under the "nfsd" filesystem, -which is normally mounted at /proc/fs/nfsd/. - -The server is always started by the first write of a nonzero value to -nfsd/threads. - -Before doing that, NFSD can be told which sockets to listen on by -writing to nfsd/portlist; that write may be: - - - an ascii-encoded file descriptor, which should refer to a - bound (and listening, for tcp) socket, or - - "transportname port", where transportname is currently either - "udp", "tcp", or "rdma". - -If nfsd is started without doing any of these, then it will create one -udp and one tcp listener at port 2049 (see nfsd_init_socks). - -On startup, nfsd and lockd grace periods start. - -nfsd is shut down by a write of 0 to nfsd/threads. All locks and state -are thrown away at that point. - -Between startup and shutdown, the number of threads may be adjusted up -or down by additional writes to nfsd/threads or by writes to -nfsd/pool_threads. - -For more detail about files under nfsd/ and what they control, see -fs/nfsd/nfsctl.c; most of them have detailed comments. - -Implementation notes -^^^^^^^^^^^^^^^^^^^^ - -Note that the rpc server requires the caller to serialize addition and -removal of listening sockets, and startup and shutdown of the server. -For nfsd this is done using nfsd_mutex. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt --- a/Documentation/filesystems/nfs/nfsroot.txt +++ b/Documentation/filesystems/nfs/nfsroot.txt @@ -1,355 +0,0 @@ -Mounting the root filesystem via NFS (nfsroot) -=============================================== - -Written 1996 by Gero Kuhlmann -Updated 1997 by Martin Mares -Updated 2006 by Nico Schottelius -Updated 2006 by Horms -Updated 2018 by Chris Novakovic - - - -In order to use a diskless system, such as an X-terminal or printer server -for example, it is necessary for the root filesystem to be present on a -non-disk device. This may be an initramfs (see Documentation/filesystems/ -ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/admin-guide/initrd.rst) or a -filesystem mounted via NFS. The following text describes on how to use NFS -for the root filesystem. For the rest of this text 'client' means the -diskless system, and 'server' means the NFS server. - - - - -1.) Enabling nfsroot capabilities - ----------------------------- - -In order to use nfsroot, NFS client support needs to be selected as -built-in during configuration. Once this has been selected, the nfsroot -option will become available, which should also be selected. - -In the networking options, kernel level autoconfiguration can be selected, -along with the types of autoconfiguration to support. Selecting all of -DHCP, BOOTP and RARP is safe. - - - - -2.) Kernel command line - ------------------- - -When the kernel has been loaded by a boot loader (see below) it needs to be -told what root fs device to use. And in the case of nfsroot, where to find -both the server and the name of the directory on the server to mount as root. -This can be established using the following kernel command line parameters: - - -root=/dev/nfs - - This is necessary to enable the pseudo-NFS-device. Note that it's not a - real device but just a synonym to tell the kernel to use NFS instead of - a real device. - - -nfsroot=[:][,] - - If the `nfsroot' parameter is NOT given on the command line, - the default "/tftpboot/%s" will be used. - - Specifies the IP address of the NFS server. - The default address is determined by the `ip' parameter - (see below). This parameter allows the use of different - servers for IP autoconfiguration and NFS. - - Name of the directory on the server to mount as root. - If there is a "%s" token in the string, it will be - replaced by the ASCII-representation of the client's - IP address. - - Standard NFS options. All options are separated by commas. - The following defaults are used: - port = as given by server portmap daemon - rsize = 4096 - wsize = 4096 - timeo = 7 - retrans = 3 - acregmin = 3 - acregmax = 60 - acdirmin = 30 - acdirmax = 60 - flags = hard, nointr, noposix, cto, ac - - -ip=::::::: - :: - - This parameter tells the kernel how to configure IP addresses of devices - and also how to set up the IP routing table. It was originally called - `nfsaddrs', but now the boot-time IP configuration works independently of - NFS, so it was renamed to `ip' and the old name remained as an alias for - compatibility reasons. - - If this parameter is missing from the kernel command line, all fields are - assumed to be empty, and the defaults mentioned below apply. In general - this means that the kernel tries to configure everything using - autoconfiguration. - - The parameter can appear alone as the value to the `ip' - parameter (without all the ':' characters before). If the value is - "ip=off" or "ip=none", no autoconfiguration will take place, otherwise - autoconfiguration will take place. The most common way to use this - is "ip=dhcp". - - IP address of the client. - - Default: Determined using autoconfiguration. - - IP address of the NFS server. If RARP is used to determine - the client address and this parameter is NOT empty only - replies from the specified server are accepted. - - Only required for NFS root. That is autoconfiguration - will not be triggered if it is missing and NFS root is not - in operation. - - Value is exported to /proc/net/pnp with the prefix "bootserver " - (see below). - - Default: Determined using autoconfiguration. - The address of the autoconfiguration server is used. - - IP address of a gateway if the server is on a different subnet. - - Default: Determined using autoconfiguration. - - Netmask for local network interface. If unspecified - the netmask is derived from the client IP address assuming - classful addressing. - - Default: Determined using autoconfiguration. - - Name of the client. If a '.' character is present, anything - before the first '.' is used as the client's hostname, and anything - after it is used as its NIS domain name. May be supplied by - autoconfiguration, but its absence will not trigger autoconfiguration. - If specified and DHCP is used, the user-provided hostname (and NIS - domain name, if present) will be carried in the DHCP request; this - may cause a DNS record to be created or updated for the client. - - Default: Client IP address is used in ASCII notation. - - Name of network device to use. - - Default: If the host only has one device, it is used. - Otherwise the device is determined using - autoconfiguration. This is done by sending - autoconfiguration requests out of all devices, - and using the device that received the first reply. - - Method to use for autoconfiguration. In the case of options - which specify multiple autoconfiguration protocols, - requests are sent using all protocols, and the first one - to reply is used. - - Only autoconfiguration protocols that have been compiled - into the kernel will be used, regardless of the value of - this option. - - off or none: don't use autoconfiguration - (do static IP assignment instead) - on or any: use any protocol available in the kernel - (default) - dhcp: use DHCP - bootp: use BOOTP - rarp: use RARP - both: use both BOOTP and RARP but not DHCP - (old option kept for backwards compatibility) - - if dhcp is used, the client identifier can be used by following - format "ip=dhcp,client-id-type,client-id-value" - - Default: any - - IP address of primary nameserver. - Value is exported to /proc/net/pnp with the prefix "nameserver " - (see below). - - Default: None if not using autoconfiguration; determined - automatically if using autoconfiguration. - - IP address of secondary nameserver. - See . - - IP address of a Network Time Protocol (NTP) server. - Value is exported to /proc/net/ipconfig/ntp_servers, but is - otherwise unused (see below). - - Default: None if not using autoconfiguration; determined - automatically if using autoconfiguration. - - After configuration (whether manual or automatic) is complete, two files - are created in the following format; lines are omitted if their respective - value is empty following configuration: - - - /proc/net/pnp: - - #PROTO: (depending on configuration method) - domain (if autoconfigured, the DNS domain) - nameserver (primary name server IP) - nameserver (secondary name server IP) - nameserver (tertiary name server IP) - bootserver (NFS server IP) - - - /proc/net/ipconfig/ntp_servers: - - (NTP server IP) - (NTP server IP) - (NTP server IP) - - and (in /proc/net/pnp) and and - (in /proc/net/ipconfig/ntp_servers) are requested during autoconfiguration; - they cannot be specified as part of the "ip=" kernel command line parameter. - - Because the "domain" and "nameserver" options are recognised by DNS - resolvers, /etc/resolv.conf is often linked to /proc/net/pnp on systems - that use an NFS root filesystem. - - Note that the kernel will not synchronise the system time with any NTP - servers it discovers; this is the responsibility of a user space process - (e.g. an initrd/initramfs script that passes the IP addresses listed in - /proc/net/ipconfig/ntp_servers to an NTP client before mounting the real - root filesystem if it is on NFS). - - -nfsrootdebug - - This parameter enables debugging messages to appear in the kernel - log at boot time so that administrators can verify that the correct - NFS mount options, server address, and root path are passed to the - NFS client. - - -rdinit= - - To specify which file contains the program that starts system - initialization, administrators can use this command line parameter. - The default value of this parameter is "/init". If the specified - file exists and the kernel can execute it, root filesystem related - kernel command line parameters, including `nfsroot=', are ignored. - - A description of the process of mounting the root file system can be - found in: - - Documentation/driver-api/early-userspace/early_userspace_support.rst - - - - -3.) Boot Loader - ---------- - -To get the kernel into memory different approaches can be used. -They depend on various facilities being available: - - -3.1) Booting from a floppy using syslinux - - When building kernels, an easy way to create a boot floppy that uses - syslinux is to use the zdisk or bzdisk make targets which use zimage - and bzimage images respectively. Both targets accept the - FDARGS parameter which can be used to set the kernel command line. - - e.g. - make bzdisk FDARGS="root=/dev/nfs" - - Note that the user running this command will need to have - access to the floppy drive device, /dev/fd0 - - For more information on syslinux, including how to create bootdisks - for prebuilt kernels, see http://syslinux.zytor.com/ - - N.B: Previously it was possible to write a kernel directly to - a floppy using dd, configure the boot device using rdev, and - boot using the resulting floppy. Linux no longer supports this - method of booting. - -3.2) Booting from a cdrom using isolinux - - When building kernels, an easy way to create a bootable cdrom that - uses isolinux is to use the isoimage target which uses a bzimage - image. Like zdisk and bzdisk, this target accepts the FDARGS - parameter which can be used to set the kernel command line. - - e.g. - make isoimage FDARGS="root=/dev/nfs" - - The resulting iso image will be arch//boot/image.iso - This can be written to a cdrom using a variety of tools including - cdrecord. - - e.g. - cdrecord dev=ATAPI:1,0,0 arch/x86/boot/image.iso - - For more information on isolinux, including how to create bootdisks - for prebuilt kernels, see http://syslinux.zytor.com/ - -3.2) Using LILO - When using LILO all the necessary command line parameters may be - specified using the 'append=' directive in the LILO configuration - file. - - However, to use the 'root=' directive you also need to create - a dummy root device, which may be removed after LILO is run. - - mknod /dev/boot255 c 0 255 - - For information on configuring LILO, please refer to its documentation. - -3.3) Using GRUB - When using GRUB, kernel parameter are simply appended after the kernel - specification: kernel - -3.4) Using loadlin - loadlin may be used to boot Linux from a DOS command prompt without - requiring a local hard disk to mount as root. This has not been - thoroughly tested by the authors of this document, but in general - it should be possible configure the kernel command line similarly - to the configuration of LILO. - - Please refer to the loadlin documentation for further information. - -3.5) Using a boot ROM - This is probably the most elegant way of booting a diskless client. - With a boot ROM the kernel is loaded using the TFTP protocol. The - authors of this document are not aware of any no commercial boot - ROMs that support booting Linux over the network. However, there - are two free implementations of a boot ROM, netboot-nfs and - etherboot, both of which are available on sunsite.unc.edu, and both - of which contain everything you need to boot a diskless Linux client. - -3.6) Using pxelinux - Pxelinux may be used to boot linux using the PXE boot loader - which is present on many modern network cards. - - When using pxelinux, the kernel image is specified using - "kernel ". The nfsroot parameters - are passed to the kernel by adding them to the "append" line. - It is common to use serial console in conjunction with pxeliunx, - see Documentation/admin-guide/serial-console.rst for more information. - - For more information on isolinux, including how to create bootdisks - for prebuilt kernels, see http://syslinux.zytor.com/ - - - - -4.) Credits - ------- - - The nfsroot code in the kernel and the RARP support have been written - by Gero Kuhlmann . - - The rest of the IP layer autoconfiguration code has been written - by Martin Mares . - - In order to write the initial version of nfsroot I would like to thank - Jens-Uwe Mager for his help. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/nfs/pnfs-block-server.txt b/Documentation/filesystems/nfs/pnfs-block-server.txt --- a/Documentation/filesystems/nfs/pnfs-block-server.txt +++ b/Documentation/filesystems/nfs/pnfs-block-server.txt @@ -1,37 +0,0 @@ -pNFS block layout server user guide - -The Linux NFS server now supports the pNFS block layout extension. In this -case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition -to handling all the metadata access to the NFS export also hands out layouts -to the clients to directly access the underlying block devices that are -shared with the client. - -To use pNFS block layouts with with the Linux NFS server the exported file -system needs to support the pNFS block layouts (currently just XFS), and the -file system must sit on shared storage (typically iSCSI) that is accessible -to the clients in addition to the MDS. As of now the file system needs to -sit directly on the exported volume, striping or concatenation of -volumes on the MDS and clients is not supported yet. - -On the server, pNFS block volume support is automatically if the file system -support it. On the client make sure the kernel has the CONFIG_PNFS_BLOCK -option enabled, the blkmapd daemon from nfs-utils is running, and the -file system is mounted using the NFSv4.1 protocol version (mount -o vers=4.1). - -If the nfsd server needs to fence a non-responding client it calls -/sbin/nfsd-recall-failed with the first argument set to the IP address of -the client, and the second argument set to the device node without the /dev -prefix for the file system to be fenced. Below is an example file that shows -how to translate the device into a serial number from SCSI EVPD 0x80: - -cat > /sbin/nfsd-recall-failed << EOF -#!/bin/sh - -CLIENT="$1" -DEV="/dev/$2" -EVPD=`sg_inq --page=0x80 ${DEV} | \ - grep "Unit serial number:" | \ - awk -F ': ' '{print $2}'` - -echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log -EOF diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/nfs/pnfs-scsi-server.txt b/Documentation/filesystems/nfs/pnfs-scsi-server.txt --- a/Documentation/filesystems/nfs/pnfs-scsi-server.txt +++ b/Documentation/filesystems/nfs/pnfs-scsi-server.txt @@ -1,23 +0,0 @@ - -pNFS SCSI layout server user guide -================================== - -This document describes support for pNFS SCSI layouts in the Linux NFS server. -With pNFS SCSI layouts, the NFS server acts as Metadata Server (MDS) for pNFS, -which in addition to handling all the metadata access to the NFS export, -also hands out layouts to the clients so that they can directly access the -underlying SCSI LUNs that are shared with the client. - -To use pNFS SCSI layouts with with the Linux NFS server, the exported file -system needs to support the pNFS SCSI layouts (currently just XFS), and the -file system must sit on a SCSI LUN that is accessible to the clients in -addition to the MDS. As of now the file system needs to sit directly on the -exported LUN, striping or concatenation of LUNs on the MDS and clients -is not supported yet. - -On a server built with CONFIG_NFSD_SCSI, the pNFS SCSI volume support is -automatically enabled if the file system is exported using the "pnfs" -option and the underlying SCSI device support persistent reservations. -On the client make sure the kernel has the CONFIG_PNFS_BLOCK option -enabled, and the file system is mounted using the NFSv4.1 protocol -version (mount -o vers=4.1). diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst --- a/Documentation/filesystems/path-lookup.rst +++ b/Documentation/filesystems/path-lookup.rst @@ -13,6 +13,7 @@ including: - per-directory parallel name lookup. +- ``openat2()`` resolution restriction flags. Introduction to pathname lookup =============================== @@ -235,6 +236,13 @@ unsuccessfully scanned a chain in the hash table, it simply tries again. +``rename_lock`` is also used to detect and defend against potential attacks +against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where +the parent directory is moved outside the root, bypassing the ``path_equal()`` +check). If ``rename_lock`` is updated during the lookup and the path encounters +a "..", a potential attack occurred and ``handle_dots()`` will bail out with +``-EAGAIN``. + inode->i_rwsem ~~~~~~~~~~~~~~ @@ -348,6 +356,13 @@ needed to stabilize the link to the mounted-on dentry, which the refcount on the mount itself doesn't ensure. +``mount_lock`` is also used to detect and defend against potential attacks +against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where +the parent directory is moved outside the root, bypassing the ``path_equal()`` +check). If ``mount_lock`` is updated during the lookup and the path encounters +a "..", a potential attack occurred and ``handle_dots()`` will bail out with +``-EAGAIN``. + RCU ~~~ @@ -405,6 +420,10 @@ only one root is in effect for the entire path walk, even if it races with a ``chroot()`` system call. +It should be noted that in the case of ``LOOKUP_IN_ROOT`` or +``LOOKUP_BENEATH``, the effective root becomes the directory file descriptor +passed to ``openat2()`` (which exposes these ``LOOKUP_`` flags). + The root is needed when either of two conditions holds: (1) either the pathname or a symbolic link starts with a "'/'", or (2) a "``..``" component is being handled, since "``..``" from the root must always stay @@ -1149,7 +1168,7 @@ the stack frame discarded. The other case involves things in ``/proc`` that look like symlinks but -aren't really:: +aren't really (and are therefore commonly referred to as "magic-links"):: $ ls -l /proc/self/fd/1 lrwx------ 1 neilb neilb 64 Jun 13 10:19 /proc/self/fd/1 -> /dev/pts/4 @@ -1286,7 +1305,9 @@ A suitable way to wrap up this tour of pathname walking is to list the various flags that can be stored in the ``nameidata`` to guide the lookup process. Many of these are only meaningful on the final -component, others reflect the current state of the pathname lookup. +component, others reflect the current state of the pathname lookup, and some +apply restrictions to all path components encountered in the path lookup. + And then there is ``LOOKUP_EMPTY``, which doesn't fit conceptually with the others. If this is not set, an empty pathname causes an error very early on. If it is set, empty pathnames are not considered to be @@ -1310,13 +1331,48 @@ ``LOOKUP_JUMPED`` means that the current dentry was chosen not because it had the right name but for some other reason. This happens when following "``..``", following a symlink to ``/``, crossing a mount point -or accessing a "``/proc/$PID/fd/$FD``" symlink. In this case the -filesystem has not been asked to revalidate the name (with -``d_revalidate()``). In such cases the inode may still need to be -revalidated, so ``d_op->d_weak_revalidate()`` is called if +or accessing a "``/proc/$PID/fd/$FD``" symlink (also known as a "magic +link"). In this case the filesystem has not been asked to revalidate the +name (with ``d_revalidate()``). In such cases the inode may still need +to be revalidated, so ``d_op->d_weak_revalidate()`` is called if ``LOOKUP_JUMPED`` is set when the look completes - which may be at the final component or, when creating, unlinking, or renaming, at the penultimate component. +Resolution-restriction flags +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In order to allow userspace to protect itself against certain race conditions +and attack scenarios involving changing path components, a series of flags are +available which apply restrictions to all path components encountered during +path lookup. These flags are exposed through ``openat2()``'s ``resolve`` field. + +``LOOKUP_NO_SYMLINKS`` blocks all symlink traversals (including magic-links). +This is distinctly different from ``LOOKUP_FOLLOW``, because the latter only +relates to restricting the following of trailing symlinks. + +``LOOKUP_NO_MAGICLINKS`` blocks all magic-link traversals. Filesystems must +ensure that they return errors from ``nd_jump_link()``, because that is how +``LOOKUP_NO_MAGICLINKS`` and other magic-link restrictions are implemented. + +``LOOKUP_NO_XDEV`` blocks all ``vfsmount`` traversals (this includes both +bind-mounts and ordinary mounts). Note that the ``vfsmount`` which contains the +lookup is determined by the first mountpoint the path lookup reaches -- +absolute paths start with the ``vfsmount`` of ``/``, and relative paths start +with the ``dfd``'s ``vfsmount``. Magic-links are only permitted if the +``vfsmount`` of the path is unchanged. + +``LOOKUP_BENEATH`` blocks any path components which resolve outside the +starting point of the resolution. This is done by blocking ``nd_jump_root()`` +as well as blocking ".." if it would jump outside the starting point. +``rename_lock`` and ``mount_lock`` are used to detect attacks against the +resolution of "..". Magic-links are also blocked. + +``LOOKUP_IN_ROOT`` resolves all path components as though the starting point +were the filesystem root. ``nd_jump_root()`` brings the resolution back to to +the starting point, and ".." at the starting point will act as a no-op. As with +``LOOKUP_BENEATH``, ``rename_lock`` and ``mount_lock`` are used to detect +attacks against ".." resolution. Magic-links are also blocked. + Final-component flags ~~~~~~~~~~~~~~~~~~~~~ diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/vfat.rst b/Documentation/filesystems/vfat.rst --- a/Documentation/filesystems/vfat.rst +++ b/Documentation/filesystems/vfat.rst @@ -0,0 +1,387 @@ +==== +VFAT +==== + +USING VFAT +========== + +To use the vfat filesystem, use the filesystem type 'vfat'. i.e.:: + + mount -t vfat /dev/fd0 /mnt + + +No special partition formatter is required, +'mkdosfs' will work fine if you want to format from within Linux. + +VFAT MOUNT OPTIONS +================== + +**uid=###** + Set the owner of all files on this filesystem. + The default is the uid of current process. + +**gid=###** + Set the group of all files on this filesystem. + The default is the gid of current process. + +**umask=###** + The permission mask (for files and directories, see *umask(1)*). + The default is the umask of current process. + +**dmask=###** + The permission mask for the directory. + The default is the umask of current process. + +**fmask=###** + The permission mask for files. + The default is the umask of current process. + +**allow_utime=###** + This option controls the permission check of mtime/atime. + + **-20**: If current process is in group of file's group ID, + you can change timestamp. + + **-2**: Other users can change timestamp. + + The default is set from dmask option. If the directory is + writable, utime(2) is also allowed. i.e. ~dmask & 022. + + Normally utime(2) checks current process is owner of + the file, or it has CAP_FOWNER capability. But FAT + filesystem doesn't have uid/gid on disk, so normal + check is too unflexible. With this option you can + relax it. + +**codepage=###** + Sets the codepage number for converting to shortname + characters on FAT filesystem. + By default, FAT_DEFAULT_CODEPAGE setting is used. + +**iocharset=** + Character set to use for converting between the + encoding is used for user visible filename and 16 bit + Unicode characters. Long filenames are stored on disk + in Unicode format, but Unix for the most part doesn't + know how to deal with Unicode. + By default, FAT_DEFAULT_IOCHARSET setting is used. + + There is also an option of doing UTF-8 translations + with the utf8 option. + +.. note:: ``iocharset=utf8`` is not recommended. If unsure, you should consider + the utf8 option instead. + +**utf8=** + UTF-8 is the filesystem safe version of Unicode that + is used by the console. It can be enabled or disabled + for the filesystem with this option. + If 'uni_xlate' gets set, UTF-8 gets disabled. + By default, FAT_DEFAULT_UTF8 setting is used. + +**uni_xlate=** + Translate unhandled Unicode characters to special + escaped sequences. This would let you backup and + restore filenames that are created with any Unicode + characters. Until Linux supports Unicode for real, + this gives you an alternative. Without this option, + a '?' is used when no translation is possible. The + escape character is ':' because it is otherwise + illegal on the vfat filesystem. The escape sequence + that gets used is ':' and the four digits of hexadecimal + unicode. + +**nonumtail=** + When creating 8.3 aliases, normally the alias will + end in '~1' or tilde followed by some number. If this + option is set, then if the filename is + "longfilename.txt" and "longfile.txt" does not + currently exist in the directory, longfile.txt will + be the short alias instead of longfi~1.txt. + +**usefree** + Use the "free clusters" value stored on FSINFO. It will + be used to determine number of free clusters without + scanning disk. But it's not used by default, because + recent Windows don't update it correctly in some + case. If you are sure the "free clusters" on FSINFO is + correct, by this option you can avoid scanning disk. + +**quiet** + Stops printing certain warning messages. + +**check=s|r|n** + Case sensitivity checking setting. + + **s**: strict, case sensitive + + **r**: relaxed, case insensitive + + **n**: normal, default setting, currently case insensitive + +**nocase** + This was deprecated for vfat. Use ``shortname=win95`` instead. + +**shortname=lower|win95|winnt|mixed** + Shortname display/create setting. + + **lower**: convert to lowercase for display, + emulate the Windows 95 rule for create. + + **win95**: emulate the Windows 95 rule for display/create. + + **winnt**: emulate the Windows NT rule for display/create. + + **mixed**: emulate the Windows NT rule for display, + emulate the Windows 95 rule for create. + + Default setting is `mixed`. + +**tz=UTC** + Interpret timestamps as UTC rather than local time. + This option disables the conversion of timestamps + between local time (as used by Windows on FAT) and UTC + (which Linux uses internally). This is particularly + useful when mounting devices (like digital cameras) + that are set to UTC in order to avoid the pitfalls of + local time. + +**time_offset=minutes** + Set offset for conversion of timestamps from local time + used by FAT to UTC. I.e. minutes will be subtracted + from each timestamp to convert it to UTC used internally by + Linux. This is useful when time zone set in ``sys_tz`` is + not the time zone used by the filesystem. Note that this + option still does not provide correct time stamps in all + cases in presence of DST - time stamps in a different DST + setting will be off by one hour. + +**showexec** + If set, the execute permission bits of the file will be + allowed only if the extension part of the name is .EXE, + .COM, or .BAT. Not set by default. + +**debug** + Can be set, but unused by the current implementation. + +**sys_immutable** + If set, ATTR_SYS attribute on FAT is handled as + IMMUTABLE flag on Linux. Not set by default. + +**flush** + If set, the filesystem will try to flush to disk more + early than normal. Not set by default. + +**rodir** + FAT has the ATTR_RO (read-only) attribute. On Windows, + the ATTR_RO of the directory will just be ignored, + and is used only by applications as a flag (e.g. it's set + for the customized folder). + + If you want to use ATTR_RO as read-only flag even for + the directory, set this option. + +**errors=panic|continue|remount-ro** + specify FAT behavior on critical errors: panic, continue + without doing anything or remount the partition in + read-only mode (default behavior). + +**discard** + If set, issues discard/TRIM commands to the block + device when blocks are freed. This is useful for SSD devices + and sparse/thinly-provisoned LUNs. + +**nfs=stale_rw|nostale_ro** + Enable this only if you want to export the FAT filesystem + over NFS. + + **stale_rw**: This option maintains an index (cache) of directory + *inodes* by *i_logstart* which is used by the nfs-related code to + improve look-ups. Full file operations (read/write) over NFS is + supported but with cache eviction at NFS server, this could + result in ESTALE issues. + + **nostale_ro**: This option bases the *inode* number and filehandle + on the on-disk location of a file in the MS-DOS directory entry. + This ensures that ESTALE will not be returned after a file is + evicted from the inode cache. However, it means that operations + such as rename, create and unlink could cause filehandles that + previously pointed at one file to point at a different file, + potentially causing data corruption. For this reason, this + option also mounts the filesystem readonly. + + To maintain backward compatibility, ``'-o nfs'`` is also accepted, + defaulting to "stale_rw". + +**dos1xfloppy : 0,1,yes,no,true,false** + If set, use a fallback default BIOS Parameter Block + configuration, determined by backing device size. These static + parameters match defaults assumed by DOS 1.x for 160 kiB, + 180 kiB, 320 kiB, and 360 kiB floppies and floppy images. + + + +LIMITATION +========== + +The fallocated region of file is discarded at umount/evict time +when using fallocate with FALLOC_FL_KEEP_SIZE. +So, User should assume that fallocated region can be discarded at +last close if there is memory pressure resulting in eviction of +the inode from the memory. As a result, for any dependency on +the fallocated region, user should make sure to recheck fallocate +after reopening the file. + +TODO +==== +Need to get rid of the raw scanning stuff. Instead, always use +a get next directory entry approach. The only thing left that uses +raw scanning is the directory renaming code. + + +POSSIBLE PROBLEMS +================= + +- vfat_valid_longname does not properly checked reserved names. +- When a volume name is the same as a directory name in the root + directory of the filesystem, the directory name sometimes shows + up as an empty file. +- autoconv option does not work correctly. + + +TEST SUITE +========== +If you plan to make any modifications to the vfat filesystem, please +get the test suite that comes with the vfat distribution at + +``_ + +This tests quite a few parts of the vfat filesystem and additional +tests for new features or untested features would be appreciated. + +NOTES ON THE STRUCTURE OF THE VFAT FILESYSTEM +============================================= +This documentation was provided by Galen C. Hunt gchunt@cs.rochester.edu and +lightly annotated by Gordon Chaffee. + +This document presents a very rough, technical overview of my +knowledge of the extended FAT file system used in Windows NT 3.5 and +Windows 95. I don't guarantee that any of the following is correct, +but it appears to be so. + +The extended FAT file system is almost identical to the FAT +file system used in DOS versions up to and including *6.223410239847* +:-). The significant change has been the addition of long file names. +These names support up to 255 characters including spaces and lower +case characters as opposed to the traditional 8.3 short names. + +Here is the description of the traditional FAT entry in the current +Windows 95 filesystem:: + + struct directory { // Short 8.3 names + unsigned char name[8]; // file name + unsigned char ext[3]; // file extension + unsigned char attr; // attribute byte + unsigned char lcase; // Case for base and extension + unsigned char ctime_ms; // Creation time, milliseconds + unsigned char ctime[2]; // Creation time + unsigned char cdate[2]; // Creation date + unsigned char adate[2]; // Last access date + unsigned char reserved[2]; // reserved values (ignored) + unsigned char time[2]; // time stamp + unsigned char date[2]; // date stamp + unsigned char start[2]; // starting cluster number + unsigned char size[4]; // size of the file + }; + + +The lcase field specifies if the base and/or the extension of an 8.3 +name should be capitalized. This field does not seem to be used by +Windows 95 but it is used by Windows NT. The case of filenames is not +completely compatible from Windows NT to Windows 95. It is not completely +compatible in the reverse direction, however. Filenames that fit in +the 8.3 namespace and are written on Windows NT to be lowercase will +show up as uppercase on Windows 95. + +.. note:: Note that the ``start`` and ``size`` values are actually little + endian integer values. The descriptions of the fields in this + structure are public knowledge and can be found elsewhere. + +With the extended FAT system, Microsoft has inserted extra +directory entries for any files with extended names. (Any name which +legally fits within the old 8.3 encoding scheme does not have extra +entries.) I call these extra entries slots. Basically, a slot is a +specially formatted directory entry which holds up to 13 characters of +a file's extended name. Think of slots as additional labeling for the +directory entry of the file to which they correspond. Microsoft +prefers to refer to the 8.3 entry for a file as its alias and the +extended slot directory entries as the file name. + +The C structure for a slot directory entry follows:: + + struct slot { // Up to 13 characters of a long name + unsigned char id; // sequence number for slot + unsigned char name0_4[10]; // first 5 characters in name + unsigned char attr; // attribute byte + unsigned char reserved; // always 0 + unsigned char alias_checksum; // checksum for 8.3 alias + unsigned char name5_10[12]; // 6 more characters in name + unsigned char start[2]; // starting cluster number + unsigned char name11_12[4]; // last 2 characters in name + }; + + +If the layout of the slots looks a little odd, it's only +because of Microsoft's efforts to maintain compatibility with old +software. The slots must be disguised to prevent old software from +panicking. To this end, a number of measures are taken: + + 1) The attribute byte for a slot directory entry is always set + to 0x0f. This corresponds to an old directory entry with + attributes of "hidden", "system", "read-only", and "volume + label". Most old software will ignore any directory + entries with the "volume label" bit set. Real volume label + entries don't have the other three bits set. + + 2) The starting cluster is always set to 0, an impossible + value for a DOS file. + +Because the extended FAT system is backward compatible, it is +possible for old software to modify directory entries. Measures must +be taken to ensure the validity of slots. An extended FAT system can +verify that a slot does in fact belong to an 8.3 directory entry by +the following: + + 1) Positioning. Slots for a file always immediately proceed + their corresponding 8.3 directory entry. In addition, each + slot has an id which marks its order in the extended file + name. Here is a very abbreviated view of an 8.3 directory + entry and its corresponding long name slots for the file + "My Big File.Extension which is long":: + + + + + + + + + .. note:: Note that the slots are stored from last to first. Slots + are numbered from 1 to N. The Nth slot is ``or'ed`` with + 0x40 to mark it as the last one. + + 2) Checksum. Each slot has an alias_checksum value. The + checksum is calculated from the 8.3 name using the + following algorithm:: + + for (sum = i = 0; i < 11; i++) { + sum = (((sum&1)<<7)|((sum&0xfe)>>1)) + name[i] + } + + + 3) If there is free space in the final slot, a Unicode ``NULL (0x0000)`` + is stored after the final character. After that, all unused + characters in the final slot are set to Unicode 0xFFFF. + +Finally, note that the extended name is stored in Unicode. Each Unicode +character takes either two or four bytes, UTF-16LE encoded. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt --- a/Documentation/filesystems/vfat.txt +++ b/Documentation/filesystems/vfat.txt @@ -1,347 +0,0 @@ -USING VFAT ----------------------------------------------------------------------- -To use the vfat filesystem, use the filesystem type 'vfat'. i.e. - mount -t vfat /dev/fd0 /mnt - -No special partition formatter is required. mkdosfs will work fine -if you want to format from within Linux. - -VFAT MOUNT OPTIONS ----------------------------------------------------------------------- -uid=### -- Set the owner of all files on this filesystem. - The default is the uid of current process. - -gid=### -- Set the group of all files on this filesystem. - The default is the gid of current process. - -umask=### -- The permission mask (for files and directories, see umask(1)). - The default is the umask of current process. - -dmask=### -- The permission mask for the directory. - The default is the umask of current process. - -fmask=### -- The permission mask for files. - The default is the umask of current process. - -allow_utime=### -- This option controls the permission check of mtime/atime. - - 20 - If current process is in group of file's group ID, - you can change timestamp. - 2 - Other users can change timestamp. - - The default is set from `dmask' option. (If the directory is - writable, utime(2) is also allowed. I.e. ~dmask & 022) - - Normally utime(2) checks current process is owner of - the file, or it has CAP_FOWNER capability. But FAT - filesystem doesn't have uid/gid on disk, so normal - check is too unflexible. With this option you can - relax it. - -codepage=### -- Sets the codepage number for converting to shortname - characters on FAT filesystem. - By default, FAT_DEFAULT_CODEPAGE setting is used. - -iocharset= -- Character set to use for converting between the - encoding is used for user visible filename and 16 bit - Unicode characters. Long filenames are stored on disk - in Unicode format, but Unix for the most part doesn't - know how to deal with Unicode. - By default, FAT_DEFAULT_IOCHARSET setting is used. - - There is also an option of doing UTF-8 translations - with the utf8 option. - - NOTE: "iocharset=utf8" is not recommended. If unsure, - you should consider the following option instead. - -utf8= -- UTF-8 is the filesystem safe version of Unicode that - is used by the console. It can be enabled or disabled - for the filesystem with this option. - If 'uni_xlate' gets set, UTF-8 gets disabled. - By default, FAT_DEFAULT_UTF8 setting is used. - -uni_xlate= -- Translate unhandled Unicode characters to special - escaped sequences. This would let you backup and - restore filenames that are created with any Unicode - characters. Until Linux supports Unicode for real, - this gives you an alternative. Without this option, - a '?' is used when no translation is possible. The - escape character is ':' because it is otherwise - illegal on the vfat filesystem. The escape sequence - that gets used is ':' and the four digits of hexadecimal - unicode. - -nonumtail= -- When creating 8.3 aliases, normally the alias will - end in '~1' or tilde followed by some number. If this - option is set, then if the filename is - "longfilename.txt" and "longfile.txt" does not - currently exist in the directory, 'longfile.txt' will - be the short alias instead of 'longfi~1.txt'. - -usefree -- Use the "free clusters" value stored on FSINFO. It'll - be used to determine number of free clusters without - scanning disk. But it's not used by default, because - recent Windows don't update it correctly in some - case. If you are sure the "free clusters" on FSINFO is - correct, by this option you can avoid scanning disk. - -quiet -- Stops printing certain warning messages. - -check=s|r|n -- Case sensitivity checking setting. - s: strict, case sensitive - r: relaxed, case insensitive - n: normal, default setting, currently case insensitive - -nocase -- This was deprecated for vfat. Use shortname=win95 instead. - -shortname=lower|win95|winnt|mixed - -- Shortname display/create setting. - lower: convert to lowercase for display, - emulate the Windows 95 rule for create. - win95: emulate the Windows 95 rule for display/create. - winnt: emulate the Windows NT rule for display/create. - mixed: emulate the Windows NT rule for display, - emulate the Windows 95 rule for create. - Default setting is `mixed'. - -tz=UTC -- Interpret timestamps as UTC rather than local time. - This option disables the conversion of timestamps - between local time (as used by Windows on FAT) and UTC - (which Linux uses internally). This is particularly - useful when mounting devices (like digital cameras) - that are set to UTC in order to avoid the pitfalls of - local time. -time_offset=minutes - -- Set offset for conversion of timestamps from local time - used by FAT to UTC. I.e. minutes will be subtracted - from each timestamp to convert it to UTC used internally by - Linux. This is useful when time zone set in sys_tz is - not the time zone used by the filesystem. Note that this - option still does not provide correct time stamps in all - cases in presence of DST - time stamps in a different DST - setting will be off by one hour. - -showexec -- If set, the execute permission bits of the file will be - allowed only if the extension part of the name is .EXE, - .COM, or .BAT. Not set by default. - -debug -- Can be set, but unused by the current implementation. - -sys_immutable -- If set, ATTR_SYS attribute on FAT is handled as - IMMUTABLE flag on Linux. Not set by default. - -flush -- If set, the filesystem will try to flush to disk more - early than normal. Not set by default. - -rodir -- FAT has the ATTR_RO (read-only) attribute. On Windows, - the ATTR_RO of the directory will just be ignored, - and is used only by applications as a flag (e.g. it's set - for the customized folder). - - If you want to use ATTR_RO as read-only flag even for - the directory, set this option. - -errors=panic|continue|remount-ro - -- specify FAT behavior on critical errors: panic, continue - without doing anything or remount the partition in - read-only mode (default behavior). - -discard -- If set, issues discard/TRIM commands to the block - device when blocks are freed. This is useful for SSD devices - and sparse/thinly-provisoned LUNs. - -nfs=stale_rw|nostale_ro - Enable this only if you want to export the FAT filesystem - over NFS. - - stale_rw: This option maintains an index (cache) of directory - inodes by i_logstart which is used by the nfs-related code to - improve look-ups. Full file operations (read/write) over NFS is - supported but with cache eviction at NFS server, this could - result in ESTALE issues. - - nostale_ro: This option bases the inode number and filehandle - on the on-disk location of a file in the MS-DOS directory entry. - This ensures that ESTALE will not be returned after a file is - evicted from the inode cache. However, it means that operations - such as rename, create and unlink could cause filehandles that - previously pointed at one file to point at a different file, - potentially causing data corruption. For this reason, this - option also mounts the filesystem readonly. - - To maintain backward compatibility, '-o nfs' is also accepted, - defaulting to stale_rw - -dos1xfloppy -- If set, use a fallback default BIOS Parameter Block - configuration, determined by backing device size. These static - parameters match defaults assumed by DOS 1.x for 160 kiB, - 180 kiB, 320 kiB, and 360 kiB floppies and floppy images. - - -: 0,1,yes,no,true,false - -LIMITATION ---------------------------------------------------------------------- -* The fallocated region of file is discarded at umount/evict time - when using fallocate with FALLOC_FL_KEEP_SIZE. - So, User should assume that fallocated region can be discarded at - last close if there is memory pressure resulting in eviction of - the inode from the memory. As a result, for any dependency on - the fallocated region, user should make sure to recheck fallocate - after reopening the file. - -TODO ----------------------------------------------------------------------- -* Need to get rid of the raw scanning stuff. Instead, always use - a get next directory entry approach. The only thing left that uses - raw scanning is the directory renaming code. - - -POSSIBLE PROBLEMS ----------------------------------------------------------------------- -* vfat_valid_longname does not properly checked reserved names. -* When a volume name is the same as a directory name in the root - directory of the filesystem, the directory name sometimes shows - up as an empty file. -* autoconv option does not work correctly. - -BUG REPORTS ----------------------------------------------------------------------- -If you have trouble with the VFAT filesystem, mail bug reports to -chaffee@bmrc.cs.berkeley.edu. Please specify the filename -and the operation that gave you trouble. - -TEST SUITE ----------------------------------------------------------------------- -If you plan to make any modifications to the vfat filesystem, please -get the test suite that comes with the vfat distribution at - - http://web.archive.org/web/*/http://bmrc.berkeley.edu/ - people/chaffee/vfat.html - -This tests quite a few parts of the vfat filesystem and additional -tests for new features or untested features would be appreciated. - -NOTES ON THE STRUCTURE OF THE VFAT FILESYSTEM ----------------------------------------------------------------------- -(This documentation was provided by Galen C. Hunt - and lightly annotated by Gordon Chaffee). - -This document presents a very rough, technical overview of my -knowledge of the extended FAT file system used in Windows NT 3.5 and -Windows 95. I don't guarantee that any of the following is correct, -but it appears to be so. - -The extended FAT file system is almost identical to the FAT -file system used in DOS versions up to and including 6.223410239847 -:-). The significant change has been the addition of long file names. -These names support up to 255 characters including spaces and lower -case characters as opposed to the traditional 8.3 short names. - -Here is the description of the traditional FAT entry in the current -Windows 95 filesystem: - - struct directory { // Short 8.3 names - unsigned char name[8]; // file name - unsigned char ext[3]; // file extension - unsigned char attr; // attribute byte - unsigned char lcase; // Case for base and extension - unsigned char ctime_ms; // Creation time, milliseconds - unsigned char ctime[2]; // Creation time - unsigned char cdate[2]; // Creation date - unsigned char adate[2]; // Last access date - unsigned char reserved[2]; // reserved values (ignored) - unsigned char time[2]; // time stamp - unsigned char date[2]; // date stamp - unsigned char start[2]; // starting cluster number - unsigned char size[4]; // size of the file - }; - -The lcase field specifies if the base and/or the extension of an 8.3 -name should be capitalized. This field does not seem to be used by -Windows 95 but it is used by Windows NT. The case of filenames is not -completely compatible from Windows NT to Windows 95. It is not completely -compatible in the reverse direction, however. Filenames that fit in -the 8.3 namespace and are written on Windows NT to be lowercase will -show up as uppercase on Windows 95. - -Note that the "start" and "size" values are actually little -endian integer values. The descriptions of the fields in this -structure are public knowledge and can be found elsewhere. - -With the extended FAT system, Microsoft has inserted extra -directory entries for any files with extended names. (Any name which -legally fits within the old 8.3 encoding scheme does not have extra -entries.) I call these extra entries slots. Basically, a slot is a -specially formatted directory entry which holds up to 13 characters of -a file's extended name. Think of slots as additional labeling for the -directory entry of the file to which they correspond. Microsoft -prefers to refer to the 8.3 entry for a file as its alias and the -extended slot directory entries as the file name. - -The C structure for a slot directory entry follows: - - struct slot { // Up to 13 characters of a long name - unsigned char id; // sequence number for slot - unsigned char name0_4[10]; // first 5 characters in name - unsigned char attr; // attribute byte - unsigned char reserved; // always 0 - unsigned char alias_checksum; // checksum for 8.3 alias - unsigned char name5_10[12]; // 6 more characters in name - unsigned char start[2]; // starting cluster number - unsigned char name11_12[4]; // last 2 characters in name - }; - -If the layout of the slots looks a little odd, it's only -because of Microsoft's efforts to maintain compatibility with old -software. The slots must be disguised to prevent old software from -panicking. To this end, a number of measures are taken: - - 1) The attribute byte for a slot directory entry is always set - to 0x0f. This corresponds to an old directory entry with - attributes of "hidden", "system", "read-only", and "volume - label". Most old software will ignore any directory - entries with the "volume label" bit set. Real volume label - entries don't have the other three bits set. - - 2) The starting cluster is always set to 0, an impossible - value for a DOS file. - -Because the extended FAT system is backward compatible, it is -possible for old software to modify directory entries. Measures must -be taken to ensure the validity of slots. An extended FAT system can -verify that a slot does in fact belong to an 8.3 directory entry by -the following: - - 1) Positioning. Slots for a file always immediately proceed - their corresponding 8.3 directory entry. In addition, each - slot has an id which marks its order in the extended file - name. Here is a very abbreviated view of an 8.3 directory - entry and its corresponding long name slots for the file - "My Big File.Extension which is long": - - - - - - - - Note that the slots are stored from last to first. Slots - are numbered from 1 to N. The Nth slot is or'ed with 0x40 - to mark it as the last one. - - 2) Checksum. Each slot has an "alias_checksum" value. The - checksum is calculated from the 8.3 name using the - following algorithm: - - for (sum = i = 0; i < 11; i++) { - sum = (((sum&1)<<7)|((sum&0xfe)>>1)) + name[i] - } - - 3) If there is free space in the final slot, a Unicode NULL (0x0000) - is stored after the final character. After that, all unused - characters in the final slot are set to Unicode 0xFFFF. - -Finally, note that the extended name is stored in Unicode. Each Unicode -character takes either two or four bytes, UTF-16LE encoded. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/firmware-guide/acpi/enumeration.rst b/Documentation/firmware-guide/acpi/enumeration.rst --- a/Documentation/firmware-guide/acpi/enumeration.rst +++ b/Documentation/firmware-guide/acpi/enumeration.rst @@ -71,8 +71,8 @@ DMA controllers enumerated via ACPI should be registered in the system to provide generic access to their resources. For example, a driver that would like to be accessible to slave devices via generic API call -dma_request_slave_channel() must register itself at the end of the probe -function like this:: +dma_request_chan() must register itself at the end of the probe function like +this:: err = devm_acpi_dma_controller_register(dev, xlate_func, dw); /* Handle the error if it's not a case of !CONFIG_ACPI */ @@ -112,15 +112,15 @@ } #endif -dma_request_slave_channel() will call xlate_func() for each registered DMA -controller. In the xlate function the proper channel must be chosen based on +dma_request_chan() will call xlate_func() for each registered DMA controller. +In the xlate function the proper channel must be chosen based on information in struct acpi_dma_spec and the properties of the controller provided by struct acpi_dma. -Clients must call dma_request_slave_channel() with the string parameter that -corresponds to a specific FixedDMA resource. By default "tx" means the first -entry of the FixedDMA resource array, "rx" means the second entry. The table -below shows a layout:: +Clients must call dma_request_chan() with the string parameter that corresponds +to a specific FixedDMA resource. By default "tx" means the first entry of the +FixedDMA resource array, "rx" means the second entry. The table below shows a +layout:: Device (I2C0) { diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/hwmon/adm1177.rst b/Documentation/hwmon/adm1177.rst --- a/Documentation/hwmon/adm1177.rst +++ b/Documentation/hwmon/adm1177.rst @@ -0,0 +1,36 @@ +Kernel driver adm1177 +===================== + +Supported chips: + * Analog Devices ADM1177 + Prefix: 'adm1177' + Datasheet: https://www.analog.com/media/en/technical-documentation/data-sheets/ADM1177.pdf + +Author: Beniamin Bia + + +Description +----------- + +This driver supports hardware monitoring for Analog Devices ADM1177 +Hot-Swap Controller and Digital Power Monitors with Soft Start Pin. + + +Usage Notes +----------- + +This driver does not auto-detect devices. You will have to instantiate the +devices explicitly. Please see Documentation/i2c/instantiating-devices for +details. + + +Sysfs entries +------------- + +The following attributes are supported. Current maxim attribute +is read-write, all other attributes are read-only. + +in0_input Measured voltage in microvolts. + +curr1_input Measured current in microamperes. +curr1_max_alarm Overcurrent alarm in microamperes. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/hwmon/drivetemp.rst b/Documentation/hwmon/drivetemp.rst --- a/Documentation/hwmon/drivetemp.rst +++ b/Documentation/hwmon/drivetemp.rst @@ -0,0 +1,52 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Kernel driver drivetemp +======================= + + +References +---------- + +ANS T13/1699-D +Information technology - AT Attachment 8 - ATA/ATAPI Command Set (ATA8-ACS) + +ANS Project T10/BSR INCITS 513 +Information technology - SCSI Primary Commands - 4 (SPC-4) + +ANS Project INCITS 557 +Information technology - SCSI / ATA Translation - 5 (SAT-5) + + +Description +----------- + +This driver supports reporting the temperature of disk and solid state +drives with temperature sensors. + +If supported, it uses the ATA SCT Command Transport feature to read +the current drive temperature and, if available, temperature limits +as well as historic minimum and maximum temperatures. If SCT Command +Transport is not supported, the driver uses SMART attributes to read +the drive temperature. + + +Sysfs entries +------------- + +Only the temp1_input attribute is always available. Other attributes are +available only if reported by the drive. All temperatures are reported in +milli-degrees Celsius. + +======================= ===================================================== +temp1_input Current drive temperature +temp1_lcrit Minimum temperature limit. Operating the device below + this temperature may cause physical damage to the + device. +temp1_min Minimum recommended continuous operating limit +temp1_max Maximum recommended continuous operating temperature +temp1_crit Maximum temperature limit. Operating the device above + this temperature may cause physical damage to the + device. +temp1_lowest Minimum temperature seen this power cycle +temp1_highest Maximum temperature seen this power cycle +======================= ===================================================== diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst --- a/Documentation/hwmon/index.rst +++ b/Documentation/hwmon/index.rst @@ -29,6 +29,7 @@ adm1025 adm1026 adm1031 + adm1177 adm1275 adm9240 ads7828 @@ -47,6 +48,7 @@ da9055 dell-smm-hwmon dme1737 + drivetemp ds1621 ds620 emc1403 @@ -106,8 +108,10 @@ max1619 max1668 max197 + max20730 max20751 max31722 + max31730 max31785 max31790 max34440 @@ -177,6 +181,7 @@ wm831x wm8350 xgene-hwmon + xdpe12284 zl6100 .. only:: subproject and html diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/hwmon/max20730.rst b/Documentation/hwmon/max20730.rst --- a/Documentation/hwmon/max20730.rst +++ b/Documentation/hwmon/max20730.rst @@ -0,0 +1,74 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +Kernel driver max20730 +====================== + +Supported chips: + + * Maxim MAX20730 + + Prefix: 'max20730' + + Addresses scanned: - + + Datasheet: https://datasheets.maximintegrated.com/en/ds/MAX20730.pdf + + * Maxim MAX20734 + + Prefix: 'max20734' + + Addresses scanned: - + + Datasheet: https://datasheets.maximintegrated.com/en/ds/MAX20734.pdf + + * Maxim MAX20743 + + Prefix: 'max20743' + + Addresses scanned: - + + Datasheet: https://datasheets.maximintegrated.com/en/ds/MAX20743.pdf + +Author: Guenter Roeck + + +Description +----------- + +This driver implements support for Maxim MAX20730, MAX20734, and MAX20743 +Integrated, Step-Down Switching Regulators with PMBus support. + +The driver is a client driver to the core PMBus driver. +Please see Documentation/hwmon/pmbus.rst for details on PMBus client drivers. + + +Usage Notes +----------- + +This driver does not auto-detect devices. You will have to instantiate the +devices explicitly. Please see Documentation/i2c/instantiating-devices.rst for +details. + + +Sysfs entries +------------- + +=================== ===== ======================================================= +curr1_crit RW/RO Critical output current. Please see datasheet for + supported limits. Read-only if the chip is + write protected; read-write otherwise. +curr1_crit_alarm RO Output current critical alarm +curr1_input RO Output current +curr1_label RO 'iout1' +in1_alarm RO Input voltage alarm +in1_input RO Input voltage +in1_label RO 'vin' +in2_alarm RO Output voltage alarm +in2_input RO Output voltage +in2_label RO 'vout1' +temp1_crit RW/RO Critical temeperature. Supported values are 130 or 150 + degrees C. Read-only if the chip is write protected; + read-write otherwise. +temp1_crit_alarm RO Temperature critical alarm +temp1_input RO Chip temperature +=================== ===== ======================================================= diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/hwmon/max31730.rst b/Documentation/hwmon/max31730.rst --- a/Documentation/hwmon/max31730.rst +++ b/Documentation/hwmon/max31730.rst @@ -0,0 +1,44 @@ +Kernel driver max31790 +====================== + +Supported chips: + + * Maxim MAX31730 + + Prefix: 'max31730' + + Addresses scanned: 0x1c, 0x1d, 0x1e, 0x1f, 0x4c, 0x4d, 0x4e, 0x4f + + Datasheet: https://datasheets.maximintegrated.com/en/ds/MAX31730.pdf + +Author: Guenter Roeck + + +Description +----------- + +This driver implements support for Maxim MAX31730. + +The MAX31730 temperature sensor monitors its own temperature and the +temperatures of three external diode-connected transistors. The operating +supply voltage is from 3.0V to 3.6V. Resistance cancellation compensates +for high series resistance in circuit-board traces and the external thermal +diode, while beta compensation corrects for temperature-measurement +errors due to low-beta sensing transistors. + + +Sysfs entries +------------- + +=================== == ======================================================= +temp[1-4]_enable RW Temperature enable/disable + Set to 0 to enable channel, 0 to disable +temp[1-4]_input RO Temperature input +temp[2-4]_fault RO Fault indicator for remote channels +temp[1-4]_max RW Maximum temperature +temp[1-4]_max_alarm RW Maximum temperature alarm +temp[1-4]_min RW Minimum temperature. Common for all channels. + Only temp1_min is writeable. +temp[1-4]_min_alarm RO Minimum temperature alarm +temp[2-4]_offset RW Temperature offset for remote channels +=================== == ======================================================= diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/hwmon/pmbus.rst b/Documentation/hwmon/pmbus.rst --- a/Documentation/hwmon/pmbus.rst +++ b/Documentation/hwmon/pmbus.rst @@ -63,6 +63,16 @@ http://www.ti.com/lit/gpn/tps544c25 + * Maxim MAX20796 + + Prefix: 'max20796' + + Addresses scanned: - + + Datasheet: + + Not published + * Generic PMBus devices Prefix: 'pmbus' diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/hwmon/ucd9000.rst b/Documentation/hwmon/ucd9000.rst --- a/Documentation/hwmon/ucd9000.rst +++ b/Documentation/hwmon/ucd9000.rst @@ -3,9 +3,10 @@ Supported chips: - * TI UCD90120, UCD90124, UCD90160, UCD9090, and UCD90910 + * TI UCD90120, UCD90124, UCD90160, UCD90320, UCD9090, and UCD90910 - Prefixes: 'ucd90120', 'ucd90124', 'ucd90160', 'ucd9090', 'ucd90910' + Prefixes: 'ucd90120', 'ucd90124', 'ucd90160', 'ucd90320', 'ucd9090', + 'ucd90910' Addresses scanned: - @@ -14,6 +15,7 @@ - http://focus.ti.com/lit/ds/symlink/ucd90120.pdf - http://focus.ti.com/lit/ds/symlink/ucd90124.pdf - http://focus.ti.com/lit/ds/symlink/ucd90160.pdf + - http://focus.ti.com/lit/ds/symlink/ucd90320.pdf - http://focus.ti.com/lit/ds/symlink/ucd9090.pdf - http://focus.ti.com/lit/ds/symlink/ucd90910.pdf @@ -45,6 +47,12 @@ functions. Twelve of these pins offer PWM functionality. Using these pins, the UCD90160 offers support for margining, and general-purpose PWM functions. +The UCD90320 is a 32-rail PMBus/I2C addressable power-supply sequencer and +monitor. The 24 integrated ADC channels (AMONx) monitor the power supply +voltage, current, and temperature. Of the 84 GPIO pins, 8 can be used as +digital monitors (DMONx), 32 to enable the power supply (ENx), 24 for margining +(MARx), 16 for logical GPO, and 32 GPIs for cascading, and system function. + The UCD9090 is a 10-rail PMBus/I2C addressable power-supply sequencer and monitor. The device integrates a 12-bit ADC for monitoring up to 10 power-supply voltage inputs. Twenty-three GPIO pins can be used for power supply enables, diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/hwmon/xdpe12284.rst b/Documentation/hwmon/xdpe12284.rst --- a/Documentation/hwmon/xdpe12284.rst +++ b/Documentation/hwmon/xdpe12284.rst @@ -0,0 +1,101 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Kernel driver xdpe122 +===================== + +Supported chips: + + * Infineon XDPE12254 + + Prefix: 'xdpe12254' + + * Infineon XDPE12284 + + Prefix: 'xdpe12284' + +Authors: + + Vadim Pasternak + +Description +----------- + +This driver implements support for Infineon Multi-phase XDPE122 family +dual loop voltage regulators. +The family includes XDPE12284 and XDPE12254 devices. +The devices from this family complaint with: +- Intel VR13 and VR13HC rev 1.3, IMVP8 rev 1.2 and IMPVP9 rev 1.3 DC-DC + converter specification. +- Intel SVID rev 1.9. protocol. +- PMBus rev 1.3 interface. + +Devices support linear format for reading input voltage, input and output current, +input and output power and temperature. +Device supports VID format for reading output voltage. The below modes are +supported: +- VR12.0 mode, 5-mV DAC - 0x01. +- VR12.5 mode, 10-mV DAC - 0x02. +- IMVP9 mode, 5-mV DAC - 0x03. +- AMD mode 6.25mV - 0x10. + +Devices support two pages for telemetry. + +The driver provides for current: input, maximum and critical thresholds +and maximum and critical alarms. Critical thresholds and critical alarm are +supported only for current output. +The driver exports the following attributes for via the sysfs files, where +indexes 1, 2 are for "iin" and 3, 4 for "iout": + +**curr[3-4]_crit** + +**curr[3-4]_crit_alarm** + +**curr[1-4]_input** + +**curr[1-4]_label** + +**curr[1-4]_max** + +**curr[1-4]_max_alarm** + +The driver provides for voltage: input, critical and low critical thresholds +and critical and low critical alarms. +The driver exports the following attributes for via the sysfs files, where +indexes 1, 2 are for "vin" and 3, 4 for "vout": + +**in[1-4]_crit** + +**in[1-4_crit_alarm** + +**in[1-4]_input** + +**in[1-4_label** + +**in[1-4]_lcrit** + +**in[1-41_lcrit_alarm** + +The driver provides for power: input and alarms. Power alarm is supported only +for power input. +The driver exports the following attributes for via the sysfs files, where +indexes 1, 2 are for "pin" and 3, 4 for "pout": + +**power[1-2]_alarm** + +**power[1-4]_input** + +**power[1-4]_label** + +The driver provides for temperature: input, maximum and critical thresholds +and maximum and critical alarms. +The driver exports the following attributes for via the sysfs files: + +**temp[1-2]_crit** + +**temp[1-2]_crit_alarm** + +**temp[1-2]_input** + +**temp[1-2]_max** + +**temp[1-2]_max_alarm** diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/isdn/avmb1.rst b/Documentation/isdn/avmb1.rst --- a/Documentation/isdn/avmb1.rst +++ b/Documentation/isdn/avmb1.rst @@ -1,246 +0,0 @@ -================================ -Driver for active AVM Controller -================================ - -The driver provides a kernel capi2.0 Interface (kernelcapi) and -on top of this a User-Level-CAPI2.0-interface (capi) -and a driver to connect isdn4linux with CAPI2.0 (capidrv). -The lowlevel interface can be used to implement a CAPI2.0 -also for passive cards since July 1999. - -The author can be reached at calle@calle.in-berlin.de. -The command avmcapictrl is part of the isdn4k-utils. -t4-files can be found at ftp://ftp.avm.de/cardware/b1/linux/firmware - -Currently supported cards: - - - B1 ISA (all versions) - - B1 PCI - - T1/T1B (HEMA card) - - M1 - - M2 - - B1 PCMCIA - -Installing ----------- - -You need at least /dev/capi20 to load the firmware. - -:: - - mknod /dev/capi20 c 68 0 - mknod /dev/capi20.00 c 68 1 - mknod /dev/capi20.01 c 68 2 - . - . - . - mknod /dev/capi20.19 c 68 20 - -Running -------- - -To use the card you need the t4-files to download the firmware. -AVM GmbH provides several t4-files for the different D-channel -protocols (b1.t4 for Euro-ISDN). Install these file in /lib/isdn. - -if you configure as modules load the modules this way:: - - insmod /lib/modules/current/misc/capiutil.o - insmod /lib/modules/current/misc/b1.o - insmod /lib/modules/current/misc/kernelcapi.o - insmod /lib/modules/current/misc/capidrv.o - insmod /lib/modules/current/misc/capi.o - -if you have an B1-PCI card load the module b1pci.o:: - - insmod /lib/modules/current/misc/b1pci.o - -and load the firmware with:: - - avmcapictrl load /lib/isdn/b1.t4 1 - -if you have an B1-ISA card load the module b1isa.o -and add the card by calling:: - - avmcapictrl add 0x150 15 - -and load the firmware by calling:: - - avmcapictrl load /lib/isdn/b1.t4 1 - -if you have an T1-ISA card load the module t1isa.o -and add the card by calling:: - - avmcapictrl add 0x450 15 T1 0 - -and load the firmware by calling:: - - avmcapictrl load /lib/isdn/t1.t4 1 - -if you have an PCMCIA card (B1/M1/M2) load the module b1pcmcia.o -before you insert the card. - -Leased Lines with B1 --------------------- - -Init card and load firmware. - -For an D64S use "FV: 1" as phone number - -For an D64S2 use "FV: 1" and "FV: 2" for multilink -or "FV: 1,2" to use CAPI channel bundling. - -/proc-Interface ------------------ - -/proc/capi:: - - dr-xr-xr-x 2 root root 0 Jul 1 14:03 . - dr-xr-xr-x 82 root root 0 Jun 30 19:08 .. - -r--r--r-- 1 root root 0 Jul 1 14:03 applications - -r--r--r-- 1 root root 0 Jul 1 14:03 applstats - -r--r--r-- 1 root root 0 Jul 1 14:03 capi20 - -r--r--r-- 1 root root 0 Jul 1 14:03 capidrv - -r--r--r-- 1 root root 0 Jul 1 14:03 controller - -r--r--r-- 1 root root 0 Jul 1 14:03 contrstats - -r--r--r-- 1 root root 0 Jul 1 14:03 driver - -r--r--r-- 1 root root 0 Jul 1 14:03 ncci - -r--r--r-- 1 root root 0 Jul 1 14:03 users - -/proc/capi/applications: - applid level3cnt datablkcnt datablklen ncci-cnt recvqueuelen - level3cnt: - capi_register parameter - datablkcnt: - capi_register parameter - ncci-cnt: - current number of nccis (connections) - recvqueuelen: - number of messages on receive queue - - for example:: - - 1 -2 16 2048 1 0 - 2 2 7 2048 1 0 - -/proc/capi/applstats: - applid recvctlmsg nrecvdatamsg nsentctlmsg nsentdatamsg - recvctlmsg: - capi messages received without DATA_B3_IND - recvdatamsg: - capi DATA_B3_IND received - sentctlmsg: - capi messages sent without DATA_B3_REQ - sentdatamsg: - capi DATA_B3_REQ sent - - for example:: - - 1 2057 1699 1721 1699 - -/proc/capi/capi20: statistics of capi.o (/dev/capi20) - minor nopen nrecvdropmsg nrecvctlmsg nrecvdatamsg sentctlmsg sentdatamsg - minor: - minor device number of capi device - nopen: - number of calls to devices open - nrecvdropmsg: - capi messages dropped (messages in recvqueue in close) - nrecvctlmsg: - capi messages received without DATA_B3_IND - nrecvdatamsg: - capi DATA_B3_IND received - nsentctlmsg: - capi messages sent without DATA_B3_REQ - nsentdatamsg: - capi DATA_B3_REQ sent - - for example:: - - 1 2 18 0 16 2 - -/proc/capi/capidrv: statistics of capidrv.o (capi messages) - nrecvctlmsg nrecvdatamsg sentctlmsg sentdatamsg - nrecvctlmsg: - capi messages received without DATA_B3_IND - nrecvdatamsg: - capi DATA_B3_IND received - nsentctlmsg: - capi messages sent without DATA_B3_REQ - nsentdatamsg: - capi DATA_B3_REQ sent - - for example: - 2780 2226 2256 2226 - -/proc/capi/controller: - controller drivername state cardname controllerinfo - - for example:: - - 1 b1pci running b1pci-e000 B1 3.07-01 0xe000 19 - 2 t1isa running t1isa-450 B1 3.07-01 0x450 11 0 - 3 b1pcmcia running m2-150 B1 3.07-01 0x150 5 - -/proc/capi/contrstats: - controller nrecvctlmsg nrecvdatamsg sentctlmsg sentdatamsg - nrecvctlmsg: - capi messages received without DATA_B3_IND - nrecvdatamsg: - capi DATA_B3_IND received - nsentctlmsg: - capi messages sent without DATA_B3_REQ - nsentdatamsg: - capi DATA_B3_REQ sent - - for example:: - - 1 2845 2272 2310 2274 - 2 2 0 2 0 - 3 2 0 2 0 - -/proc/capi/driver: - drivername ncontroller - - for example:: - - b1pci 1 - t1isa 1 - b1pcmcia 1 - b1isa 0 - -/proc/capi/ncci: - apllid ncci winsize sendwindow - - for example:: - - 1 0x10101 8 0 - -/proc/capi/users: kernelmodules that use the kernelcapi. - name - - for example:: - - capidrv - capi20 - -Questions ---------- - -Check out the FAQ (ftp.isdn4linux.de) or subscribe to the -linux-avmb1@calle.in-berlin.de mailing list by sending -a mail to majordomo@calle.in-berlin.de with -subscribe linux-avmb1 -in the body. - -German documentation and several scripts can be found at -ftp://ftp.avm.de/cardware/b1/linux/ - -Bugs ----- - -If you find any please let me know. - -Enjoy, - -Carsten Paeth (calle@calle.in-berlin.de) diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/isdn/gigaset.rst b/Documentation/isdn/gigaset.rst --- a/Documentation/isdn/gigaset.rst +++ b/Documentation/isdn/gigaset.rst @@ -1,465 +0,0 @@ -========================== -GigaSet 307x Device Driver -========================== - -1. Requirements -================= - -1.1. Hardware -------------- - - This driver supports the connection of the Gigaset 307x/417x family of - ISDN DECT bases via Gigaset M101 Data, Gigaset M105 Data or direct USB - connection. The following devices are reported to be compatible: - - Bases: - - Siemens Gigaset 3070/3075 isdn - - Siemens Gigaset 4170/4175 isdn - - Siemens Gigaset SX205/255 - - Siemens Gigaset SX353 - - T-Com Sinus 45 [AB] isdn - - T-Com Sinus 721X[A] [SE] - - Vox Chicago 390 ISDN (KPN Telecom) - - RS232 data boxes: - - Siemens Gigaset M101 Data - - T-Com Sinus 45 Data 1 - - USB data boxes: - - Siemens Gigaset M105 Data - - Siemens Gigaset USB Adapter DECT - - T-Com Sinus 45 Data 2 - - T-Com Sinus 721 data - - Chicago 390 USB (KPN) - - See also http://www.erbze.info/sinus_gigaset.htm - (archived at https://web.archive.org/web/20100717020421/http://www.erbze.info:80/sinus_gigaset.htm ) and - http://gigaset307x.sourceforge.net/ - - We had also reports from users of Gigaset M105 who could use the drivers - with SX 100 and CX 100 ISDN bases (only in unimodem mode, see section 2.5.) - If you have another device that works with our driver, please let us know. - - Chances of getting an USB device to work are good if the output of:: - - lsusb - - at the command line contains one of the following:: - - ID 0681:0001 - ID 0681:0002 - ID 0681:0009 - ID 0681:0021 - ID 0681:0022 - -1.2. Software -------------- - - The driver works with the Kernel CAPI subsystem and can be used with any - software which is able to use CAPI 2.0 for ISDN connections (voice or data). - - There are some user space tools available at - https://sourceforge.net/projects/gigaset307x/ - which provide access to additional device specific functions like SMS, - phonebook or call journal. - - -2. How to use the driver -========================== - -2.1. Modules ------------- - - For the devices to work, the proper kernel modules have to be loaded. - This normally happens automatically when the system detects the USB - device (base, M105) or when the line discipline is attached (M101). It - can also be triggered manually using the modprobe(8) command, for example - for troubleshooting or to pass module parameters. - - The module ser_gigaset provides a serial line discipline N_GIGASET_M101 - which uses the regular serial port driver to access the device, and must - therefore be attached to the serial device to which the M101 is connected. - The ldattach(8) command (included in util-linux-ng release 2.14 or later) - can be used for that purpose, for example:: - - ldattach GIGASET_M101 /dev/ttyS1 - - This will open the device file, attach the line discipline to it, and - then sleep in the background, keeping the device open so that the line - discipline remains active. To deactivate it, kill the daemon, for example - with:: - - killall ldattach - - before disconnecting the device. To have this happen automatically at - system startup/shutdown on an LSB compatible system, create and activate - an appropriate LSB startup script /etc/init.d/gigaset. (The init name - 'gigaset' is officially assigned to this project by LANANA.) - Alternatively, just add the 'ldattach' command line to /etc/rc.local. - - The modules accept the following parameters: - - =============== ========== ========================================== - Module Parameter Meaning - - gigaset debug debug level (see section 3.2.) - - startmode initial operation mode (see section 2.5.): - bas_gigaset ) 1=CAPI (default), 0=Unimodem - ser_gigaset ) - usb_gigaset ) cidmode initial Call-ID mode setting (see section - 2.5.): 1=on (default), 0=off - - =============== ========== ========================================== - - Depending on your distribution you may want to create a separate module - configuration file like /etc/modprobe.d/gigaset.conf for these. - -2.2. Device nodes for user space programs ------------------------------------------ - - The device can be accessed from user space (eg. by the user space tools - mentioned in 1.2.) through the device nodes: - - - /dev/ttyGS0 for M101 (RS232 data boxes) - - /dev/ttyGU0 for M105 (USB data boxes) - - /dev/ttyGB0 for the base driver (direct USB connection) - - If you connect more than one device of a type, they will get consecutive - device nodes, eg. /dev/ttyGU1 for a second M105. - - You can also set a "default device" for the user space tools to use when - no device node is given as parameter, by creating a symlink /dev/ttyG to - one of them, eg.:: - - ln -s /dev/ttyGB0 /dev/ttyG - - The devices accept the following device specific ioctl calls - (defined in gigaset_dev.h): - - ``ioctl(int fd, GIGASET_REDIR, int *cmd);`` - - If cmd==1, the device is set to be controlled exclusively through the - character device node; access from the ISDN subsystem is blocked. - - If cmd==0, the device is set to be used from the ISDN subsystem and does - not communicate through the character device node. - - ``ioctl(int fd, GIGASET_CONFIG, int *cmd);`` - - (ser_gigaset and usb_gigaset only) - - If cmd==1, the device is set to adapter configuration mode where commands - are interpreted by the M10x DECT adapter itself instead of being - forwarded to the base station. In this mode, the device accepts the - commands described in Siemens document "AT-Kommando Alignment M10x Data" - for setting the operation mode, associating with a base station and - querying parameters like field strengh and signal quality. - - Note that there is no ioctl command for leaving adapter configuration - mode and returning to regular operation. In order to leave adapter - configuration mode, write the command ATO to the device. - - ``ioctl(int fd, GIGASET_BRKCHARS, unsigned char brkchars[6]);`` - - (usb_gigaset only) - - Set the break characters on an M105's internal serial adapter to the six - bytes stored in brkchars[]. Unused bytes should be set to zero. - - ioctl(int fd, GIGASET_VERSION, unsigned version[4]); - Retrieve version information from the driver. version[0] must be set to - one of: - - - GIGVER_DRIVER: retrieve driver version - - GIGVER_COMPAT: retrieve interface compatibility version - - GIGVER_FWBASE: retrieve the firmware version of the base - - Upon return, version[] is filled with the requested version information. - -2.3. CAPI ---------- - - The devices will show up as CAPI controllers as soon as the - corresponding driver module is loaded, and can then be used with - CAPI 2.0 kernel and user space applications. For user space access, - the module capi.ko must be loaded. - - Most distributions handle loading and unloading of the various CAPI - modules automatically via the command capiinit(1) from the capi4k-utils - package or a similar mechanism. Note that capiinit(1) cannot unload the - Gigaset drivers because it doesn't support more than one module per - driver. - -2.5. Unimodem mode ------------------- - - In this mode the device works like a modem connected to a serial port - (the /dev/ttyGU0, ... mentioned above) which understands the commands:: - - ATZ init, reset - => OK or ERROR - ATD - ATDT dial - => OK, CONNECT, - BUSY, - NO DIAL TONE, - NO CARRIER, - NO ANSWER - +++ change to command mode when connected - ATH hangup - - You can use some configuration tool of your distribution to configure this - "modem" or configure pppd/wvdial manually. There are some example ppp - configuration files and chat scripts in the gigaset-VERSION/ppp directory - in the driver packages from https://sourceforge.net/projects/gigaset307x/. - Please note that the USB drivers are not able to change the state of the - control lines. This means you must use "Stupid Mode" if you are using - wvdial or you should use the nocrtscts option of pppd. - You must also assure that the ppp_async module is loaded with the parameter - flag_time=0. You can do this e.g. by adding a line like:: - - options ppp_async flag_time=0 - - to an appropriate module configuration file, like:: - - /etc/modprobe.d/gigaset.conf. - - Unimodem mode is needed for making some devices [e.g. SX100] work which - do not support the regular Gigaset command set. If debug output (see - section 3.2.) shows something like this when dialing:: - - CMD Received: ERROR - Available Params: 0 - Connection State: 0, Response: -1 - gigaset_process_response: resp_code -1 in ConState 0 ! - Timeout occurred - - then switching to unimodem mode may help. - - If you have installed the command line tool gigacontr, you can enter - unimodem mode using:: - - gigacontr --mode unimodem - - You can switch back using:: - - gigacontr --mode isdn - - You can also put the driver directly into Unimodem mode when it's loaded, - by passing the module parameter startmode=0 to the hardware specific - module, e.g.:: - - modprobe usb_gigaset startmode=0 - - or by adding a line like:: - - options usb_gigaset startmode=0 - - to an appropriate module configuration file, like:: - - /etc/modprobe.d/gigaset.conf - -2.6. Call-ID (CID) mode ------------------------ - - Call-IDs are numbers used to tag commands to, and responses from, the - Gigaset base in order to support the simultaneous handling of multiple - ISDN calls. Their use can be enabled ("CID mode") or disabled ("Unimodem - mode"). Without Call-IDs (in Unimodem mode), only a very limited set of - functions is available. It allows outgoing data connections only, but - does not signal incoming calls or other base events. - - DECT cordless data devices (M10x) permanently occupy the cordless - connection to the base while Call-IDs are activated. As the Gigaset - bases only support one DECT data connection at a time, this prevents - other DECT cordless data devices from accessing the base. - - During active operation, the driver switches to the necessary mode - automatically. However, for the reasons above, the mode chosen when - the device is not in use (idle) can be selected by the user. - - - If you want to receive incoming calls, you can use the default - settings (CID mode). - - If you have several DECT data devices (M10x) which you want to use - in turn, select Unimodem mode by passing the parameter "cidmode=0" to - the appropriate driver module (ser_gigaset or usb_gigaset). - - If you want both of these at once, you are out of luck. - - You can also use the tty class parameter "cidmode" of the device to - change its CID mode while the driver is loaded, eg.:: - - echo 0 > /sys/class/tty/ttyGU0/cidmode - -2.7. Dialing Numbers --------------------- -provided by an application for dialing out must - be a public network number according to the local dialing plan, without - any dial prefix for getting an outside line. - - Internal calls can be made by providing an internal extension number - prefixed with ``**`` (two asterisks) as the called party number. So to dial - eg. the first registered DECT handset, give ``**11`` as the called party - number. Dialing ``***`` (three asterisks) calls all extensions - simultaneously (global call). - - Unimodem mode does not support internal calls. - -2.8. Unregistered Wireless Devices (M101/M105) ----------------------------------------------- - - The main purpose of the ser_gigaset and usb_gigaset drivers is to allow - the M101 and M105 wireless devices to be used as ISDN devices for ISDN - connections through a Gigaset base. Therefore they assume that the device - is registered to a DECT base. - - If the M101/M105 device is not registered to a base, initialization of - the device fails, and a corresponding error message is logged by the - driver. In that situation, a restricted set of functions is available - which includes, in particular, those necessary for registering the device - to a base or for switching it between Fixed Part and Portable Part - modes. See the gigacontr(8) manpage for details. - -3. Troubleshooting -==================== - -3.1. Solutions to frequently reported problems ----------------------------------------------- - - Problem: - You have a slow provider and isdn4linux gives up dialing too early. - Solution: - Load the isdn module using the dialtimeout option. You can do this e.g. - by adding a line like:: - - options isdn dialtimeout=15 - - to /etc/modprobe.d/gigaset.conf or a similar file. - - Problem: - The isdnlog program emits error messages or just doesn't work. - Solution: - Isdnlog supports only the HiSax driver. Do not attempt to use it with - other drivers such as Gigaset. - - Problem: - You have two or more DECT data adapters (M101/M105) and only the - first one you turn on works. - Solution: - Select Unimodem mode for all DECT data adapters. (see section 2.5.) - - Problem: - Messages like this:: - - usb_gigaset 3-2:1.0: Could not initialize the device. - - appear in your syslog. - Solution: - Check whether your M10x wireless device is correctly registered to the - Gigaset base. (see section 2.7.) - -3.2. Telling the driver to provide more information ---------------------------------------------------- - Building the driver with the "Gigaset debugging" kernel configuration - option (CONFIG_GIGASET_DEBUG) gives it the ability to produce additional - information useful for debugging. - - You can control the amount of debugging information the driver produces by - writing an appropriate value to /sys/module/gigaset/parameters/debug, - e.g.:: - - echo 0 > /sys/module/gigaset/parameters/debug - - switches off debugging output completely, - - :: - - echo 0x302020 > /sys/module/gigaset/parameters/debug - - enables a reasonable set of debugging output messages. These values are - bit patterns where every bit controls a certain type of debugging output. - See the constants DEBUG_* in the source file gigaset.h for details. - - The initial value can be set using the debug parameter when loading the - module "gigaset", e.g. by adding a line:: - - options gigaset debug=0 - - to your module configuration file, eg. /etc/modprobe.d/gigaset.conf - - Generated debugging information can be found - - as output of the command:: - - dmesg - - - in system log files written by your syslog daemon, usually - in /var/log/, e.g. /var/log/messages. - -3.3. Reporting problems and bugs --------------------------------- - If you can't solve problems with the driver on your own, feel free to - use one of the forums, bug trackers, or mailing lists on - - https://sourceforge.net/projects/gigaset307x - - or write an electronic mail to the maintainers. - - Try to provide as much information as possible, such as - - - distribution - - kernel version (uname -r) - - gcc version (gcc --version) - - hardware architecture (uname -m, ...) - - type and firmware version of your device (base and wireless module, - if any) - - output of "lsusb -v" (if using an USB device) - - error messages - - relevant system log messages (it would help if you activate debug - output as described in 3.2.) - - For help with general configuration problems not specific to our driver, - such as isdn4linux and network configuration issues, please refer to the - appropriate forums and newsgroups. - -3.4. Reporting problem solutions --------------------------------- - If you solved a problem with our drivers, wrote startup scripts for your - distribution, ... feel free to contact us (using one of the places - mentioned in 3.3.). We'd like to add scripts, hints, documentation - to the driver and/or the project web page. - - -4. Links, other software -========================== - - - Sourceforge project developing this driver and associated tools - https://sourceforge.net/projects/gigaset307x - - Yahoo! Group on the Siemens Gigaset family of devices - https://de.groups.yahoo.com/group/Siemens-Gigaset - - Siemens Gigaset/T-Sinus compatibility table - http://www.erbze.info/sinus_gigaset.htm - (archived at https://web.archive.org/web/20100717020421/http://www.erbze.info:80/sinus_gigaset.htm ) - - -5. Credits -============ - - Thanks to - - Karsten Keil - for his help with isdn4linux - Deti Fliegl - for his base driver code - Dennis Dietrich - for his kernel 2.6 patches - Andreas Rummel - for his work and logs to get unimodem mode working - Andreas Degert - for his logs and patches to get cx 100 working - Dietrich Feist - for his generous donation of one M105 and two M101 cordless adapters - Christoph Schweers - for his generous donation of a M34 device - - and all the other people who sent logs and other information. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/isdn/hysdn.rst b/Documentation/isdn/hysdn.rst --- a/Documentation/isdn/hysdn.rst +++ b/Documentation/isdn/hysdn.rst @@ -1,196 +0,0 @@ -============ -Hysdn Driver -============ - -The hysdn driver has been written by -Werner Cornelius (werner@isdn4linux.de or werner@titro.de) -for Hypercope GmbH Aachen Germany. Hypercope agreed to publish this driver -under the GNU General Public License. - -The CAPI 2.0-support was added by Ulrich Albrecht (ualbrecht@hypercope.de) -for Hypercope GmbH Aachen, Germany. - - - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 2 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program; if not, write to the Free Software - Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - -.. Table of contents - - 1. About the driver - - 2. Loading/Unloading the driver - - 3. Entries in the /proc filesystem - - 4. The /proc/net/hysdn/cardconfX file - - 5. The /proc/net/hysdn/cardlogX file - - 6. Where to get additional info and help - - -1. About the driver -=================== - - The drivers/isdn/hysdn subdir contains a driver for HYPERCOPEs active - PCI isdn cards Champ, Ergo and Metro. To enable support for this cards - enable ISDN support in the kernel config and support for HYSDN cards in - the active cards submenu. The driver may only be compiled and used if - support for loadable modules and the process filesystem have been enabled. - - These cards provide two different interfaces to the kernel. Without the - optional CAPI 2.0 support, they register as ethernet card. IP-routing - to a ISDN-destination is performed on the card itself. All necessary - handlers for various protocols like ppp and others as well as config info - and firmware may be fetched from Hypercopes WWW-Site www.hypercope.de. - - With CAPI 2.0 support enabled, the card can also be used as a CAPI 2.0 - compliant devices with either CAPI 2.0 applications - (check isdn4k-utils) or -using the capidrv module- as a regular - isdn4linux device. This is done via the same mechanism as with the - active AVM cards and in fact uses the same module. - - -2. Loading/Unloading the driver -=============================== - - The module has no command line parameters and auto detects up to 10 cards - in the id-range 0-9. - If a loaded driver shall be unloaded all open files in the /proc/net/hysdn - subdir need to be closed and all ethernet interfaces allocated by this - driver must be shut down. Otherwise the module counter will avoid a module - unload. - - If you are using the CAPI 2.0-interface, make sure to load/modprobe the - kernelcapi-module first. - - If you plan to use the capidrv-link to isdn4linux, make sure to load - capidrv.o after all modules using this driver (i.e. after hysdn and - any avm-specific modules). - -3. Entries in the /proc filesystem -================================== - - When the module has been loaded it adds the directory hysdn in the - /proc/net tree. This directory contains exactly 2 file entries for each - card. One is called cardconfX and the other cardlogX, where X is the - card id number from 0 to 9. - The cards are numbered in the order found in the PCI config data. - -4. The /proc/net/hysdn/cardconfX file -===================================== - - This file may be read to get by everyone to get info about the cards type, - actual state, available features and used resources. - The first 3 entries (id, bus and slot) are PCI info fields, the following - type field gives the information about the cards type: - - - 4 -> Ergo card (server card with 2 b-chans) - - 5 -> Metro card (server card with 4 or 8 b-chans) - - 6 -> Champ card (client card with 2 b-chans) - - The following 3 fields show the hardware assignments for irq, iobase and the - dual ported memory (dp-mem). - - The fields b-chans and fax-chans announce the available card resources of - this types for the user. - - The state variable indicates the actual drivers state for this card with the - following assignments. - - - 0 -> card has not been booted since driver load - - 1 -> card booting is actually in progess - - 2 -> card is in an error state due to a previous boot failure - - 3 -> card is booted and active - - And the last field (device) shows the name of the ethernet device assigned - to this card. Up to the first successful boot this field only shows a - - to tell that no net device has been allocated up to now. Once a net device - has been allocated it remains assigned to this card, even if a card is - rebooted and an boot error occurs. - - Writing to the cardconfX file boots the card or transfers config lines to - the cards firmware. The type of data is automatically detected when the - first data is written. Only root has write access to this file. - The firmware boot files are normally called hyclient.pof for client cards - and hyserver.pof for server cards. - After successfully writing the boot file, complete config files or single - config lines may be copied to this file. - If an error occurs the return value given to the writing process has the - following additional codes (decimal): - - ==== ============================================ - 1000 Another process is currently bootng the card - 1001 Invalid firmware header - 1002 Boards dual-port RAM test failed - 1003 Internal firmware handler error - 1004 Boot image size invalid - 1005 First boot stage (bootstrap loader) failed - 1006 Second boot stage failure - 1007 Timeout waiting for card ready during boot - 1008 Operation only allowed in booted state - 1009 Config line too long - 1010 Invalid channel number - 1011 Timeout sending config data - ==== ============================================ - - Additional info about error reasons may be fetched from the log output. - -5. The /proc/net/hysdn/cardlogX file -==================================== - - The cardlogX file entry may be opened multiple for reading by everyone to - get the cards and drivers log data. Card messages always start with the - keyword LOG. All other lines are output from the driver. - The driver log data may be redirected to the syslog by selecting the - appropriate bitmask. The cards log messages will always be send to this - interface but never to the syslog. - - A root user may write a decimal or hex (with 0x) value t this file to select - desired output options. As mentioned above the cards log dat is always - written to the cardlog file independent of the following options only used - to check and debug the driver itself: - - For example:: - - echo "0x34560078" > /proc/net/hysdn/cardlog0 - - to output the hex log mask 34560078 for card 0. - - The written value is regarded as an unsigned 32-Bit value, bit ored for - desired output. The following bits are already assigned: - - ========== ============================================================ - 0x80000000 All driver log data is alternatively via syslog - 0x00000001 Log memory allocation errors - 0x00000010 Firmware load start and close are logged - 0x00000020 Log firmware record parser - 0x00000040 Log every firmware write actions - 0x00000080 Log all card related boot messages - 0x00000100 Output all config data sent for debugging purposes - 0x00000200 Only non comment config lines are shown wth channel - 0x00000400 Additional conf log output - 0x00001000 Log the asynchronous scheduler actions (config and log) - 0x00100000 Log all open and close actions to /proc/net/hysdn/card files - 0x00200000 Log all actions from /proc file entries - 0x00010000 Log network interface init and deinit - ========== ============================================================ - -6. Where to get additional info and help -======================================== - - If you have any problems concerning the driver or configuration contact - the Hypercope support team (support@hypercope.de) and or the authors - Werner Cornelius (werner@isdn4linux or cornelius@titro.de) or - Ulrich Albrecht (ualbrecht@hypercope.de). diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/isdn/index.rst b/Documentation/isdn/index.rst --- a/Documentation/isdn/index.rst +++ b/Documentation/isdn/index.rst @@ -9,9 +9,6 @@ interface_capi - avmb1 - gigaset - hysdn m_isdn credits diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/isdn/interface_capi.rst b/Documentation/isdn/interface_capi.rst --- a/Documentation/isdn/interface_capi.rst +++ b/Documentation/isdn/interface_capi.rst @@ -26,13 +26,6 @@ 2. Driver and Device Registration ================================= -CAPI drivers optionally register themselves with Kernel CAPI by calling the -Kernel CAPI function register_capi_driver() with a pointer to a struct -capi_driver. This structure must be filled with the name and revision of the -driver, and optionally a pointer to a callback function, add_card(). The -registration can be revoked by calling the function unregister_capi_driver() -with a pointer to the same struct capi_driver. - CAPI drivers must register each of the ISDN devices they control with Kernel CAPI by calling the Kernel CAPI function attach_capi_ctr() with a pointer to a struct capi_ctr before they can be used. This structure must be filled with @@ -89,9 +82,6 @@ the name of the driver, as a zero-terminated ASCII string ``char revision[32]`` the revision number of the driver, as a zero-terminated ASCII string -``int (*add_card)(struct capi_driver *driver, capicardparams *data)`` - a callback function pointer (may be NULL) - 4.2 struct capi_ctr ------------------- @@ -178,12 +168,6 @@ pointer to a callback function returning the entry for the device in the CAPI controller info table, /proc/capi/controller -``const struct file_operations *proc_fops`` - pointers to callback functions for the device's proc file - system entry, /proc/capi/controllers/; pointer to the device's - capi_ctr structure is available from struct proc_dir_entry::data - which is available from struct inode. - Note: Callback functions except send_message() are never called in interrupt context. @@ -267,25 +251,10 @@ _cmsg structure members. =========== ================================================================= -Functions capi_cmsg2message() and capi_message2cmsg() are provided to convert -messages between their transport encoding described in the CAPI 2.0 standard -and their _cmsg structure representation. Note that capi_cmsg2message() does -not know or check the size of its destination buffer. The caller must make -sure it is big enough to accommodate the resulting CAPI message. - 5. Lower Layer Interface Functions ================================== -(declared in ) - -:: - - void register_capi_driver(struct capi_driver *drvr) - void unregister_capi_driver(struct capi_driver *drvr) - -register/unregister a driver with Kernel CAPI - :: int attach_capi_ctr(struct capi_ctr *ctrlr) @@ -302,13 +271,6 @@ :: - void capi_ctr_suspend_output(struct capi_ctr *ctrlr) - void capi_ctr_resume_output(struct capi_ctr *ctrlr) - -signal suspend/resume - -:: - void capi_ctr_handle_message(struct capi_ctr * ctrlr, u16 applid, struct sk_buff *skb) @@ -319,21 +281,6 @@ 6. Helper Functions and Macros ============================== -Library functions (from ): - -:: - - void capilib_new_ncci(struct list_head *head, u16 applid, - u32 ncci, u32 winsize) - void capilib_free_ncci(struct list_head *head, u16 applid, u32 ncci) - void capilib_release_appl(struct list_head *head, u16 applid) - void capilib_release(struct list_head *head) - void capilib_data_b3_conf(struct list_head *head, u16 applid, - u32 ncci, u16 msgid) - u16 capilib_data_b3_req(struct list_head *head, u16 applid, - u32 ncci, u16 msgid) - - Macros to extract/set element values from/in a CAPI message header (from ): @@ -357,24 +304,6 @@ Library functions for working with _cmsg structures (from ): -``unsigned capi_cmsg2message(_cmsg *cmsg, u8 *msg)`` - Assembles a CAPI 2.0 message from the parameters in ``*cmsg``, - storing the result in ``*msg``. - -``unsigned capi_message2cmsg(_cmsg *cmsg, u8 *msg)`` - Disassembles the CAPI 2.0 message in ``*msg``, storing the parameters - in ``*cmsg``. - -``unsigned capi_cmsg_header(_cmsg *cmsg, u16 ApplId, u8 Command, u8 Subcommand, u16 Messagenumber, u32 Controller)`` - Fills the header part and address field of the _cmsg structure ``*cmsg`` - with the given values, zeroing the remainder of the structure so only - parameters with non-default values need to be changed before sending - the message. - -``void capi_cmsg_answer(_cmsg *cmsg)`` - Sets the low bit of the Subcommand field in ``*cmsg``, thereby - converting ``_REQ`` to ``_CONF`` and ``_IND`` to ``_RESP``. - ``char *capi_cmd2str(u8 Command, u8 Subcommand)`` Returns the CAPI 2.0 message name corresponding to the given command and subcommand values, as a static ASCII string. The return value may diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/kernel-hacking/hacking.rst b/Documentation/kernel-hacking/hacking.rst --- a/Documentation/kernel-hacking/hacking.rst +++ b/Documentation/kernel-hacking/hacking.rst @@ -601,7 +601,7 @@ This is the variant of `EXPORT_SYMBOL()` that allows specifying a symbol namespace. Symbol Namespaces are documented in -``Documentation/kbuild/namespaces.rst``. +``Documentation/core-api/symbol-namespaces.rst``. :c:func:`EXPORT_SYMBOL_NS_GPL()` -------------------------------- @@ -610,7 +610,7 @@ This is the variant of `EXPORT_SYMBOL_GPL()` that allows specifying a symbol namespace. Symbol Namespaces are documented in -``Documentation/kbuild/namespaces.rst``. +``Documentation/core-api/symbol-namespaces.rst``. Routines and Conventions ======================== diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/locking/locktorture.rst b/Documentation/locking/locktorture.rst --- a/Documentation/locking/locktorture.rst +++ b/Documentation/locking/locktorture.rst @@ -103,8 +103,7 @@ Number of seconds between statistics-related printk()s. By default, locktorture will report stats every 60 seconds. Setting the interval to zero causes the statistics to - be printed -only- when the module is unloaded, and this - is the default. + be printed -only- when the module is unloaded. stutter The length of time to run the test before pausing for this diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/maintainer/maintainer-entry-profile.rst b/Documentation/maintainer/maintainer-entry-profile.rst --- a/Documentation/maintainer/maintainer-entry-profile.rst +++ b/Documentation/maintainer/maintainer-entry-profile.rst @@ -99,4 +99,5 @@ .. toctree:: :maxdepth: 1 + ../doc-guide/maintainer-profile ../nvdimm/maintainer-entry-profile diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/misc-devices/xilinx_sdfec.rst b/Documentation/misc-devices/xilinx_sdfec.rst --- a/Documentation/misc-devices/xilinx_sdfec.rst +++ b/Documentation/misc-devices/xilinx_sdfec.rst @@ -1,4 +1,5 @@ .. SPDX-License-Identifier: GPL-2.0+ + ==================== Xilinx SD-FEC Driver ==================== diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/networking/device_drivers/index.rst b/Documentation/networking/device_drivers/index.rst --- a/Documentation/networking/device_drivers/index.rst +++ b/Documentation/networking/device_drivers/index.rst @@ -22,9 +22,11 @@ intel/iavf intel/ice google/gve + marvell/octeontx2 mellanox/mlx5 netronome/nfp pensando/ionic + stmicro/stmmac .. only:: subproject and html diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/networking/device_drivers/marvell/octeontx2.rst b/Documentation/networking/device_drivers/marvell/octeontx2.rst --- a/Documentation/networking/device_drivers/marvell/octeontx2.rst +++ b/Documentation/networking/device_drivers/marvell/octeontx2.rst @@ -0,0 +1,159 @@ +.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) + +==================================== +Marvell OcteonTx2 RVU Kernel Drivers +==================================== + +Copyright (c) 2020 Marvell International Ltd. + +Contents +======== + +- `Overview`_ +- `Drivers`_ +- `Basic packet flow`_ + +Overview +======== + +Resource virtualization unit (RVU) on Marvell's OcteonTX2 SOC maps HW +resources from the network, crypto and other functional blocks into +PCI-compatible physical and virtual functions. Each functional block +again has multiple local functions (LFs) for provisioning to PCI devices. +RVU supports multiple PCIe SRIOV physical functions (PFs) and virtual +functions (VFs). PF0 is called the administrative / admin function (AF) +and has privileges to provision RVU functional block's LFs to each of the +PF/VF. + +RVU managed networking functional blocks + - Network pool or buffer allocator (NPA) + - Network interface controller (NIX) + - Network parser CAM (NPC) + - Schedule/Synchronize/Order unit (SSO) + - Loopback interface (LBK) + +RVU managed non-networking functional blocks + - Crypto accelerator (CPT) + - Scheduled timers unit (TIM) + - Schedule/Synchronize/Order unit (SSO) + Used for both networking and non networking usecases + +Resource provisioning examples + - A PF/VF with NIX-LF & NPA-LF resources works as a pure network device + - A PF/VF with CPT-LF resource works as a pure crypto offload device. + +RVU functional blocks are highly configurable as per software requirements. + +Firmware setups following stuff before kernel boots + - Enables required number of RVU PFs based on number of physical links. + - Number of VFs per PF are either static or configurable at compile time. + Based on config, firmware assigns VFs to each of the PFs. + - Also assigns MSIX vectors to each of PF and VFs. + - These are not changed after kernel boot. + +Drivers +======= + +Linux kernel will have multiple drivers registering to different PF and VFs +of RVU. Wrt networking there will be 3 flavours of drivers. + +Admin Function driver +--------------------- + +As mentioned above RVU PF0 is called the admin function (AF), this driver +supports resource provisioning and configuration of functional blocks. +Doesn't handle any I/O. It sets up few basic stuff but most of the +funcionality is achieved via configuration requests from PFs and VFs. + +PF/VFs communicates with AF via a shared memory region (mailbox). Upon +receiving requests AF does resource provisioning and other HW configuration. +AF is always attached to host kernel, but PFs and their VFs may be used by host +kernel itself, or attached to VMs or to userspace applications like +DPDK etc. So AF has to handle provisioning/configuration requests sent +by any device from any domain. + +AF driver also interacts with underlying firmware to + - Manage physical ethernet links ie CGX LMACs. + - Retrieve information like speed, duplex, autoneg etc + - Retrieve PHY EEPROM and stats. + - Configure FEC, PAM modes + - etc + +From pure networking side AF driver supports following functionality. + - Map a physical link to a RVU PF to which a netdev is registered. + - Attach NIX and NPA block LFs to RVU PF/VF which provide buffer pools, RQs, SQs + for regular networking functionality. + - Flow control (pause frames) enable/disable/config. + - HW PTP timestamping related config. + - NPC parser profile config, basically how to parse pkt and what info to extract. + - NPC extract profile config, what to extract from the pkt to match data in MCAM entries. + - Manage NPC MCAM entries, upon request can frame and install requested packet forwarding rules. + - Defines receive side scaling (RSS) algorithms. + - Defines segmentation offload algorithms (eg TSO) + - VLAN stripping, capture and insertion config. + - SSO and TIM blocks config which provide packet scheduling support. + - Debugfs support, to check current resource provising, current status of + NPA pools, NIX RQ, SQ and CQs, various stats etc which helps in debugging issues. + - And many more. + +Physical Function driver +------------------------ + +This RVU PF handles IO, is mapped to a physical ethernet link and this +driver registers a netdev. This supports SR-IOV. As said above this driver +communicates with AF with a mailbox. To retrieve information from physical +links this driver talks to AF and AF gets that info from firmware and responds +back ie cannot talk to firmware directly. + +Supports ethtool for configuring links, RSS, queue count, queue size, +flow control, ntuple filters, dump PHY EEPROM, config FEC etc. + +Virtual Function driver +----------------------- + +There are two types VFs, VFs that share the physical link with their parent +SR-IOV PF and the VFs which work in pairs using internal HW loopback channels (LBK). + +Type1: + - These VFs and their parent PF share a physical link and used for outside communication. + - VFs cannot communicate with AF directly, they send mbox message to PF and PF + forwards that to AF. AF after processing, responds back to PF and PF forwards + the reply to VF. + - From functionality point of view there is no difference between PF and VF as same type + HW resources are attached to both. But user would be able to configure few stuff only + from PF as PF is treated as owner/admin of the link. + +Type2: + - RVU PF0 ie admin function creates these VFs and maps them to loopback block's channels. + - A set of two VFs (VF0 & VF1, VF2 & VF3 .. so on) works as a pair ie pkts sent out of + VF0 will be received by VF1 and viceversa. + - These VFs can be used by applications or virtual machines to communicate between them + without sending traffic outside. There is no switch present in HW, hence the support + for loopback VFs. + - These communicate directly with AF (PF0) via mbox. + +Except for the IO channels or links used for packet reception and transmission there is +no other difference between these VF types. AF driver takes care of IO channel mapping, +hence same VF driver works for both types of devices. + +Basic packet flow +================= + +Ingress +------- + +1. CGX LMAC receives packet. +2. Forwards the packet to the NIX block. +3. Then submitted to NPC block for parsing and then MCAM lookup to get the destination RVU device. +4. NIX LF attached to the destination RVU device allocates a buffer from RQ mapped buffer pool of NPA block LF. +5. RQ may be selected by RSS or by configuring MCAM rule with a RQ number. +6. Packet is DMA'ed and driver is notified. + +Egress +------ + +1. Driver prepares a send descriptor and submits to SQ for transmission. +2. The SQ is already configured (by AF) to transmit on a specific link/channel. +3. The SQ descriptor ring is maintained in buffers allocated from SQ mapped pool of NPA block LF. +4. NIX block transmits the pkt on the designated channel. +5. NPC MCAM entries can be installed to divert pkt onto a different channel. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/networking/device_drivers/microsoft/netvsc.txt b/Documentation/networking/device_drivers/microsoft/netvsc.txt --- a/Documentation/networking/device_drivers/microsoft/netvsc.txt +++ b/Documentation/networking/device_drivers/microsoft/netvsc.txt @@ -82,3 +82,24 @@ contain one or more packets. The send buffer is an optimization, the driver will use slower method to handle very large packets or if the send buffer area is exhausted. + + XDP support + ----------- + XDP (eXpress Data Path) is a feature that runs eBPF bytecode at the early + stage when packets arrive at a NIC card. The goal is to increase performance + for packet processing, reducing the overhead of SKB allocation and other + upper network layers. + + hv_netvsc supports XDP in native mode, and transparently sets the XDP + program on the associated VF NIC as well. + + Setting / unsetting XDP program on synthetic NIC (netvsc) propagates to + VF NIC automatically. Setting / unsetting XDP program on VF NIC directly + is not recommended, also not propagated to synthetic NIC, and may be + overwritten by setting of synthetic NIC. + + XDP program cannot run with LRO (RSC) enabled, so you need to disable LRO + before running XDP: + ethtool -K eth0 lro off + + XDP_REDIRECT action is not yet supported. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/networking/device_drivers/netronome/nfp.rst b/Documentation/networking/device_drivers/netronome/nfp.rst --- a/Documentation/networking/device_drivers/netronome/nfp.rst +++ b/Documentation/networking/device_drivers/netronome/nfp.rst @@ -131,3 +131,119 @@ abi_drv_load_ifc Defines a list of PF devices allowed to load FW on the device. This variable is not currently user configurable. + +Statistics +========== + +Following device statistics are available through the ``ethtool -S`` interface: + +.. flat-table:: NFP device statistics + :header-rows: 1 + :widths: 3 1 11 + + * - Name + - ID + - Meaning + + * - dev_rx_discards + - 1 + - Packet can be discarded on the RX path for one of the following reasons: + + * The NIC is not in promisc mode, and the destination MAC address + doesn't match the interfaces' MAC address. + * The received packet is larger than the max buffer size on the host. + I.e. it exceeds the Layer 3 MRU. + * There is no freelist descriptor available on the host for the packet. + It is likely that the NIC couldn't cache one in time. + * A BPF program discarded the packet. + * The datapath drop action was executed. + * The MAC discarded the packet due to lack of ingress buffer space + on the NIC. + + * - dev_rx_errors + - 2 + - A packet can be counted (and dropped) as RX error for the following + reasons: + + * A problem with the VEB lookup (only when SR-IOV is used). + * A physical layer problem that causes Ethernet errors, like FCS or + alignment errors. The cause is usually faulty cables or SFPs. + + * - dev_rx_bytes + - 3 + - Total number of bytes received. + + * - dev_rx_uc_bytes + - 4 + - Unicast bytes received. + + * - dev_rx_mc_bytes + - 5 + - Multicast bytes received. + + * - dev_rx_bc_bytes + - 6 + - Broadcast bytes received. + + * - dev_rx_pkts + - 7 + - Total number of packets received. + + * - dev_rx_mc_pkts + - 8 + - Multicast packets received. + + * - dev_rx_bc_pkts + - 9 + - Broadcast packets received. + + * - dev_tx_discards + - 10 + - A packet can be discarded in the TX direction if the MAC is + being flow controlled and the NIC runs out of TX queue space. + + * - dev_tx_errors + - 11 + - A packet can be counted as TX error (and dropped) for one for the + following reasons: + + * The packet is an LSO segment, but the Layer 3 or Layer 4 offset + could not be determined. Therefore LSO could not continue. + * An invalid packet descriptor was received over PCIe. + * The packet Layer 3 length exceeds the device MTU. + * An error on the MAC/physical layer. Usually due to faulty cables or + SFPs. + * A CTM buffer could not be allocated. + * The packet offset was incorrect and could not be fixed by the NIC. + + * - dev_tx_bytes + - 12 + - Total number of bytes transmitted. + + * - dev_tx_uc_bytes + - 13 + - Unicast bytes transmitted. + + * - dev_tx_mc_bytes + - 14 + - Multicast bytes transmitted. + + * - dev_tx_bc_bytes + - 15 + - Broadcast bytes transmitted. + + * - dev_tx_pkts + - 16 + - Total number of packets transmitted. + + * - dev_tx_mc_pkts + - 17 + - Multicast packets transmitted. + + * - dev_tx_bc_pkts + - 18 + - Broadcast packets transmitted. + +Note that statistics unknown to the driver will be displayed as +``dev_unknown_stat$ID``, where ``$ID`` refers to the second column +above. diff -U 3 -H -d -r -N -x .gitignore -x '*.local' -- a/Documentation/networking/device_drivers/stmicro/stmmac.rst b/Documentation/networking/device_drivers/stmicro/stmmac.rst --- a/Documentation/networking/device_drivers/stmicro/stmmac.rst +++ b/Documentation/networking/device_drivers/stmicro/stmmac.rst @@ -0,0 +1,697 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +============================================================== +Linux Driver for the Synopsys(R) Ethernet Controllers "stmmac" +============================================================== + +Authors: Giuseppe Cavallaro , +Alexandre Torgue , Jose Abreu + +Contents +======== + +- In This Release +- Feature List +- Kernel Configuration +- Command Line Parameters +- Driver Information and Notes +- Debug Information +- Support + +In This Release +=============== + +This file describes the stmmac Linux Driver for all the Synopsys(R) Ethernet +Controllers. + +Currently, this network device driver is for all STi embedded MAC/GMAC +(i.e. 7xxx/5xxx SoCs), SPEAr (arm), Loongson1B (mips) and XILINX XC2V3000 +FF1152AMT0221 D1215994A VIRTEX FPGA board. The Synopsys Ethernet QoS 5.0 IPK +is also supported. + +DesignWare(R) Cores Ethernet MAC 10/100/1000 Universal version 3.70a +(and older) and DesignWare(R) Cores Ethernet Quality-of-Service version 4.0 +(and upper) have been used for developing this driver as well as +DesignWare(R) Cores XGMAC - 10G Ethernet MAC. + +This driver supports both the platform bus and PCI. + +This driver includes support for the following Synopsys(R) DesignWare(R) +Cores Ethernet Controllers and corresponding minimum and maximum versions: + ++-------------------------------+--------------+--------------+--------------+ +| Controller Name | Min. Version | Max. Version | Abbrev. Name | ++===============================+==============+==============+==============+ +| Ethernet MAC Universal | N/A | 3.73a | GMAC | ++-------------------------------+--------------+--------------+--------------+ +| Ethernet Quality-of-Service | 4.00a | N/A | GMAC4+ | ++-------------------------------+--------------+--------------+--------------+ +| XGMAC - 10G Ethernet MAC | 2.10a | N/A | XGMAC2+ | ++-------------------------------+--------------+--------------+--------------+ + +For questions related to hardware requirements, refer to the documentation +supplied with your Ethernet adapter. All hardware requirements listed apply +to use with Linux. + +Feature List +============ + +The following features are available in this driver: + - GMII/MII/RGMII/SGMII/RMII/XGMII Interface + - Half-Duplex / Full-Duplex Operation + - Energy Efficient Ethernet (EEE) + - IEEE 802.3x PAUSE Packets (Flow Control) + - RMON/MIB Counters + - IEEE 1588 Timestamping (PTP) + - Pulse-Per-Second Output (PPS) + - MDIO Clause 22 / Clause 45 Interface + - MAC Loopback + - ARP Offloading + - Automatic CRC / PAD Insertion and Checking + - Checksum Offload for Received and Transmitted Packets + - Standard or Jumbo Ethernet Packets + - Source Address Insertion / Replacement + - VLAN TAG Insertion / Replacement / Deletion / Filtering (HASH and PERFECT) + - Programmable TX and RX Watchdog and Coalesce Settings + - Destination Address Filtering (PERFECT) + - HASH Filtering (Multicast) + - Layer 3 / Layer 4 Filtering + - Remote Wake-Up Detection + - Receive Side Scaling (RSS) + - Frame Preemption for TX and RX + - Programmable Burst Length, Threshold, Queue Size + - Multiple Queues (up to 8) + - Multiple Scheduling Algorithms (TX: WRR, DWRR, WFQ, SP, CBS, EST, TBS; + RX: WRR, SP) + - Flexible RX Parser + - TCP / UDP Segmentation Offload (TSO, USO) + - Split Header (SPH) + - Safety Features (ECC Protection, Data Parity Protection) + - Selftests using Ethtool + +Kernel Configuration +==================== + +The kernel configuration option is ``CONFIG_STMMAC_ETH``: + - ``CONFIG_STMMAC_PLATFORM``: is to enable the platform driver. + - ``CONFIG_STMMAC_PCI``: is to enable the pci driver. + +Command Line Parameters +======================= + +If the driver is built as a module the following optional parameters are used +by entering them on the command line with the modprobe command using this +syntax (e.g. for PCI module):: + + modprobe stmmac_pci [