2022-11-10  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* ChangeLogP700.txt, RELEASENOTES.txt, doc/Doxyfile-common,
	  man/man1/PAPI_derived_event_files.1, man/man1/papi_avail.1,
	  man/man1/papi_clockres.1, man/man1/papi_command_line.1,
	  man/man1/papi_component_avail.1, man/man1/papi_cost.1,
	  man/man1/papi_decode.1, man/man1/papi_error_codes.1,
	  man/man1/papi_event_chooser.1, man/man1/papi_hardware_avail.1,
	  man/man1/papi_hybrid_native_avail.1, man/man1/papi_mem_info.1,
	  man/man1/papi_multiplex_cost.1, man/man1/papi_native_avail.1,
	  man/man1/papi_version.1, man/man1/papi_xml_event_info.1,
	  man/man3/PAPIF_accum.3, man/man3/PAPIF_add_event.3,
	  man/man3/PAPIF_add_events.3, man/man3/PAPIF_add_named_event.3,
	  man/man3/PAPIF_assign_eventset_component.3,
	  man/man3/PAPIF_cleanup_eventset.3,
	  man/man3/PAPIF_create_eventset.3,
	  man/man3/PAPIF_destroy_eventset.3, man/man3/PAPIF_enum_dev_type.3,
	  man/man3/PAPIF_enum_event.3, man/man3/PAPIF_epc.3,
	  man/man3/PAPIF_event_code_to_name.3,
	  man/man3/PAPIF_event_name_to_code.3, man/man3/PAPIF_flips_rate.3,
	  man/man3/PAPIF_flops_rate.3, man/man3/PAPIF_get_clockrate.3,
	  man/man3/PAPIF_get_dev_attr.3, man/man3/PAPIF_get_dev_type_attr.3,
	  man/man3/PAPIF_get_dmem_info.3, man/man3/PAPIF_get_domain.3,
	  man/man3/PAPIF_get_event_info.3, man/man3/PAPIF_get_exe_info.3,
	  man/man3/PAPIF_get_granularity.3,
	  man/man3/PAPIF_get_hardware_info.3, man/man3/PAPIF_get_multiplex.3,
	  man/man3/PAPIF_get_preload.3, man/man3/PAPIF_get_real_cyc.3,
	  man/man3/PAPIF_get_real_nsec.3, man/man3/PAPIF_get_real_usec.3,
	  man/man3/PAPIF_get_virt_cyc.3, man/man3/PAPIF_get_virt_usec.3,
	  man/man3/PAPIF_ipc.3, man/man3/PAPIF_is_initialized.3,
	  man/man3/PAPIF_library_init.3, man/man3/PAPIF_lock.3,
	  man/man3/PAPIF_multiplex_init.3, man/man3/PAPIF_num_cmp_hwctrs.3,
	  man/man3/PAPIF_num_events.3, man/man3/PAPIF_num_hwctrs.3,
	  man/man3/PAPIF_perror.3, man/man3/PAPIF_query_event.3,
	  man/man3/PAPIF_query_named_event.3, man/man3/PAPIF_rate_stop.3,
	  man/man3/PAPIF_read.3, man/man3/PAPIF_read_ts.3,
	  man/man3/PAPIF_register_thread.3, man/man3/PAPIF_remove_event.3,
	  man/man3/PAPIF_remove_events.3,
	  man/man3/PAPIF_remove_named_event.3, man/man3/PAPIF_reset.3,
	  man/man3/PAPIF_set_cmp_domain.3,
	  man/man3/PAPIF_set_cmp_granularity.3, man/man3/PAPIF_set_debug.3,
	  man/man3/PAPIF_set_domain.3, man/man3/PAPIF_set_event_domain.3,
	  man/man3/PAPIF_set_granularity.3, man/man3/PAPIF_set_inherit.3,
	  man/man3/PAPIF_set_multiplex.3, man/man3/PAPIF_shutdown.3,
	  man/man3/PAPIF_start.3, man/man3/PAPIF_state.3,
	  man/man3/PAPIF_stop.3, man/man3/PAPIF_thread_id.3,
	  man/man3/PAPIF_thread_init.3, man/man3/PAPIF_unlock.3,
	  man/man3/PAPIF_unregister_thread.3, man/man3/PAPIF_write.3,
	  man/man3/PAPI_accum.3, man/man3/PAPI_add_event.3,
	  man/man3/PAPI_add_events.3, man/man3/PAPI_add_named_event.3,
	  man/man3/PAPI_addr_range_option_t.3, man/man3/PAPI_address_map_t.3,
	  man/man3/PAPI_all_thr_spec_t.3,
	  man/man3/PAPI_assign_eventset_component.3, man/man3/PAPI_attach.3,
	  man/man3/PAPI_attach_option_t.3, man/man3/PAPI_cleanup_eventset.3,
	  man/man3/PAPI_component_info_t.3, man/man3/PAPI_cpu_option_t.3,
	  man/man3/PAPI_create_eventset.3, man/man3/PAPI_debug_option_t.3,
	  man/man3/PAPI_destroy_eventset.3, man/man3/PAPI_detach.3,
	  man/man3/PAPI_disable_component.3,
	  man/man3/PAPI_disable_component_by_name.3,
	  man/man3/PAPI_dmem_info_t.3, man/man3/PAPI_domain_option_t.3,
	  man/man3/PAPI_enum_cmp_event.3, man/man3/PAPI_enum_dev_type.3,
	  man/man3/PAPI_enum_event.3, man/man3/PAPI_epc.3,
	  man/man3/PAPI_event_code_to_name.3, man/man3/PAPI_event_info_t.3,
	  man/man3/PAPI_event_name_to_code.3, man/man3/PAPI_exe_info_t.3,
	  man/man3/PAPI_flips_rate.3, man/man3/PAPI_flops_rate.3,
	  man/man3/PAPI_get_cmp_opt.3, man/man3/PAPI_get_component_index.3,
	  man/man3/PAPI_get_component_info.3, man/man3/PAPI_get_dev_attr.3,
	  man/man3/PAPI_get_dev_type_attr.3, man/man3/PAPI_get_dmem_info.3,
	  man/man3/PAPI_get_event_component.3,
	  man/man3/PAPI_get_event_info.3,
	  man/man3/PAPI_get_eventset_component.3,
	  man/man3/PAPI_get_executable_info.3,
	  man/man3/PAPI_get_hardware_info.3, man/man3/PAPI_get_multiplex.3,
	  man/man3/PAPI_get_opt.3, man/man3/PAPI_get_overflow_event_index.3,
	  man/man3/PAPI_get_real_cyc.3, man/man3/PAPI_get_real_nsec.3,
	  man/man3/PAPI_get_real_usec.3, man/man3/PAPI_get_shared_lib_info.3,
	  man/man3/PAPI_get_thr_specific.3, man/man3/PAPI_get_virt_cyc.3,
	  man/man3/PAPI_get_virt_nsec.3, man/man3/PAPI_get_virt_usec.3,
	  man/man3/PAPI_granularity_option_t.3, man/man3/PAPI_hl_read.3,
	  man/man3/PAPI_hl_region_begin.3, man/man3/PAPI_hl_region_end.3,
	  man/man3/PAPI_hl_stop.3, man/man3/PAPI_hw_info_t.3,
	  man/man3/PAPI_inherit_option_t.3, man/man3/PAPI_ipc.3,
	  man/man3/PAPI_is_initialized.3, man/man3/PAPI_itimer_option_t.3,
	  man/man3/PAPI_library_init.3, man/man3/PAPI_list_events.3,
	  man/man3/PAPI_list_threads.3, man/man3/PAPI_lock.3,
	  man/man3/PAPI_mh_cache_info_t.3, man/man3/PAPI_mh_info_t.3,
	  man/man3/PAPI_mh_level_t.3, man/man3/PAPI_mh_tlb_info_t.3,
	  man/man3/PAPI_mpx_info_t.3, man/man3/PAPI_multiplex_init.3,
	  man/man3/PAPI_multiplex_option_t.3, man/man3/PAPI_num_cmp_hwctrs.3,
	  man/man3/PAPI_num_components.3, man/man3/PAPI_num_events.3,
	  man/man3/PAPI_num_hwctrs.3, man/man3/PAPI_option_t.3,
	  man/man3/PAPI_overflow.3, man/man3/PAPI_perror.3,
	  man/man3/PAPI_preload_info_t.3, man/man3/PAPI_profil.3,
	  man/man3/PAPI_query_event.3, man/man3/PAPI_query_named_event.3,
	  man/man3/PAPI_rate_stop.3, man/man3/PAPI_read.3,
	  man/man3/PAPI_read_ts.3, man/man3/PAPI_register_thread.3,
	  man/man3/PAPI_remove_event.3, man/man3/PAPI_remove_events.3,
	  man/man3/PAPI_remove_named_event.3, man/man3/PAPI_reset.3,
	  man/man3/PAPI_set_cmp_domain.3,
	  man/man3/PAPI_set_cmp_granularity.3, man/man3/PAPI_set_debug.3,
	  man/man3/PAPI_set_domain.3, man/man3/PAPI_set_granularity.3,
	  man/man3/PAPI_set_multiplex.3, man/man3/PAPI_set_opt.3,
	  man/man3/PAPI_set_thr_specific.3, man/man3/PAPI_shlib_info_t.3,
	  man/man3/PAPI_shutdown.3, man/man3/PAPI_sprofil.3,
	  man/man3/PAPI_sprofil_t.3, man/man3/PAPI_start.3,
	  man/man3/PAPI_state.3, man/man3/PAPI_stop.3,
	  man/man3/PAPI_strerror.3, man/man3/PAPI_thread_id.3,
	  man/man3/PAPI_thread_init.3, man/man3/PAPI_unlock.3,
	  man/man3/PAPI_unregister_thread.3, man/man3/PAPI_write.3,
	  man/man3/PAPIf_hl_read.3, man/man3/PAPIf_hl_region_begin.3,
	  man/man3/PAPIf_hl_region_end.3, man/man3/PAPIf_hl_stop.3,
	  man/man3/RateInfo.3, man/man3/binary_tree_t.3,
	  man/man3/components_t.3, man/man3/local_components_t.3,
	  man/man3/reads_t.3, man/man3/regions_t.3, man/man3/threads_t.3,
	  man/man3/value_t.3, papi.spec, src/Makefile.in, src/configure,
	  src/configure.in, src/papi.h: release: preparation for release
	  commit  - Update documentation - Update version
	* src/validation_tests/papi_br_tkn.c: papi_br_tkn: add not taken
	  branch event to the right eventset  The branch not taken event is
	  added to the eventset for branch taken. Add the not taken event to
	  the right eventset.
	* man/man3/PAPIF_enum_dev_type.3, man/man3/PAPIF_get_dev_attr.3,
	  man/man3/PAPIF_get_dev_type_attr.3: sysdetect: add missing fortran
	  man pages  Man pages for PAPIF_enum_dev_type,
	  PAPIF_get_dev_type_attr and PAPIF_get_dev_attr were missing.

2022-11-08  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* .../sysdetect/tests/query_device_simple_f.F: sysdetect: update test
	  to reflect 'list' argument removal  Commit 482e8c5f1 removed the
	  'list' argument from papif_get_dev_attr fortran wrapper. However,
	  the test still passed 'dummy_list' to every call of the function.
	  This caused the length of the string to be read from the wrong
	  argument and the following 'strncpy' to segfault.

2022-11-02  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* man/man1/PAPI_derived_event_files.1, man/man1/papi_avail.1,
	  man/man1/papi_clockres.1, man/man1/papi_command_line.1,
	  man/man1/papi_component_avail.1, man/man1/papi_cost.1,
	  man/man1/papi_decode.1, man/man1/papi_error_codes.1,
	  man/man1/papi_event_chooser.1, man/man1/papi_hardware_avail.1,
	  man/man1/papi_hybrid_native_avail.1, man/man1/papi_mem_info.1,
	  man/man1/papi_multiplex_cost.1, man/man1/papi_native_avail.1,
	  man/man1/papi_version.1, man/man1/papi_xml_event_info.1,
	  man/man3/PAPIF_accum.3, man/man3/PAPIF_add_event.3,
	  man/man3/PAPIF_add_events.3, man/man3/PAPIF_add_named_event.3,
	  man/man3/PAPIF_assign_eventset_component.3,
	  man/man3/PAPIF_cleanup_eventset.3,
	  man/man3/PAPIF_create_eventset.3,
	  man/man3/PAPIF_destroy_eventset.3, man/man3/PAPIF_enum_event.3,
	  man/man3/PAPIF_epc.3, man/man3/PAPIF_event_code_to_name.3,
	  man/man3/PAPIF_event_name_to_code.3, man/man3/PAPIF_flips_rate.3,
	  man/man3/PAPIF_flops_rate.3, man/man3/PAPIF_get_clockrate.3,
	  man/man3/PAPIF_get_dmem_info.3, man/man3/PAPIF_get_domain.3,
	  man/man3/PAPIF_get_event_info.3, man/man3/PAPIF_get_exe_info.3,
	  man/man3/PAPIF_get_granularity.3,
	  man/man3/PAPIF_get_hardware_info.3, man/man3/PAPIF_get_multiplex.3,
	  man/man3/PAPIF_get_preload.3, man/man3/PAPIF_get_real_cyc.3,
	  man/man3/PAPIF_get_real_nsec.3, man/man3/PAPIF_get_real_usec.3,
	  man/man3/PAPIF_get_virt_cyc.3, man/man3/PAPIF_get_virt_usec.3,
	  man/man3/PAPIF_ipc.3, man/man3/PAPIF_is_initialized.3,
	  man/man3/PAPIF_library_init.3, man/man3/PAPIF_lock.3,
	  man/man3/PAPIF_multiplex_init.3, man/man3/PAPIF_num_cmp_hwctrs.3,
	  man/man3/PAPIF_num_events.3, man/man3/PAPIF_num_hwctrs.3,
	  man/man3/PAPIF_perror.3, man/man3/PAPIF_query_event.3,
	  man/man3/PAPIF_query_named_event.3, man/man3/PAPIF_rate_stop.3,
	  man/man3/PAPIF_read.3, man/man3/PAPIF_read_ts.3,
	  man/man3/PAPIF_register_thread.3, man/man3/PAPIF_remove_event.3,
	  man/man3/PAPIF_remove_events.3,
	  man/man3/PAPIF_remove_named_event.3, man/man3/PAPIF_reset.3,
	  man/man3/PAPIF_set_cmp_domain.3,
	  man/man3/PAPIF_set_cmp_granularity.3, man/man3/PAPIF_set_debug.3,
	  man/man3/PAPIF_set_domain.3, man/man3/PAPIF_set_event_domain.3,
	  man/man3/PAPIF_set_granularity.3, man/man3/PAPIF_set_inherit.3,
	  man/man3/PAPIF_set_multiplex.3, man/man3/PAPIF_shutdown.3,
	  man/man3/PAPIF_start.3, man/man3/PAPIF_state.3,
	  man/man3/PAPIF_stop.3, man/man3/PAPIF_thread_id.3,
	  man/man3/PAPIF_thread_init.3, man/man3/PAPIF_unlock.3,
	  man/man3/PAPIF_unregister_thread.3, man/man3/PAPIF_write.3,
	  man/man3/PAPI_accum.3, man/man3/PAPI_add_event.3,
	  man/man3/PAPI_add_events.3, man/man3/PAPI_add_named_event.3,
	  man/man3/PAPI_addr_range_option_t.3, man/man3/PAPI_address_map_t.3,
	  man/man3/PAPI_all_thr_spec_t.3,
	  man/man3/PAPI_assign_eventset_component.3, man/man3/PAPI_attach.3,
	  man/man3/PAPI_attach_option_t.3, man/man3/PAPI_cleanup_eventset.3,
	  man/man3/PAPI_component_info_t.3, man/man3/PAPI_cpu_option_t.3,
	  man/man3/PAPI_create_eventset.3, man/man3/PAPI_debug_option_t.3,
	  man/man3/PAPI_destroy_eventset.3, man/man3/PAPI_detach.3,
	  man/man3/PAPI_disable_component.3,
	  man/man3/PAPI_disable_component_by_name.3,
	  man/man3/PAPI_dmem_info_t.3, man/man3/PAPI_domain_option_t.3,
	  man/man3/PAPI_enum_cmp_event.3, man/man3/PAPI_enum_dev_type.3,
	  man/man3/PAPI_enum_event.3, man/man3/PAPI_epc.3,
	  man/man3/PAPI_event_code_to_name.3, man/man3/PAPI_event_info_t.3,
	  man/man3/PAPI_event_name_to_code.3, man/man3/PAPI_exe_info_t.3,
	  man/man3/PAPI_flips_rate.3, man/man3/PAPI_flops_rate.3,
	  man/man3/PAPI_get_cmp_opt.3, man/man3/PAPI_get_component_index.3,
	  man/man3/PAPI_get_component_info.3, man/man3/PAPI_get_dev_attr.3,
	  man/man3/PAPI_get_dev_type_attr.3, man/man3/PAPI_get_dmem_info.3,
	  man/man3/PAPI_get_event_component.3,
	  man/man3/PAPI_get_event_info.3,
	  man/man3/PAPI_get_eventset_component.3,
	  man/man3/PAPI_get_executable_info.3,
	  man/man3/PAPI_get_hardware_info.3, man/man3/PAPI_get_multiplex.3,
	  man/man3/PAPI_get_opt.3, man/man3/PAPI_get_overflow_event_index.3,
	  man/man3/PAPI_get_real_cyc.3, man/man3/PAPI_get_real_nsec.3,
	  man/man3/PAPI_get_real_usec.3, man/man3/PAPI_get_shared_lib_info.3,
	  man/man3/PAPI_get_thr_specific.3, man/man3/PAPI_get_virt_cyc.3,
	  man/man3/PAPI_get_virt_nsec.3, man/man3/PAPI_get_virt_usec.3,
	  man/man3/PAPI_granularity_option_t.3, man/man3/PAPI_hl_read.3,
	  man/man3/PAPI_hl_region_begin.3, man/man3/PAPI_hl_region_end.3,
	  man/man3/PAPI_hl_stop.3, man/man3/PAPI_hw_info_t.3,
	  man/man3/PAPI_inherit_option_t.3, man/man3/PAPI_ipc.3,
	  man/man3/PAPI_is_initialized.3, man/man3/PAPI_itimer_option_t.3,
	  man/man3/PAPI_library_init.3, man/man3/PAPI_list_events.3,
	  man/man3/PAPI_list_threads.3, man/man3/PAPI_lock.3,
	  man/man3/PAPI_mh_cache_info_t.3, man/man3/PAPI_mh_info_t.3,
	  man/man3/PAPI_mh_level_t.3, man/man3/PAPI_mh_tlb_info_t.3,
	  man/man3/PAPI_mpx_info_t.3, man/man3/PAPI_multiplex_init.3,
	  man/man3/PAPI_multiplex_option_t.3, man/man3/PAPI_num_cmp_hwctrs.3,
	  man/man3/PAPI_num_components.3, man/man3/PAPI_num_events.3,
	  man/man3/PAPI_num_hwctrs.3, man/man3/PAPI_option_t.3,
	  man/man3/PAPI_overflow.3, man/man3/PAPI_perror.3,
	  man/man3/PAPI_preload_info_t.3, man/man3/PAPI_profil.3,
	  man/man3/PAPI_query_event.3, man/man3/PAPI_query_named_event.3,
	  man/man3/PAPI_rate_stop.3, man/man3/PAPI_read.3,
	  man/man3/PAPI_read_ts.3, man/man3/PAPI_register_thread.3,
	  man/man3/PAPI_remove_event.3, man/man3/PAPI_remove_events.3,
	  man/man3/PAPI_remove_named_event.3, man/man3/PAPI_reset.3,
	  man/man3/PAPI_set_cmp_domain.3,
	  man/man3/PAPI_set_cmp_granularity.3, man/man3/PAPI_set_debug.3,
	  man/man3/PAPI_set_domain.3, man/man3/PAPI_set_granularity.3,
	  man/man3/PAPI_set_multiplex.3, man/man3/PAPI_set_opt.3,
	  man/man3/PAPI_set_thr_specific.3, man/man3/PAPI_shlib_info_t.3,
	  man/man3/PAPI_shutdown.3, man/man3/PAPI_sprofil.3,
	  man/man3/PAPI_sprofil_t.3, man/man3/PAPI_start.3,
	  man/man3/PAPI_state.3, man/man3/PAPI_stop.3,
	  man/man3/PAPI_strerror.3, man/man3/PAPI_thread_id.3,
	  man/man3/PAPI_thread_init.3, man/man3/PAPI_unlock.3,
	  man/man3/PAPI_unregister_thread.3, man/man3/PAPI_write.3,
	  man/man3/PAPIf_hl_read.3, man/man3/PAPIf_hl_region_begin.3,
	  man/man3/PAPIf_hl_region_end.3, man/man3/PAPIf_hl_stop.3,
	  man/man3/RateInfo.3, man/man3/binary_tree_t.3,
	  man/man3/components_t.3, man/man3/local_components_t.3,
	  man/man3/reads_t.3, man/man3/regions_t.3, man/man3/threads_t.3,
	  man/man3/value_t.3: sysdetect: regenerate man pages for updated
	  attributes
	* src/papi.c: sysdetect: remove unused attributes from doc
	* src/components/sysdetect/tests/query_device_mpi.c: sysdetect: white
	  space cleanup

2022-11-02  John Rodgers <john.rodgers@hpe.com>

	* src/components/cuda/linux-cuda.c: CUDA: Align memory zero with pad
	  Update logic in `cuda11_makeRoomAllEvents` to ensure the memory
	  zeroing operation covers the amount expanded by the `realloc`
	  operation.
	* src/components/cuda/linux-cuda.c: CUDA: CUPTI11 Sporadic Memory
	  Failures  The CUPTI11 portion of the cuda component has exhibited
	  sporadic memory failures for applications compiled against
	  MVAPICH's libmpi.so. Specifically, the realloc operation in
	  `cuda11_makeRoomAllEvents`, called in `_cuda11_add_native_events`,
	  would fail even when there was sufficient memory to complete the
	  requested allocation.  As a workaround, this patch prevents the
	  failure by allocating the expected memory up front prior to the
	  device loop in `_cuda11_add_native_events`.
	* src/components/cuda/linux-cuda.c: CUDA: Prevent memory leak
	  Prevent memory leak by freeing `firstLast` buffer in
	  `_cuda11_add_native_events`.
	* src/components/cuda/linux-cuda.c: CUDA: Remove unnecessary code
	  Remove logic only necessary when trying to resolve counters without
	  an active profiling session. Given that a profiling session is
	  created and active (see: _cuda11_add_native_events ->
	  _cuda11_init_profiler) creation and usage of
	  `cuda11_CounterAvailabilityImage` is unnecessary.
	* src/components/cuda/linux-cuda.c: CUDA: Prevent component deadlock
	  Add missing component unlock to `_cuda_update_control_state` to
	  prevent deadlocks encountered when adding multiple events
	  sequentially.  Patch resolves issue #121
	* src/components/cuda/linux-cuda.c, src/components/nvml/linux-nvml.c,
	  src/components/rocm/rocm.c, src/components/rocm_smi/linux-rocm-
	  smi.c: DELAY_INIT: Set disabled for delay init comps  Ensure
	  components that leverage the delayed initialization scheme, namely
	  cuda, nvml, rocm, and rocm_smi, set their respective <papi-
	  vector>.cmp_info.disabled flag with `PAPI_EDELAY_INIT` when
	  completing the standard component initialization.  Update necessary
	  to conform with PR: 328
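	  As an illustration of the "Align memory zero with pad" entry
	  above, the following is a minimal sketch of growing a buffer with
	  realloc and zeroing exactly the newly added tail. The helper name
	  grow_zeroed and its parameters are hypothetical and are not taken
	  from linux-cuda.c; the sketch only shows the general technique.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from linux-cuda.c): grow buf from old_count to
 * new_count elements and zero only the bytes added by the growth.
 * Returns the new pointer, or NULL on failure, in which case the
 * original buffer is left untouched and still owned by the caller. */
static void *grow_zeroed(void *buf, size_t old_count, size_t new_count,
                         size_t elem_size)
{
    if (new_count <= old_count)
        return buf;                      /* nothing to grow */

    /* Guard against overflow in new_count * elem_size. */
    if (elem_size != 0 && new_count > SIZE_MAX / elem_size)
        return NULL;

    void *p = realloc(buf, new_count * elem_size);
    if (p == NULL)
        return NULL;

    /* The zeroed span starts where the old data ends and covers the
     * full amount added by realloc, no more and no less. */
    memset((char *)p + old_count * elem_size, 0,
           (new_count - old_count) * elem_size);
    return p;
}
```

	  The existing elements keep their values; only the expanded region
	  is cleared, so the zeroed length always matches the growth.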

2022-10-24  Daniel Barry <dbarry@vols.utk.edu>

	* src/components/pcp/linux-pcp.c, src/papi.h: pcp: warning instead of
	  error when 'reason' string truncated  When the hostname is too
	  long, there is not enough memory allocated for the error 'reason'
	  string. This caused the component to prematurely exit
	  initialization when the PM daemon is not active. Instead, a warning
	  is now issued, and the initialization exits appropriately.
	  Additionally, the size of the 'reason' string has been increased to
	  accommodate longer host names.  These changes have been tested on
	  the IBM POWER9 architecture.
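	  A minimal sketch of the idea behind this change: detect truncation
	  of a fixed-size message buffer and report it as a warning rather
	  than a hard error. The helper name format_reason and the message
	  text are hypothetical and are not taken from linux-pcp.c; only the
	  snprintf return-value check is the point.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from linux-pcp.c): format a 'reason' message
 * that embeds a host name.
 * Returns 0 if the message fit, 1 if it was truncated (the caller may
 * issue a warning and continue), and -1 on an output error. */
static int format_reason(char *dst, size_t dst_size, const char *host)
{
    /* snprintf returns the length the full message would have had. */
    int n = snprintf(dst, dst_size, "daemon not reachable on host %s", host);

    if (n < 0)
        return -1;                 /* output error */
    if ((size_t)n >= dst_size)
        return 1;                  /* truncated, but still NUL-terminated */
    return 0;
}
```

	  A caller can treat a return value of 1 as non-fatal, since the
	  buffer still holds a valid, shortened string.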

2022-10-28  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/main.c: cat: support commenting out
	  lines in input file  These changes add support for users to
	  comment out lines in the input file. This allows users to take
	  measurements more flexibly without having to remove lines or use
	  multiple input files.  These changes have been tested on the AMD
	  Zen3 architecture.
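	  A minimal sketch of how an input-file reader can skip
	  commented-out lines. The changelog does not name the comment
	  marker, so '#' is an assumption here, as is skipping blank lines;
	  the helper name is_skippable_line is hypothetical and not taken
	  from main.c.

```c
#include <ctype.h>

#define COMMENT_CHAR '#'   /* assumed marker; not stated in the changelog */

/* Hypothetical helper (not from main.c): return 1 if the line is blank
 * or its first non-whitespace character is the comment marker,
 * 0 otherwise. */
static int is_skippable_line(const char *line)
{
    while (*line != '\0' && isspace((unsigned char)*line))
        line++;
    return *line == '\0' || *line == COMMENT_CHAR;
}
```

	  In an fgets loop, lines for which this returns 1 would simply be
	  passed over, so one input file can serve several measurement runs.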

2022-10-27  Anthony Danalis <adanalis@icl.utk.edu>

	* src/validation_tests/branches_testcode.c,
	  src/validation_tests/papi_br_msp.c: Improved the branch
	  misprediction validation test.  The previous version of the branch
	  misprediction validation test relied on the libc function random()
	  to generate entropy. However, this function introduced 15x more
	  branches than the number of branches in the code of the validation
	  test, polluting the results. The new code uses an inline Xorshift
	  pseudo-random number generator which is more than sufficient to
	  confuse the branch predictor, and does not contain any branch
	  instructions so it does not pollute the event measurement. Also,
	  the logic of the test has been simplified.
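	  For reference, a minimal sketch of an inline Xorshift generator of
	  the kind described above. The changelog does not say which
	  Xorshift variant the test uses; the 32-bit variant with the shift
	  triple 13, 17, 5 is assumed here, and the helper count_taken is
	  hypothetical and not taken from branches_testcode.c.

```c
#include <stdint.h>

/* One step of a 32-bit xorshift generator (shift triple 13, 17, 5).
 * The step consists only of shifts and XORs, so it needs no branches.
 * The state must be non-zero: zero maps to zero. */
static inline uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Hypothetical helper: run a data-dependent branch 'iterations' times
 * and return how often it was taken. */
static unsigned count_taken(uint32_t seed, unsigned iterations)
{
    unsigned taken = 0;
    for (unsigned i = 0; i < iterations; i++) {
        if (xorshift32(&seed) & 1u)    /* hard-to-predict branch */
            taken++;
    }
    return taken;
}
```

	  Because the generator is inlined and branch-free, the only
	  conditional branches in the loop are the loop test and the branch
	  under measurement.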

2022-10-27  Anthony <adanalis@icl.utk.edu>

	* src/sde_lib/sde_lib_datastructures.c: Removed unneeded NULL pointer
	  checks in libsde.

2022-09-14  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* man/man1/PAPI_derived_event_files.1, man/man1/papi_avail.1,
	  man/man1/papi_clockres.1, man/man1/papi_command_line.1,
	  man/man1/papi_component_avail.1, man/man1/papi_cost.1,
	  man/man1/papi_decode.1, man/man1/papi_error_codes.1,
	  man/man1/papi_event_chooser.1, man/man1/papi_hardware_avail.1,
	  man/man1/papi_hybrid_native_avail.1, man/man1/papi_mem_info.1,
	  man/man1/papi_multiplex_cost.1, man/man1/papi_native_avail.1,
	  man/man1/papi_version.1, man/man1/papi_xml_event_info.1,
	  man/man3/PAPIF_accum.3, man/man3/PAPIF_add_event.3,
	  man/man3/PAPIF_add_events.3, man/man3/PAPIF_add_named_event.3,
	  man/man3/PAPIF_assign_eventset_component.3,
	  man/man3/PAPIF_cleanup_eventset.3,
	  man/man3/PAPIF_create_eventset.3,
	  man/man3/PAPIF_destroy_eventset.3, man/man3/PAPIF_enum_event.3,
	  man/man3/PAPIF_epc.3, man/man3/PAPIF_event_code_to_name.3,
	  man/man3/PAPIF_event_name_to_code.3, man/man3/PAPIF_flips_rate.3,
	  man/man3/PAPIF_flops_rate.3, man/man3/PAPIF_get_clockrate.3,
	  man/man3/PAPIF_get_dmem_info.3, man/man3/PAPIF_get_domain.3,
	  man/man3/PAPIF_get_event_info.3, man/man3/PAPIF_get_exe_info.3,
	  man/man3/PAPIF_get_granularity.3,
	  man/man3/PAPIF_get_hardware_info.3, man/man3/PAPIF_get_multiplex.3,
	  man/man3/PAPIF_get_preload.3, man/man3/PAPIF_get_real_cyc.3,
	  man/man3/PAPIF_get_real_nsec.3, man/man3/PAPIF_get_real_usec.3,
	  man/man3/PAPIF_get_virt_cyc.3, man/man3/PAPIF_get_virt_usec.3,
	  man/man3/PAPIF_ipc.3, man/man3/PAPIF_is_initialized.3,
	  man/man3/PAPIF_library_init.3, man/man3/PAPIF_lock.3,
	  man/man3/PAPIF_multiplex_init.3, man/man3/PAPIF_num_cmp_hwctrs.3,
	  man/man3/PAPIF_num_events.3, man/man3/PAPIF_num_hwctrs.3,
	  man/man3/PAPIF_perror.3, man/man3/PAPIF_query_event.3,
	  man/man3/PAPIF_query_named_event.3, man/man3/PAPIF_rate_stop.3,
	  man/man3/PAPIF_read.3, man/man3/PAPIF_read_ts.3,
	  man/man3/PAPIF_register_thread.3, man/man3/PAPIF_remove_event.3,
	  man/man3/PAPIF_remove_events.3,
	  man/man3/PAPIF_remove_named_event.3, man/man3/PAPIF_reset.3,
	  man/man3/PAPIF_set_cmp_domain.3,
	  man/man3/PAPIF_set_cmp_granularity.3, man/man3/PAPIF_set_debug.3,
	  man/man3/PAPIF_set_domain.3, man/man3/PAPIF_set_event_domain.3,
	  man/man3/PAPIF_set_granularity.3, man/man3/PAPIF_set_inherit.3,
	  man/man3/PAPIF_set_multiplex.3, man/man3/PAPIF_shutdown.3,
	  man/man3/PAPIF_start.3, man/man3/PAPIF_state.3,
	  man/man3/PAPIF_stop.3, man/man3/PAPIF_thread_id.3,
	  man/man3/PAPIF_thread_init.3, man/man3/PAPIF_unlock.3,
	  man/man3/PAPIF_unregister_thread.3, man/man3/PAPIF_write.3,
	  man/man3/PAPI_accum.3, man/man3/PAPI_add_event.3,
	  man/man3/PAPI_add_events.3, man/man3/PAPI_add_named_event.3,
	  man/man3/PAPI_addr_range_option_t.3, man/man3/PAPI_address_map_t.3,
	  man/man3/PAPI_all_thr_spec_t.3,
	  man/man3/PAPI_assign_eventset_component.3, man/man3/PAPI_attach.3,
	  man/man3/PAPI_attach_option_t.3, man/man3/PAPI_cleanup_eventset.3,
	  man/man3/PAPI_component_info_t.3, man/man3/PAPI_cpu_option_t.3,
	  man/man3/PAPI_create_eventset.3, man/man3/PAPI_debug_option_t.3,
	  man/man3/PAPI_destroy_eventset.3, man/man3/PAPI_detach.3,
	  man/man3/PAPI_disable_component.3,
	  man/man3/PAPI_disable_component_by_name.3,
	  man/man3/PAPI_dmem_info_t.3, man/man3/PAPI_domain_option_t.3,
	  man/man3/PAPI_enum_cmp_event.3, man/man3/PAPI_enum_dev_type.3,
	  man/man3/PAPI_enum_event.3, man/man3/PAPI_epc.3,
	  man/man3/PAPI_event_code_to_name.3, man/man3/PAPI_event_info_t.3,
	  man/man3/PAPI_event_name_to_code.3, man/man3/PAPI_exe_info_t.3,
	  man/man3/PAPI_flips_rate.3, man/man3/PAPI_flops_rate.3,
	  man/man3/PAPI_get_cmp_opt.3, man/man3/PAPI_get_component_index.3,
	  man/man3/PAPI_get_component_info.3, man/man3/PAPI_get_dev_attr.3,
	  man/man3/PAPI_get_dev_type_attr.3, man/man3/PAPI_get_dmem_info.3,
	  man/man3/PAPI_get_event_component.3,
	  man/man3/PAPI_get_event_info.3,
	  man/man3/PAPI_get_eventset_component.3,
	  man/man3/PAPI_get_executable_info.3,
	  man/man3/PAPI_get_hardware_info.3, man/man3/PAPI_get_multiplex.3,
	  man/man3/PAPI_get_opt.3, man/man3/PAPI_get_overflow_event_index.3,
	  man/man3/PAPI_get_real_cyc.3, man/man3/PAPI_get_real_nsec.3,
	  man/man3/PAPI_get_real_usec.3, man/man3/PAPI_get_shared_lib_info.3,
	  man/man3/PAPI_get_thr_specific.3, man/man3/PAPI_get_virt_cyc.3,
	  man/man3/PAPI_get_virt_nsec.3, man/man3/PAPI_get_virt_usec.3,
	  man/man3/PAPI_granularity_option_t.3, man/man3/PAPI_hl_read.3,
	  man/man3/PAPI_hl_region_begin.3, man/man3/PAPI_hl_region_end.3,
	  man/man3/PAPI_hl_stop.3, man/man3/PAPI_hw_info_t.3,
	  man/man3/PAPI_inherit_option_t.3, man/man3/PAPI_ipc.3,
	  man/man3/PAPI_is_initialized.3, man/man3/PAPI_itimer_option_t.3,
	  man/man3/PAPI_library_init.3, man/man3/PAPI_list_events.3,
	  man/man3/PAPI_list_threads.3, man/man3/PAPI_lock.3,
	  man/man3/PAPI_mh_cache_info_t.3, man/man3/PAPI_mh_info_t.3,
	  man/man3/PAPI_mh_level_t.3, man/man3/PAPI_mh_tlb_info_t.3,
	  man/man3/PAPI_mpx_info_t.3, man/man3/PAPI_multiplex_init.3,
	  man/man3/PAPI_multiplex_option_t.3, man/man3/PAPI_num_cmp_hwctrs.3,
	  man/man3/PAPI_num_components.3, man/man3/PAPI_num_events.3,
	  man/man3/PAPI_num_hwctrs.3, man/man3/PAPI_option_t.3,
	  man/man3/PAPI_overflow.3, man/man3/PAPI_perror.3,
	  man/man3/PAPI_preload_info_t.3, man/man3/PAPI_profil.3,
	  man/man3/PAPI_query_event.3, man/man3/PAPI_query_named_event.3,
	  man/man3/PAPI_rate_stop.3, man/man3/PAPI_read.3,
	  man/man3/PAPI_read_ts.3, man/man3/PAPI_register_thread.3,
	  man/man3/PAPI_remove_event.3, man/man3/PAPI_remove_events.3,
	  man/man3/PAPI_remove_named_event.3, man/man3/PAPI_reset.3,
	  man/man3/PAPI_set_cmp_domain.3,
	  man/man3/PAPI_set_cmp_granularity.3, man/man3/PAPI_set_debug.3,
	  man/man3/PAPI_set_domain.3, man/man3/PAPI_set_granularity.3,
	  man/man3/PAPI_set_multiplex.3, man/man3/PAPI_set_opt.3,
	  man/man3/PAPI_set_thr_specific.3, man/man3/PAPI_shlib_info_t.3,
	  man/man3/PAPI_shutdown.3, man/man3/PAPI_sprofil.3,
	  man/man3/PAPI_sprofil_t.3, man/man3/PAPI_start.3,
	  man/man3/PAPI_state.3, man/man3/PAPI_stop.3,
	  man/man3/PAPI_strerror.3, man/man3/PAPI_thread_id.3,
	  man/man3/PAPI_thread_init.3, man/man3/PAPI_unlock.3,
	  man/man3/PAPI_unregister_thread.3, man/man3/PAPI_write.3,
	  man/man3/PAPIf_hl_read.3, man/man3/PAPIf_hl_region_begin.3,
	  man/man3/PAPIf_hl_region_end.3, man/man3/PAPIf_hl_stop.3,
	  man/man3/RateInfo.3, man/man3/binary_tree_t.3,
	  man/man3/components_t.3, man/man3/local_components_t.3,
	  man/man3/reads_t.3, man/man3/regions_t.3, man/man3/threads_t.3,
	  man/man3/value_t.3: doc: regenerate man pages

2022-10-25  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/utils/papi_hardware_avail.c: papi_hardware_avail: print thread
	  affinity list for NUMA nodes

2022-10-26  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/tests/query_device_mpi.c: sysdetect: add
	  GPU affinity example in tests  The GPU affinity example uses MPI
	  shared memory windows to work out the GPU affinity of every MPI
	  rank. The first rank in every GPU rank list prints the list of
	  ranks for the given GPU.
	* src/components/Makefile_comp_tests.target.in,
	  src/components/sysdetect/tests/Makefile: sysdetect: hook mpi tests
	  to NO_MPI_TESTS  The configure step in PAPI checks whether MPI
	  tests can be enabled or not. If not it sets NO_MPI_TESTS to yes.
	  This variable is then used in ctests/Makefile.recipies to enable or
	  disable MPI tests. The sysdetect tests were not relying on this
	  variable. Instead, sysdetect relied on MPICC being set, which is not
	  accurate. This patch makes the MPI checks more uniform across the
	  code by adding NO_MPI_TESTS checks in sysdetect tests too.

2022-10-25  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/sysdetect.c: sysdetect: add
	  PAPI_DEV_ATTR__CPU_UINT_THR_NUMA_AFFINITY
	  PAPI_DEV_ATTR__CPU_UINT_THR_NUMA_AFFINITY was missing in sysdetect.
	  This attribute can be used to discover the numa affinity of every
	  hardware thread in the system.
	* src/components/sysdetect/Rules.sysdetect,
	  src/components/sysdetect/amd_gpu.c,
	  src/components/sysdetect/nvidia_gpu.c,
	  src/components/sysdetect/shm.c, src/components/sysdetect/shm.h,
	  src/components/sysdetect/sysdetect.c,
	  src/components/sysdetect/sysdetect.h,
	  src/components/sysdetect/tests/query_device_mpi.c,
	  .../sysdetect/tests/query_device_simple.c,
	  .../sysdetect/tests/query_device_simple_f.F, src/configure,
	  src/configure.in, src/genpapifdef.c, src/papi.h,
	  src/papi_fwrappers.c, src/utils/papi_hardware_avail.c: sysdetect:
	  remove builtin support for numa and GPU affinity  Numa and GPU
	  affinity of threads and MPI ranks adds an MPI dependency to PAPI
	  that may cause problems (link time unresolved MPI symbols) if the
	  application using PAPI does not link against MPI. Most of the work
	  that sysdetect currently does to provide affinity lists to the
	  users can be easily done by the users themselves. Thus, sysdetect
	  will no longer support them.

2022-10-12  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/Makefile,
	  src/counter_analysis_toolkit/vec.c: cat: ifdefs for AVX
	  availability  Utilize ifdefs so that the build can be more flexible
	  between systems with different AVX vector-width availability.
	* src/counter_analysis_toolkit/Makefile,
	  src/counter_analysis_toolkit/vec_arch.h,
	  src/counter_analysis_toolkit/vec_fma_dp.c,
	  src/counter_analysis_toolkit/vec_fma_hp.c,
	  src/counter_analysis_toolkit/vec_fma_sp.c,
	  src/counter_analysis_toolkit/vec_nonfma_dp.c,
	  src/counter_analysis_toolkit/vec_nonfma_hp.c,
	  src/counter_analysis_toolkit/vec_nonfma_sp.c: cat: specify
	  architecture in macros  Rename VEC_WIDTH_[128|256|512] to
	  X86_VEC_WIDTH_[128|256|512]B to be more specific.
	* src/counter_analysis_toolkit/vec_arch.h: cat: remove unused
	  typedef; add used typedef  Typedef 'half' since this type is
	  actually used in the code, and remove HP_SCALAR_TYPE.

2022-10-11  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/Makefile,
	  src/counter_analysis_toolkit/vec.c,
	  src/counter_analysis_toolkit/vec_arch.h,
	  src/counter_analysis_toolkit/vec_fma_dp.c,
	  src/counter_analysis_toolkit/vec_fma_hp.c,
	  src/counter_analysis_toolkit/vec_fma_sp.c,
	  src/counter_analysis_toolkit/vec_nonfma_dp.c,
	  src/counter_analysis_toolkit/vec_nonfma_hp.c,
	  src/counter_analysis_toolkit/vec_nonfma_sp.c: cat: rename macros
	  for POWER architecture  For the sake of consistency, use "POWER"
	  instead of "IBM."
	* src/counter_analysis_toolkit/vec_fma_dp.c,
	  src/counter_analysis_toolkit/vec_fma_sp.c,
	  src/counter_analysis_toolkit/vec_scalar_verify.c: cat: remove
	  unused code  Remove unused AMD Bulldozer intrinsics.

2022-09-19  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/Makefile,
	  src/counter_analysis_toolkit/vec.c,
	  src/counter_analysis_toolkit/vec_arch.h: cat: consolidate 'INTEL'
	  and 'AMD' flags for vector FLOPs benchmark  Since the ifdefs which
	  check whether "INTEL" is defined also check whether "AMD" is
	  defined, use "X86" for both.  These changes have been tested on the
	  Intel Ice Lake architecture.
	* src/counter_analysis_toolkit/vec.c,
	  src/counter_analysis_toolkit/vec_arch.h,
	  src/counter_analysis_toolkit/vec_fma_dp.c,
	  src/counter_analysis_toolkit/vec_fma_hp.c,
	  src/counter_analysis_toolkit/vec_fma_sp.c,
	  src/counter_analysis_toolkit/vec_nonfma_dp.c,
	  src/counter_analysis_toolkit/vec_nonfma_hp.c,
	  src/counter_analysis_toolkit/vec_nonfma_sp.c: cat: specify
	  architecture vector FLOPs benchmark function names  Include the
	  architecture names in the function names for consistency.  These
	  changes have been tested on the IBM POWER9 architecture.

2022-09-07  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/vec.c,
	  src/counter_analysis_toolkit/vec_arch.h,
	  src/counter_analysis_toolkit/vec_fma_dp.c,
	  src/counter_analysis_toolkit/vec_fma_hp.c,
	  src/counter_analysis_toolkit/vec_fma_sp.c,
	  src/counter_analysis_toolkit/vec_nonfma_dp.c,
	  src/counter_analysis_toolkit/vec_nonfma_hp.c,
	  src/counter_analysis_toolkit/vec_nonfma_sp.c: cat: vector FLOPs
	  benchmark for non-x86 architectures bug fix  The driver code for
	  the vector benchmark could not call the functions for the vector
	  FLOPs kernels because they were declared 'static'. For builds which
	  use either the NEON or ALTIVEC intrinsics, these static functions
	  are now wrapped, so they can be called by the driver.  These
	  changes have been tested on the IBM POWER9 architecture.

2022-10-13  Daniel Barry <dbarry@vols.utk.edu>

	* src/components/powercap/tests/Makefile,
	  .../powercap/tests/powercap_basic_read.c,
	  .../powercap/tests/powercap_basic_readwrite.c: powercap: add new
	  component tests  This adds a new component test for each of the
	  following:  (1) add one event to an event set at a time and read it
	  (2) add one event at a time, read it, write it, read the new value,
	  restore the original value  These changes have been tested on the
	  Intel Ice Lake architecture.

2022-10-26  AnustuvICL <anustuv@icl.utk.edu>

	* src/components/perf_event/pe_libpfm4_events.c: perf_event: Free
	  allocated string in function allocate_native_event

2022-10-18  Peinan Zhang <peinan.zhang@intel.com>

	* src/components/intel_gpu/README,
	  src/components/intel_gpu/README.md,
	  .../intel_gpu/internal/inc/GPUMetricHandler.h,
	  .../intel_gpu/internal/inc/GPUMetricInterface.h,
	  .../intel_gpu/internal/src/GPUMetricHandler.cpp,
	  .../intel_gpu/internal/src/GPUMetricInterface.cpp,
	  src/components/intel_gpu/internal/src/Makefile,
	  src/components/intel_gpu/linux_intel_gpu_metrics.c,
	  src/components/intel_gpu/linux_intel_gpu_metrics.h,
	  src/components/intel_gpu/tests/Makefile,
	  src/components/intel_gpu/tests/gemm.spv,
	  src/components/intel_gpu/tests/gpu_common_utils.c,
	  src/components/intel_gpu/tests/gpu_common_utils.h,
	  src/components/intel_gpu/tests/gpu_metric_list.c,
	  src/components/intel_gpu/tests/gpu_metric_read.c,
	  src/components/intel_gpu/tests/gpu_query_gemm.cc,
	  src/components/intel_gpu/tests/gpu_thread_read.c,
	  src/components/intel_gpu/tests/readme.txt: Added support for
	  multiple Intel GPU devices and multiple tiles per device.  Allow
	  querying performance metrics on multiple Intel GPUs and multiple
	  tiles per GPU. Support Intel GPU Arctic Sound and Ponte Sound.
	  Update test cases to take metrics from input, so that they work
	  with different platforms. Update component README.md file.

2022-10-24  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/utils/papi_native_avail.c: sde: make '-sde' option always
	  visible in papi_native_avail  The '-sde' option was not visible in
	  papi_native_avail unless the SDE component was configured in PAPI.
	  Now we always have the option visible but return an error if the
	  SDE component is not configured.

2022-10-25  Anthony <adanalis@icl.utk.edu>

	* src/configure, src/configure.in: Make papi_native_avail support the
	  "-sde" flag only if *both* libsde and the SDE component are
	  configured in.
	* src/components/sde/tests/Makefile,
	  src/components/sde/tests/README.txt: Added path to libpfm4 in the
	  SDE tests Makefile, and further instructions for users in the
	  README.

2022-10-24  AnustuvICL <anustuv@icl.utk.edu>

	* src/papi.h: papi.h: Update bit field post removal of members from
	  struct _papi_component_option

2022-10-23  William Cohen <wcohen@redhat.com>

	* src/components/sysdetect/linux_cpu_utils.c, src/linux-memory.c: Use
	  fgets in place of fscanf functions to avoid possible buffer
	  overflows  There were several locations in the PAPI code that used
	  fscanf calls like the following statement to read in information:
	  result=fscanf(fff,"%s",allocation_policy_string);  The problem with
	  this statement is that the fscanf could possibly write past the end
	  of allocation_policy_string.  To limit the write to the size of the
	  allocation_policy_string an fgets like the following is used in its
	  place:  str_result=fgets(allocation_policy_string, BUFSIZ, fff);
	  One set of fscanf were for the generic memory information code
	  reading the cache characteristics.  Another fscanf was in the
	  sysdetect component reading of cache characteristics.
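
	  Illustration only: a minimal C sketch of the bounded read described
	  in this entry. read_token is a hypothetical helper, not a PAPI
	  function. Note that fgets reads up to a whole line, whereas
	  fscanf("%s") stops at the first whitespace, so the sketch assumes
	  one value per line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from fff into buf, never writing
 * more than size bytes (including the terminating NUL).
 * Returns 0 on success, -1 on end of file or read error. */
int read_token(FILE *fff, char *buf, int size)
{
    if (size <= 0)
        return -1;
    if (fgets(buf, size, fff) == NULL)
        return -1;
    /* fgets keeps the trailing newline when it fits; strip it. */
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

	  Passing sizeof of the destination array as the size keeps the bound
	  tied to the real buffer rather than to an unrelated constant.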

2022-10-10  AnustuvICL <anustuv@icl.utk.edu>

	* src/genpapifdef.c, src/papi.h: Remove C++ style commented code

2022-08-31  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/perfctr/perfctr.c: perfctr: set disabled flag in cmp
	* src/components/perfmon2/perfmon.c: perfmon2: set disabled flag in
	  cmp
	* src/papi_internal.c: papi: do not set disabled flag in framework
	* src/components/vmware/vmware.c: vmware: set disabled flag in cmp
	* src/components/stealtime/linux-stealtime.c: stealtime: set disabled
	  flag in cmp
	* src/components/sensors_ppc/linux-sensors-ppc.c: sensors_ppc: set
	  disabled flag in cmp
	* src/components/rapl/linux-rapl.c: rapl: set disabled flag in cmp
	* src/components/powercap_ppc/linux-powercap-ppc.c: powercap_ppc: set
	  disabled flag in cmp
	* src/components/powercap/linux-powercap.c: powercap: set disabled
	  flag in cmp
	* src/components/perf_event_uncore/perf_event_uncore.c: perf_event_u:
	  set disabled flag in cmp
	* src/components/perf_event/perf_event.c: perf_event: set disabled
	  flag in cmp
	* src/components/pcp/linux-pcp.c: pcp: set disabled flag in cmp
	* src/components/net/linux-net.c: net: set disabled flag in cmp
	* src/components/mx/linux-mx.c: mx: set disabled flag in cmp
	* src/components/lustre/linux-lustre.c: lustre: set disabled flag in
	  cmp
	* src/components/lmsensors/linux-lmsensors.c: lmsensors: set disabled
	  flag in cmp
	* src/components/libmsr/linux-libmsr.c: libmsr: set disabled flag in
	  cmp
	* src/components/io/linux-io.c: io: set disabled flag in cmp
	* src/components/intel_gpu/linux_intel_gpu_metrics.c: intel_gpu: set
	  disabled flag in cmp
	* src/components/infiniband/linux-infiniband.c: infiniband: set
	  disabled flag in cmp
	* src/components/micpower/linux-micpower.c: micpower: set disabled
	  flag in cmp
	* src/components/host_micpower/linux-host_micpower.c: host_micpower:
	  set disabled flag in cmp
	* src/components/example/example.c: example: set disabled flag in cmp
	* src/components/coretemp_freebsd/coretemp_freebsd.c:
	  coretemp_freebsd: set disabled flag in cmp
	* src/components/coretemp/linux-coretemp.c: coretemp: set disabled
	  flag in cmp
	* src/components/appio/appio.c: appio: set disabled flag in cmp

2022-10-18  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm/rocm.c: rocm: return PAPI_ENOEVNT if event not
	  found

2022-10-17  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* .../rocm/tests/intercept_single_kernel_monitoring.cpp,
	  .../rocm/tests/intercept_single_thread_monitoring.cpp,
	  src/components/rocm/tests/multi_kernel_monitoring.cpp,
	  src/components/rocm/tests/multi_thread_monitoring.cpp,
	  .../rocm/tests/sample_single_kernel_monitoring.cpp,
	  src/components/rocm/tests/single_thread_monitoring.cpp: rocm:
	  SQ_WAVES does not reflect logical waves  SQ_WAVES counts the number
	  of logical waves, plus the waves that are restored due to context
	  switching. This patch computes the logical number of waves as
	  SQ_WAVES - SQ_WAVES_RESTORED. For those architectures that do not
	  support SQ_WAVES_RESTORED (preceding MI200) the tests return with
	  a warning and the number of waves check is ignored.

2022-10-20  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/main.c: cat: fix memory leak from
	  hw_desc alloc  Free the dynamically allocated memory used by the
	  hardware description feature of CAT.  These changes have been
	  tested on the Intel Westmere EP architecture.

2022-10-19  Anthony <adanalis@icl.utk.edu>

	* src/Makefile.in, src/Makefile.inc,
	  src/components/Makefile_comp_tests.target.in,
	  src/components/sde/tests/Makefile, src/configure, src/configure.in,
	  src/sde_lib/Makefile: Make static libsde.a optional.  We build the
	  static sde library 'libsde.a' only if libpapi.a is also built,
	  based on the configure flags provided by the user (i.e.,
	  --with-static-lib). Also, the linking of the relevant tests and
	  utilities depends on whether the static sde library exists.

2022-10-20  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/hw_desc.h,
	  src/counter_analysis_toolkit/main.c: cat: define default number of
	  OMP threads  Using the PAPI_hw_info_t structure, define the default
	  number of threads as the number of CPUs per socket.  These changes
	  have been tested on the Intel Westmere EP architecture.

2022-10-18  William Cohen <wcohen@redhat.com>

	* src/components/sysdetect/tests/query_device_simple_f.F: Removed
	  unused label and variable from query_device_simple_f.F  Clean up
	  query_device_simple_f.F to eliminate the following warnings:
	  query_device_simple_f.F:142:12:  142 |           10 format(9I5) |
	  1 Warning: Label 10 at (1) defined but not used [-Wunused-label]
	  query_device_simple_f.F:7:41:  7 |           integer :: i, j,
	  ret_val, error, handle, modifier, id, vendor_id |
	  1 Warning: Unused variable 'error' declared at (1) [-Wunused-
	  variable]
	* src/papi_preset.c: Correctly size papi_preset.c array to avoid
	  possible overflow  Upped the work array size to avoid the following
	  warnings:  papi_preset.c: In function 'update_ops_string':
	  papi_preset.c:336:50: warning: '%d' directive writing between 1 and
	  11 bytes into a region of size 9 [-Wformat-overflow=] 336 |
	  sprintf (work, "N%d", cur_index-1); |
	  ^~ papi_preset.c:336:48: note: directive argument in the range
	  [-2147483648, 2147483646] 336 |
	  sprintf (work, "N%d", cur_index-1); |
	  ^~~~~ In file included from /usr/include/stdio.h:906, from
	  papi_debug.h:23, from papi_internal.h:24, from papi_preset.c:18: In
	  function 'sprintf', inlined from 'update_ops_string' at
	  papi_preset.c:336:5: /usr/include/bits/stdio2.h:30:10: note:
	  '__builtin___sprintf_chk' output between 3 and 13 bytes into a
	  destination of size 10 30 |   return __builtin___sprintf_chk (__s,
	  __USE_FORTIFY_LEVEL - 1, |
	  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 31 |
	  __glibc_objsize (__s), __fmt, |
	  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 32 |
	  __va_arg_pack ()); |
	  ~~~~~~~~~~~~~~~~~
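
	  Illustration only: a minimal C sketch of sizing a buffer for the
	  "N%d" format named in the warning. The new array size chosen in
	  papi_preset.c is not stated in this entry; WORK_LEN and
	  format_operand are hypothetical, and the use of snprintf is an
	  addition of this sketch rather than the committed fix. A 32-bit
	  int is assumed.

```c
#include <stdio.h>
#include <stddef.h>

/* Worst case for "N%d" with a 32-bit int:
 * 'N' (1) + "-2147483648" (11) + NUL (1) = 13 bytes. */
#define WORK_LEN 16

/* Hypothetical helper: format an operand token into work.
 * Returns 0 on success, -1 if the output did not fit. */
int format_operand(char *work, size_t len, int index)
{
    int n = snprintf(work, len, "N%d", index);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

	  With a 10-byte destination, as in the warning, the worst-case
	  value is reported as not fitting instead of overflowing.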

2022-10-13  Daniel Barry <dbarry@vols.utk.edu>

	* src/components/powercap/tests/powercap_basic.c: powercap: fix
	  memory leak in test  The component test 'powercap_basic' now frees
	  the dynamically allocated memory used to store counter readings.
	  These changes have been tested on the Intel Cascade Lake
	  architecture.

Sat Oct 1 23:04:01 2022 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/lib/events/intel_icl_events.h,
	  src/libpfm4/lib/events/intel_skl_events.h,
	  src/libpfm4/lib/events/intel_spr_events.h,
	  src/libpfm4/lib/pfmlib_amd64.c: libpfm4: update to commit 8aaaf17
	  Original commits:  commit 8aaaf1747e96031a47ed6bd9337ff61a21f8cc64
	  add missing break in amd64_get_revision()  Fixed bug introduced by:
	  commit 79031f76f8a1 ("fix amd_get_revision() to identify AMD Zen3
	  uniquely")  Must have a break statement for AMD Zen3 (model 1) to
	  avoid errors later.  Reported-by: Steve Kaufmann
	  <steve.kaufmann@hpe.com>  commit
	  bc4233d35418788423e8442395c7920eb156589d  update Intel Skylake
	  event table  Based on download.01.org version 1.28.   commit
	  4c0bc1c8ae06abd5f876657888b88aaf9c9530e6  Fix typos in Intel
	  Icelake event table  Based on download.01.org version 1.16.
	  commit b6f86fb0d8eae38d65d4394e3ed82f528b10bebf  Update Intel
	  SapphireRapid event table  Based on download.01.org release 1.06.
	  Minor changes to ASSISTS and DECODE events.   Untested

2022-10-13  Daniel Barry <dbarry@vols.utk.edu>

	* src/components/powercap/tests/powercap_basic.c: powercap: ensure
	  proper string format in test  Ensure that the proper string is
	  null-terminated.

2022-10-11  Daniel Barry <dbarry@vols.utk.edu>

	* src/components/powercap/linux-powercap.c,
	  src/components/powercap/tests/Makefile,
	  src/components/powercap/tests/powercap_basic.c: powercap: fix
	  formatting  Replace tabs with appropriate amounts of spaces.  These
	  changes have been tested on the Intel Cascade Lake architecture.

2022-10-10  Daniel Barry <dbarry@vols.utk.edu>

	* src/components/powercap/tests/powercap_basic.c: powercap: fix
	  compiler warnings for component test powercap_basic  The warnings
	  for the powercap component test can also be squelched by replacing
	  sizeof() with the actual buffer sizes.  These changes have been
	  tested on the Intel Cascade Lake architecture.
	* src/components/powercap/linux-powercap.c: powercap: fix compiler
	  warnings for component  The warnings for the powercap component can
	  be squelched by replacing sizeof() with the actual size of the
	  destination buffer.  These changes have been tested on the Intel
	  Cascade Lake architecture.

2022-10-10  AnustuvICL <anustuv@icl.utk.edu>

	* src/aix.c, src/components/bgpm/IOunit/linux-IOunit.c,
	  src/components/bgpm/L2unit/linux-L2unit.c,
	  src/components/perf_event/perf_event.c,
	  src/components/perf_event/perf_helpers.h,
	  src/components/perfctr/perfctr.c,
	  src/components/perfmon2/perfmon.c, src/components/perfmon_ia64
	  /perfmon-ia64.c, src/components/perfnec/perfmon.c,
	  src/components/rocm/rocm.c, src/components/sde/sde.c,
	  src/ctests/attach2.c, src/ctests/attach3.c,
	  src/ctests/attach_validate.c, src/ctests/byte_profile.c,
	  src/ctests/data_range.c, src/ctests/earprofile.c,
	  src/ctests/prof_utils.c, src/ctests/prof_utils.h,
	  src/ctests/profile.c, src/ctests/profile_pthreads.c,
	  src/ctests/profile_twoevents.c, src/ctests/sprofile.c,
	  src/examples/PAPI_profil.c, src/examples/sprofile.c, src/extras.c,
	  src/extras.h, src/linux-bgp.c, src/linux-bgq.c, src/linux-
	  context.h, src/linux-memory.c, src/papi.c, src/papi.h,
	  src/papi_fwrappers.c, src/papi_internal.h, src/papivi.h, src
	  /solaris-common.c, src/solaris-common.h, src/solaris-niagara2.c,
	  src/solaris-ultra.c, src/solaris-ultra.h: Refactor caddr_t to void*
	  vptr_t

2022-10-11  Anthony <adanalis@icl.utk.edu>

	* src/counter_analysis_toolkit/params.h: Missing file that should
	  have been included in PR 349 (commit 89c0f19).

2022-10-12  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/papi_fwrappers.c: sysdetect: fix warning in papi_fwrappers.c
	  papi_fwrappers.c is used to generate multiple wrapper versions for
	  fortran. Because of a global variable not declared static the
	  different versions cause a redefinition of the symbols when used
	  with recent versions of the gcc compiler (as the compiler does link
	  time optimizations). Declaring the variable static should fix the
	  problem.

2022-09-07  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/Makefile,
	  src/counter_analysis_toolkit/driver.h,
	  src/counter_analysis_toolkit/main.c: cat: add MPI support  Add MPI
	  support to accelerate the collection of event data. This works by
	  splitting up the list of events to be monitored among the MPI
	  ranks.  These changes have been tested on the IBM POWER9
	  architecture.
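
	  Illustration only: this entry does not say how CAT divides the
	  event list, so the following C sketch shows one common choice, a
	  contiguous block partition. event_range is a hypothetical helper;
	  in an MPI program rank and nranks would come from MPI_Comm_rank
	  and MPI_Comm_size.

```c
#include <stddef.h>

/* Hypothetical helper: compute the half-open range [*start, *end) of
 * event indices handled by `rank` out of `nranks`. When nevents is not
 * divisible by nranks, the lowest ranks each take one extra event. */
void event_range(size_t nevents, int rank, int nranks,
                 size_t *start, size_t *end)
{
    size_t base  = nevents / (size_t)nranks;
    size_t extra = nevents % (size_t)nranks;
    size_t r     = (size_t)rank;

    *start = r * base + (r < extra ? r : extra);
    *end   = *start + base + (r < extra ? 1 : 0);
}
```

	  For 10 events on 4 ranks this yields the ranges [0,3), [3,6),
	  [6,8) and [8,10), so every event is measured by exactly one rank.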

2022-09-28  Anthony <adanalis@icl.utk.edu>

	* src/counter_analysis_toolkit/scripts/README.txt,
	  src/counter_analysis_toolkit/scripts/default.gnp,
	  .../scripts/multi_plot.gnp, .../scripts/process_dcache_output.sh,
	  .../L2_RQSTS:ALL_DEMAND_REFERENCES.data.reads.stat,
	  .../L2_RQSTS:DEMAND_DATA_RD_HIT.data.reads.stat,
	  .../L2_RQSTS:DEMAND_DATA_RD_MISS.data.reads.stat,
	  .../scripts/single_plot.gnp: Scripts and sample data for viewing
	  CAT's dcache output.

2022-09-21  Anthony <adanalis@icl.utk.edu>

	* src/counter_analysis_toolkit/driver.h,
	  src/counter_analysis_toolkit/main.c: Removed redundant latency
	  step.
	* src/counter_analysis_toolkit/main.c: Added support for "-quick"
	  flag which skips the latency tests.
	* src/counter_analysis_toolkit/eventstock.c: Force the CPU component
	  to initialize itself.
	* src/counter_analysis_toolkit/branch.c,
	  src/counter_analysis_toolkit/branch.h,
	  src/counter_analysis_toolkit/dcache.c,
	  src/counter_analysis_toolkit/dcache.h,
	  src/counter_analysis_toolkit/driver.h,
	  src/counter_analysis_toolkit/icache.c,
	  src/counter_analysis_toolkit/main.c: Cleaned up the way we handle
	  the parameters specified via the command line arguments.

2022-09-04  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/tests/Makefile,
	  .../sysdetect/tests/query_device_simple_f.F, src/genpapifdef.c,
	  src/papi_fwrappers.c: sysdetect: add fortran bindings and test  Add
	  fortran bindings for PAPI sysdetect interface and tests.

2022-09-13  Daniel Barry <dbarry@vols.utk.edu>

	* src/components/powercap/linux-powercap.c: powercap: fix wrap-around
	  arithmetic  When the energy counters reach the maximum value (given
	  by '/sys/class/powercap/intel-rapl*/max_energy_range_uj'), they
	  wrap around to zero.  There is arithmetic in the powercap component
	  to account for this case, but it previously used the maximum value
	  for an unsigned int, which is not necessarily the value given by
	  'max_energy_range_uj'.  Thus, the arithmetic has been modified to
	  now use the values given in the appropriate 'max_energy_range_uj'
	  files.  These changes have been tested on the Intel Cascade Lake
	  architecture.
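
	  Illustration only: a minimal C sketch of wrap-around arithmetic
	  that uses the range read from 'max_energy_range_uj' instead of the
	  maximum unsigned int. energy_delta_uj is a hypothetical helper, not
	  the component's code, and it assumes the counter wraps modulo
	  max_range (that is, reaching max_range reads as zero).

```c
#include <stdint.h>

/* Hypothetical helper: energy consumed between two readings, in
 * microjoules, for a counter that wraps to zero at max_range.
 * Assumes at most one wrap occurred between prev and curr. */
uint64_t energy_delta_uj(uint64_t prev, uint64_t curr, uint64_t max_range)
{
    if (curr >= prev)
        return curr - prev;
    /* Wrapped: distance from prev to the top, plus distance from zero. */
    return (max_range - prev) + curr;
}
```

	  Using a fixed 2^32 - 1 here would give a wrong delta whenever the
	  file reports a different range, which is the case this entry
	  addresses.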

2022-10-07  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/infiniband/linux-infiniband.c: infiniband: fix
	  warning in snprintf  Instead of using FILENAME_MAX as the length of
	  the string to be copied over to ev_file use the sum of the
	  substrings and account for the extra '/'.
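
	  Illustration only: a minimal C sketch of sizing a path buffer from
	  its parts plus the '/' separator. join_path is a hypothetical
	  helper; whether the component allocates dynamically as done here is
	  an assumption of this sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "dir/name" in a buffer sized from the two
 * substrings. The caller frees the result. Returns NULL if the
 * allocation fails. */
char *join_path(const char *dir, const char *name)
{
    /* dir + '/' + name + terminating NUL */
    size_t len = strlen(dir) + 1 + strlen(name) + 1;
    char *path = malloc(len);

    if (path == NULL)
        return NULL;
    snprintf(path, len, "%s/%s", dir, name);
    return path;
}
```

	  Because the length passed to snprintf matches what the format can
	  produce, the compiler has no truncation to warn about.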

2022-08-31  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/perfmon2/perfmon.c: perfmon2: funnel init_component
	  failures  init_component failures are handled locally to the
	  failure. Instead, funnel all error handling code paths through a
	  single exit point. This makes the code more robust to bugs and also
	  makes it easier to read.
	* src/components/perfctr/perfctr.c: perfctr: funnel init_component
	  failures  init_component failures are handled locally to the
	  failure. Instead, funnel all error handling code paths through a
	  single exit point. This makes the code more robust to bugs and also
	  makes it easier to read.

2022-08-28  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/host_micpower/linux-host_micpower.c: host_micpower:
	  funnel PAPI_ENOMEM through fn_fail  untested due to lack of
	  hardware
	* src/components/host_micpower/linux-host_micpower.c: host_micpower:
	  rework error handling in init_component
	* src/components/host_micpower/linux-host_micpower.c: host_micpower:
	  delete empty line
	* src/components/host_micpower/linux-host_micpower.c: host_micpower:
	  add fn_exit point
	* src/components/host_micpower/linux-host_micpower.c: host_micpower:
	  rename disable_me to fn_fail
	* src/components/vmware/vmware.c: vmware: funnel init_component
	  failures  init_component failures are handled locally to the
	  failure. Instead, funnel all error handling code paths through a
	  single exit point. This makes the code more robust to bugs and also
	  makes it easier to read.
	* src/components/stealtime/linux-stealtime.c: stealtime: funnel
	  init_component failures  init_component failures are handled
	  locally to the failure. Instead, funnel all error handling code
	  paths through a single exit point. This makes the code more robust
	  to bugs and also makes it easier to read.
	* src/components/sensors_ppc/linux-sensors-ppc.c: sensors_ppc: funnel
	  init_component failures  init_component failures are handled
	  locally to the failure. Instead, funnel all error handling code
	  paths through a single exit point. This makes the code more robust
	  to bugs and also makes it easier to read.
	* src/components/rapl/linux-rapl.c: rapl: funnel init_component
	  failures  init_component failures are handled locally to the
	  failure. Instead, funnel all error handling code paths through a
	  single exit point. This makes the code more robust to bugs and also
	  makes it easier to read.
	* src/components/powercap_ppc/linux-powercap-ppc.c: powercap_ppc:
	  funnel init_component failures  init_component failures are handled
	  locally to the failure. Instead, funnel all error handling code
	  paths through a single exit point. This makes the code more robust
	  to bugs and also makes it easier to read.
	* src/components/powercap/linux-powercap.c: powercap: funnel
	  init_component failures  init_component failures are handled
	  locally to the failure. Instead, funnel all error handling code
	  paths through a single exit point. This makes the code more robust
	  to bugs and also makes it easier to read.
	* src/components/pcp/linux-pcp.c: pcp: funnel init_component failures
	  init_component failures are handled locally to the failure.
	  Instead, funnel all error handling code paths through a single exit
	  point. This makes the code more robust to bugs and also makes it
	  easier to read.
	* src/components/pcp/linux-pcp.c: pcp: return PAPI_ECMP on error
	  instead of ctxHandle
	* src/components/net/linux-net.c: net: return PAPI_ECMP on error
	  instead of num_events
	* src/components/net/linux-net.c: net: funnel init_component failures
	  init_component failures are handled locally to the failure.
	  Instead, funnel all error handling code paths through a single exit
	  point. This makes the code more robust to bugs and also makes it
	  easier to read.
	* src/components/mx/linux-mx.c: mx: funnel init_component failures
	  init_component failures are handled locally to the failure.
	  Instead, funnel all error handling code paths through a single exit
	  point. This makes the code more robust to bugs and also makes it
	  easier to read.
	* src/components/micpower/linux-micpower.c: micpower: funnel
	  init_component failures  init_component failures are handled
	  locally to the failure. Instead, funnel all error handling code
	  paths through a single exit point. This makes the code more robust
	  to bugs and also makes it easier to read.  untested due to lack of
	  hardware
	* src/components/micpower/linux-micpower.c: micpower: replace
	  PAPI_ENOCMP with PAPI_ECMP  PAPI_ENOCMP should be used to indicate
	  that the requested component is not available in the component
	  index (e.g. because it wasn't initialized). PAPI_ECMP, on the other
	  hand, should be used when the component is initialized but some
	  requested feature is not supported by the component (e.g. the
	  component is not compatible with the feature). By its own
	  definition no component can return PAPI_ENOCMP.
	* src/components/lustre/linux-lustre.c: lustre: funnel init_component
	  failures  init_component failures are handled locally to the
	  failure. Instead, funnel all error handling code paths through a
	  single exit point. This makes the code more robust to bugs and also
	  makes it easier to read.
	* src/components/lmsensors/linux-lmsensors.c: lmsensors: funnel
	  init_component failures  init_component failures are handled
	  locally to the failure. Instead, funnel all error handling code
	  paths through a single exit point. This makes the code more robust
	  to bugs and also makes it easier to read.
	* src/components/libmsr/linux-libmsr.c: libmsr: funnel init_component
	  failures  init_component failures are handled locally to the
	  failure. Instead, funnel all error handling code paths through a
	  single exit point. This makes the code more robust to bugs and also
	  makes it easier to read.
	* src/components/io/linux-io.c: io: funnel init_component failures
	  init_component failures are handled locally to the failure.
	  Instead, funnel all error handling code paths through a single exit
	  point. This makes the code more robust to bugs and also makes it
	  easier to read.
	* src/components/intel_gpu/linux_intel_gpu_metrics.c: intel_gpu:
	  funnel init_component failures  init_component failures are handled
	  locally to the failure. Instead, funnel all error handling code
	  paths through a single exit point. This makes the code more robust
	  to bugs and also makes it easier to read.
	* src/components/example/example.c: example: funnel init_component
	  failures  init_component failures are handled locally to the
	  failure. Instead, funnel all error handling code paths through a
	  single exit point. This makes the code more robust to bugs and also
	  makes it easier to read.
	* src/components/coretemp/linux-coretemp.c: coretemp: replace
	  PAPI_ENOCMP with PAPI_ECMP  PAPI_ENOCMP should be used to indicate
	  that the requested component is not available in the component
	  index (e.g. because it wasn't initialized). PAPI_ECMP, on the other
	  hand, should be used when the component is initialized but some
	  requested feature is not supported by the component (e.g. the
	  component is not compatible with the feature). By its own
	  definition no component can return PAPI_ENOCMP.
	* src/components/coretemp_freebsd/coretemp_freebsd.c:
	  coretemp_freebsd: funnel init_component failures  init_component
	  failures are handled locally to the failure. Instead, funnel all
	  error handling code paths through a single exit point. This makes
	  the code more robust to bugs and also makes it easier to read.
	  untested
	* src/components/coretemp/linux-coretemp.c: coretemp: funnel
	  init_component failures  init_component failures are handled
	  locally to the failure. Instead, funnel all error handling code
	  paths through a single exit point. This makes the code more robust
	  to bugs and also makes it easier to read.
	* src/components/appio/appio.c: appio: funnel init_component failures
	  init_component failures are handled locally to the failure.
	  Instead, funnel all error handling code paths through a single exit
	  point. This makes the code more robust to bugs and also makes it
	  easier to read.
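
	  Illustration only: a minimal C sketch of the single exit point
	  pattern described in the entries above, using the fn_exit and
	  fn_fail label names mentioned for host_micpower. The struct, the
	  SKETCH_ error codes and sketch_init_component are hypothetical
	  placeholders, not PAPI definitions.

```c
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_OK      0
#define SKETCH_ENOMEM -2   /* placeholder values, not PAPI's */
#define SKETCH_ECMP   -4

struct sketch_cmp {
    int  disabled;              /* 0 when usable, error code otherwise */
    char disabled_reason[128];
    int *table;                 /* resource acquired during init */
};

int sketch_init_component(struct sketch_cmp *cmp, int hw_present, size_t n)
{
    int retval = SKETCH_OK;
    const char *reason = "";

    cmp->table = NULL;
    cmp->disabled_reason[0] = '\0';

    if (!hw_present) {
        retval = SKETCH_ECMP;
        reason = "hardware not found";
        goto fn_fail;
    }

    cmp->table = calloc(n, sizeof *cmp->table);
    if (cmp->table == NULL) {
        retval = SKETCH_ENOMEM;
        reason = "out of memory";
        goto fn_fail;
    }

  fn_exit:
    /* Single exit: the component records its own disabled state. */
    cmp->disabled = retval;
    return retval;

  fn_fail:
    /* Single failure path: release resources and record the reason. */
    free(cmp->table);
    cmp->table = NULL;
    snprintf(cmp->disabled_reason, sizeof cmp->disabled_reason, "%s", reason);
    goto fn_exit;
}
```

	  Every failure takes the same cleanup path, so adding a new check
	  cannot forget to free resources or to set the disabled state.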

2022-08-27  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/perf_event_uncore/perf_event_uncore.c: perf_event_u:
	  replace PAPI_ENOCMP with PAPI_ECMP  PAPI_ENOCMP should be used to
	  indicate that the requested component is not available in the
	  component index (e.g. because it wasn't initialized). PAPI_ECMP, on
	  the other hand, should be used when the component is initialized
	  but some requested feature is not supported by the component (e.g.
	  the component is not compatible with the feature). By its own
	  definition no component can return PAPI_ENOCMP.
	* src/components/perf_event/perf_event.c: perf_event: replace
	  PAPI_ENOCMP with PAPI_ECMP  PAPI_ENOCMP should be used to indicate
	  that the requested component is not available in the component
	  index (e.g. because it wasn't initialized). PAPI_ECMP, on the other
	  hand, should be used when the component is initialized but some
	  requested feature is not supported by the component (e.g. the
	  component is not compatible with the feature). By its own
	  definition no component can return PAPI_ENOCMP.
	* src/components/perf_event_uncore/perf_event_uncore.c: perf_event_u:
	  funnel init_component failures  init_component failures are handled
	  locally to the failure. Instead, funnel all error handling code
	  paths through a single exit point. This makes the code more robust
	  to bugs and also makes it easier to read.
	* src/components/perf_event/perf_event.c: perf_event: funnel
	  init_component failures  init_component failures are handled
	  locally to the failure. Instead, funnel all error handling code
	  paths through a single exit point. This makes the code more robust
	  to bugs and also makes it easier to read.

2022-09-22  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm_smi/tests/Makefile: rocm_smi: add default rocm
	  path for tests  When PAPI_ROCM_ROOT is not defined it expands to
	  the empty string during compilation. Thus, many of the rocm flags
	  used by the compiler are incomplete and might cause problems. This
	  patch makes sure that PAPI_ROCM_ROOT always falls back to a default
	  if not defined.

2022-09-29  Anthony <adanalis@icl.utk.edu>

	* src/Makefile.inc, src/sde_lib/Makefile: libsde: Passing the CC
	  Makefile variable to the sub-make.

2022-09-22  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm/tests/hl_intercept_multi_thread_monitoring.cpp,
	  src/components/rocm/tests/hl_intercept_single_thread_monitoring.cpp
	  , src/components/rocm/tests/hl_sample_single_thread_monitoring.cpp:
	  rocm: skip multi-threaded high-level API tests  PAPI high-level API
	  tests in rocm require user intervention to set the LD_LIBRARY_PATH
	  to the path of libpapi.so and librocprofiler64.so, required,
	  respectively, to set ROCP_TOOL_LIB and HSA_TOOLS_LIB.  Skip these
	  tests as they would fail without LD_LIBRARY_PATH being properly
	  set.

2022-09-21  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm/tests/Makefile: rocm: use static libpapi for
	  tests  Instead of linking to libpapi.so, which requires additional
	  environment variables (i.e. LD_LIBRARY_PATH) to be set for the test
	  to work, build tests with libpapi.a.
	* src/components/rocm/tests/Makefile: rocm: add default rocm path for
	  tests  When PAPI_ROCM_ROOT is not defined it expands to the empty
	  string during compilation. Thus, many of the rocm flags used by the
	  compiler are incomplete and might cause problems. This patch makes
	  sure that PAPI_ROCM_ROOT always falls back to a default if not
	  defined.

2022-07-11  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm/tests/Makefile: rocm: account for PAPI defined
	  compilation flags  Makefile_comp_tests.target already contains all
	  the variables needed to compile tests in various components.
	  Instead of hard coding compile flags all over again, use the
	  available variables.

2022-06-02  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm/tests/Makefile: rocm: add
	  Makefile_comp_tests.target dependency in tests  Currently, the
	  install target is missing in rocm tests. Including
	  Makefile_comp_tests.target fixes the problem

2022-09-24  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/linux-memory.c: meminfo: support POWER10 cache information  Add
	  support for IBM POWER10 information in meminfo.
	* src/components/sysdetect/powerpc_cpu_utils.c: sysdetect: add
	  POWER10 cache info support
	* src/components/sysdetect/powerpc_cpu_utils.c: sysdetect: update
	  power9 L1 cache info  The number of lines in L1 cache is wrong. It
	  was set to 64 for a 32KB cache with 128B line size, while it should
	  be 32K/128 = 256.

2022-09-04  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/linux-memory.c: meminfo: add power9 cache info

2022-09-21  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/configure, src/configure.in: configure: fix tls check logic

2022-09-23  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/papi.c: papi_get_opt: update documentation for PAPI_LIB_VERSION
	  The PAPI_LIB_VERSION option in PAPI_get_opt() no longer requires
	  PAPI to be successfully initialized first.
	* src/ctests/version.c: ctests/version: PAPI_library_init does not
	  fail test  PAPI_library_init can fail now and PAPI_get_opt will
	  return the runtime version for the user to compare with the linked
	  version.

2022-09-22  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/papi.c: Return PAPI library version even if PAPI is not
	  initialized  Currently, there is no way for the user to compare his
	  version of the PAPI library with the version of the library being
	  loaded at runtime. PAPI_library_init() takes the version of the
	  user library and compares it with the version of the library
	  loaded. If the two don't match, it returns an error.
	  PAPI_get_opt() also provides access to some library information,
	  like the version, but this only works if PAPI has been correctly
	  initialized. This patch extends PAPI_get_opt() to provide the
	  library version regardless of whether PAPI has been initialized.

2021-12-02  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/ctests/all_native_events.c: ctests: improve all_native_events.
	  all_native_events does not test for event names only. Therefore,
	  you cannot test that specifying only the event name in an uncore
	  PMU event results in an error. Add a test for all_native_events
	  with only the event name.

Tue Sep 20 00:46:19 2022 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/config.mk, src/libpfm4/debian/changelog,
	  src/libpfm4/debian/control, src/libpfm4/debian/rules,
	  src/libpfm4/include/perfmon/perf_event.h,
	  src/libpfm4/lib/pfmlib_amd64.c: libpfm4: update to commit 8c606bc
	  Original commits:  commit 8c606bc2f2d186c2797d9f013283c9150f594f93
	  update perf_event.h to Linux 5.18  The perf_events interface for
	  directly accessing PMU registers from userspace for arm64 has been
	  formally implemented in the kernel v5.18.  Update perf_event.h
	  header used to build perf_event based examples.   commit
	  79031f76f8a1af7d3c83ae3c4363d32cfb5dadc6  fix amd_get_revision() to
	  identify AMD Zen3 uniquely  Make sure we handle the model number
	  properly for AMD Zen3. Right now, it would consider any family 19h
	  as Zen3.   commit 56f6a05d46b7592ddf81d77f4714dfc9b4c975e5  Update
	  to version 12.1  To fix some debian control files issues   commit
	  e1c16c829abc86a4e9547f4518d7834fcbd0a603  fix debian rules to build
	  again  Can now build using: $ debuild -i -us -uc -b -d   commit
	  19b784d3404fda20e27b30473804ff3a3a14f4d5  fix debian changelog for
	  4.12 release  changelog entry was added after the previous 11.1
	  release instead of at the top of the file

2022-09-22  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/sysdetect.c: sysdetect: fix problem with
	  missing shutdown_thread implementation  Originally, init_thread and
	  shutdown_thread were not implemented in the sysdetect component.
	  However, this causes issues when PAPI_register_thread and
	  PAPI_unregister_thread are used. In the case of unregister, the
	  framework will go through all the enabled components and call
	  shutdown_thread for each of them. Since sysdetect did not
	  implement these functions, a default (PAPI_ECMP) error would be
	  returned.  This patch adds the missing functions to sysdetect.
	  Solves issue #116
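	  A minimal sketch of the failure mode described above, not the
	  actual PAPI framework code: every type, constant, and function
	  name below is hypothetical, and stopping at the first failing
	  component is an assumption made for the illustration. It only
	  shows how a default "not implemented" stub in one component can
	  make the whole unregister path report an error.

```c
/* Hypothetical stand-ins for PAPI_OK and PAPI_ECMP; the values are illustrative. */
#define SKETCH_OK        0
#define SKETCH_ENOTIMPL (-1)

/* Hypothetical, heavily reduced component vector. */
typedef struct {
    const char *name;
    int (*shutdown_thread)(void *ctx);
} sketch_component_t;

/* Default stub used when a component does not provide its own function. */
int sketch_default_shutdown_thread(void *ctx)
{
    (void) ctx;
    return SKETCH_ENOTIMPL;
}

/* What a component with nothing to tear down can install instead. */
int sketch_noop_shutdown_thread(void *ctx)
{
    (void) ctx;
    return SKETCH_OK;
}

/* Walk all enabled components and stop at the first failure (assumed policy). */
int sketch_unregister_thread(const sketch_component_t *comps, int count, void *ctx)
{
    for (int i = 0; i < count; i++) {
        int ret = comps[i].shutdown_thread(ctx);
        if (ret != SKETCH_OK)
            return ret;
    }
    return SKETCH_OK;
}
```

	  With the default stub installed for one component the walk
	  returns the error; installing the no-op function makes it
	  return success.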

2022-09-21  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/configure, src/configure.in: configure: add support for
	  automatic ARM cpu detection  The configure script should be able to
	  detect the cpu architecture and enable the building of the
	  corresponding source code to support it. However, configure only
	  does this for x86_64 and power architectures. This patch adds
	  support for ARM architectures as well.

2022-09-20  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/arm_cpu_utils.c: sysdetect: fix cache info
	  data structure name  With commit number 9f8e6b0 the sysdetect data
	  structures, which were originally hosted in papi.h, were moved to
	  sysdetect.h instead. To account for this, the data structures
	  prefix was changed from PAPI_ to _sysdetect_. This change, however,
	  was erroneously skipped for the arm files. This patch fixes this
	  problem by reflecting the name change in the arm files.

2022-09-02  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/tests/Makefile: sysdetect: conditionally
	  build mpi tests

2022-09-13  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/tests/Makefile: Add the tests to the "clean"
	  target in the makefile.
	* src/configure, src/configure.in, src/utils/Makefile: More proper
	  handling of special compilation flags for papi_native_avail.

2022-04-20  Anthony <adanalis@icl.utk.edu>

	* src/Makefile.in, src/Makefile.inc,
	  src/components/Makefile_comp_tests.target.in,
	  src/components/sde/Rules.sde, src/components/sde/sde.c,
	  src/components/sde/sde_F.F90, src/components/sde/sde_internal.h,
	  src/components/sde/sde_lib/sde_lib.h,
	  .../sde/tests/Advanced_C+FORTRAN/sde_test_f08.F90,
	  .../sde/tests/Counting_Set/CountingSet_Lib++.cpp,
	  .../sde/tests/Counting_Set/CountingSet_Lib.c,
	  .../MemoryLeak_CountingSet_Driver++.cpp,
	  .../Counting_Set/MemoryLeak_CountingSet_Driver.c,
	  .../Counting_Set/Simple_CountingSet_Driver++.cpp,
	  .../tests/Counting_Set/Simple_CountingSet_Driver.c,
	  src/components/sde/tests/Counting_Set/cset_lib.hpp,
	  .../Created_Counter/Lib_With_Created_Counter++.cpp,
	  src/components/sde/tests/Makefile,
	  .../sde/tests/Minimal/Minimal_Test++.cpp,
	  .../sde/tests/Recorder/Lib_With_Recorder++.cpp,
	  src/components/sde/tests/Simple2/Simple2_Lib++.cpp,
	  src/components/sde/tests/Simple2/Simple2_Lib.c, src/configure,
	  src/configure.in, src/sde_lib/Makefile, src/sde_lib/sde_lib.c,
	  src/sde_lib/sde_lib.h, src/sde_lib/sde_lib.hpp,
	  src/sde_lib/sde_lib_datastructures.c,
	  src/sde_lib/sde_lib_internal.h, src/sde_lib/sde_lib_lock.h,
	  src/sde_lib/sde_lib_misc.c, src/sde_lib/sde_lib_ti.c,
	  src/sde_lib/sde_lib_ti.h, src/utils/Makefile,
	  src/utils/Makefile.target.in, src/utils/papi_native_avail.c:
	  libsde: Refactoring the sde code into a standalone library with a
	  clean API.  The libsde library is built and installed along with
	  libpapi (unless the user specifies --with-libsde=no at configure).
	  Clean separation between the PAPI SDE component and libsde. Now
	  PAPI invokes the "tools interface" of libsde. Added missing
	  functions for symmetry, such as papi_sde_shutdown() and
	  papi_delete_counting_set(), and
	  papi_sde_enabled()/papi_sde_disable().

2022-05-17  AnustuvICL <anustuv@icl.utk.edu>

	* src/components/cuda/linux-cuda.c,
	  src/components/cuda/tests/Makefile,
	  .../cuda/tests/test_multipass_event_fail.c, src/genpapifdef.c,
	  src/papi.c, src/papi.h: cuda: Raise error when adding metrics that
	  need multiple passes  - Add new error code `PAPI_EMULPASS` - Updated
	  docs for `PAPI_add_event()` and `PAPI_add_named_event()` - Add test
	  program in `components/cuda/tests/test_multipass_event_fail.c`

Fri Sep 16 22:37:40 2022 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/config.mk, src/libpfm4/debian/changelog,
	  src/libpfm4/docs/man3/libpfm_intel_bdw.3,
	  src/libpfm4/docs/man3/libpfm_intel_hsw.3,
	  src/libpfm4/docs/man3/libpfm_intel_icl.3,
	  src/libpfm4/docs/man3/libpfm_intel_icx.3,
	  src/libpfm4/docs/man3/libpfm_intel_ivb.3,
	  src/libpfm4/docs/man3/libpfm_intel_nhm.3,
	  src/libpfm4/docs/man3/libpfm_intel_skl.3,
	  src/libpfm4/docs/man3/libpfm_intel_snb.3,
	  src/libpfm4/docs/man3/libpfm_intel_spr.3,
	  src/libpfm4/docs/man3/libpfm_intel_wsm.3,
	  src/libpfm4/lib/pfmlib_intel_x86.c,
	  src/libpfm4/tests/validate_x86.c: libpfm4: update to commit 11f2d6c
	  Original commits:  commit 11f2d6c70a8b353e80eee55e9a2011c27c82398e
	  update to version 4.12.0  Update to 4.12.0 revision to prepare for
	  release   commit 471fe633ae01a636b78481b8030a1f922c9d24d2  fix
	  minimal ldlat latency for Intel Load Latency  SDM lists 1 cycle as
	  the lowest possible, adjust code to reflect spec. Adjust validation
	  test suite accordingly. Adjust man pages accordingly.

2022-09-03  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/utils/papi_native_avail.c: papi_native_avail: fix typo in
	  doxygen comment

2021-07-13  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/ctests/memory.c, src/linux-memory.c, src/papi.h: meminfo: Add
	  alloc/write policy for generic_get_memory_info  On arm64, if the
	  firmware supports ACPI PPTT (Processor Properties Topology Table),
	  the generic_get_memory_info function references the files located
	  in the "/sys/devices/system/cpu/cpu*/cache/index*" directory and
	  sets the cache information.  In the arm64 environment, the
	  following cache information files are available from the kernel:
	    index0:
	    allocation_policy    number_of_sets   size    ways_of_associativity
	    coherency_line_size  shared_cpu_list  type    write_policy
	    level                shared_cpu_map   uevent
	  Currently, the papi library does not reference two files,
	  "allocation_policy" and "write_policy".  Add
	  allocation_policy/write_policy support for the
	  generic_get_memory_info function.
	  /sys/devices/system/cpu/cpu*/cache/index*/allocation_policy
	  - ReadAllocate: allocate a memory location to a cache line on a
	    cache miss because of a read
	  - WriteAllocate: allocate a memory location to a cache line on a
	    cache miss because of a write
	  - ReadWriteAllocate: both WriteAllocate and ReadAllocate
	  /sys/devices/system/cpu/cpu*/cache/index*/write_policy
	  - WriteThrough: data is written to both the cache line and to the
	    block in the lower-level memory
	  - WriteBack: data is written only to the cache line and the
	    modified cache line is written to main memory only when it is
	    replaced
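	  The policy strings listed in this entry could be mapped to
	  flags roughly as follows. This is a sketch only: the enum and
	  function names are hypothetical and are not the identifiers
	  added to papi.h or linux-memory.c by this change. The helper
	  tolerates the trailing newline that sysfs files normally carry.

```c
#include <string.h>

/* Hypothetical flag names, chosen for this illustration only. */
enum sketch_cache_policy {
    SKETCH_POLICY_UNKNOWN        = 0,
    SKETCH_POLICY_READ_ALLOCATE  = 1 << 0,
    SKETCH_POLICY_WRITE_ALLOCATE = 1 << 1,
    SKETCH_POLICY_WRITE_THROUGH  = 1 << 2,
    SKETCH_POLICY_WRITE_BACK     = 1 << 3
};

/* True if text is exactly word, optionally followed by a newline. */
static int sketch_matches(const char *text, const char *word)
{
    size_t n = strlen(word);
    return strncmp(text, word, n) == 0 && (text[n] == '\0' || text[n] == '\n');
}

/* Parse the content of .../cache/indexN/allocation_policy. */
int sketch_parse_allocation_policy(const char *text)
{
    if (sketch_matches(text, "ReadWriteAllocate"))
        return SKETCH_POLICY_READ_ALLOCATE | SKETCH_POLICY_WRITE_ALLOCATE;
    if (sketch_matches(text, "ReadAllocate"))
        return SKETCH_POLICY_READ_ALLOCATE;
    if (sketch_matches(text, "WriteAllocate"))
        return SKETCH_POLICY_WRITE_ALLOCATE;
    return SKETCH_POLICY_UNKNOWN;
}

/* Parse the content of .../cache/indexN/write_policy. */
int sketch_parse_write_policy(const char *text)
{
    if (sketch_matches(text, "WriteThrough"))
        return SKETCH_POLICY_WRITE_THROUGH;
    if (sketch_matches(text, "WriteBack"))
        return SKETCH_POLICY_WRITE_BACK;
    return SKETCH_POLICY_UNKNOWN;
}
```

	  Unrecognized strings fall back to the "unknown" value rather
	  than failing, so a kernel reporting a new policy name would
	  not break the cache query.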

2022-09-02  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/libmsr/linux-libmsr.c: libmsr: improve disabled
	  reason string  Instead of returning a generic error message if
	  libmsr.so cannot be dlopen'ed return the dlerror() string.
	* src/components/sysdetect/powerpc_cpu_utils.c: sysdetect: update
	  power9 cache info

2022-08-24  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/ctests/all_native_events.c, src/ctests/get_event_component.c:
	  ctests: allow access to PAPI_EDELAY_INIT components  Some of the
	  tests, such as all_native_events and get_event_component, check the
	  'disabled' flag of the component before accessing it. Device
	  components, such as cuda and rocm, set the 'disabled' flag to
	  PAPI_EDELAY_INIT, which signifies the component is a delayed
	  initialization one. Thus, the event table in the component is
	  initialized only when events are accessed (e.g.
	  PAPI_enum_cmp_event).

2022-09-01  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/README.md: sysdetect: update README.md

2022-08-31  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/amd_gpu.c,
	  src/components/sysdetect/nvidia_gpu.c,
	  src/components/sysdetect/shm.c: sysdetect: warning fix in snprintf

2022-09-01  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/papi_internal.c: errcode: add PAPI_EMULPASS to error codes
	* src/papi.h: errcode: add comment for PAPI_EDELAY_INIT in papi.h
	* src/genpapifdef.c: errcode: add PAPI_EDELAY_INIT to genpapifdef

2022-08-31  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/Rules.sysdetect: sysdetect: add support
	  for Power8 through 10
	* src/configure, src/configure.in: configure: add support for POWER8
	  through 10  The configuration script only supported Power
	  architectures up to version 7. This patch adds Power8, Power9, and
	  Power10 as well.

2022-07-17  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm/rocm.c: rocm: refactor ntv_name_to_code and
	  ntv_code_to_name

2022-07-16  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm/Rules.rocm, src/components/rocm/rocm.c: rocm:
	  add name_to_code implementation
	* src/components/rocm/Rules.rocm, src/components/rocm/htable.c,
	  src/components/rocm/htable.h: rocm: add hash table for name_to_code
	  fast conversions  Add hash table implementation that can be used by
	  components to convert event names into their corresponding codes.
	  The hash function used by the hash table is djb2 by Dan Bernstein.
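	  For reference, the djb2 string hash starts from 5381 and
	  computes hash * 33 + c for every character. The sketch below
	  shows the hash and one possible way to pick a bucket from it;
	  the function names and the modulo bucket selection are
	  assumptions for illustration and are not taken from htable.c.

```c
#include <stddef.h>
#include <stdint.h>

/* djb2 by Dan Bernstein: hash = hash * 33 + c, seeded with 5381. */
uint64_t sketch_djb2(const char *str)
{
    uint64_t hash = 5381;
    unsigned char c;

    while ((c = (unsigned char) *str++) != '\0') {
        /* (hash << 5) + hash is hash * 33 */
        hash = ((hash << 5) + hash) + c;
    }
    return hash;
}

/* Hypothetical helper: reduce the hash to a bucket index. */
size_t sketch_name_to_bucket(const char *name, size_t num_buckets)
{
    return (size_t) (sketch_djb2(name) % num_buckets);
}
```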

2022-07-11  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/tests/query_device_mpi.c,
	  .../sysdetect/tests/query_device_simple.c: sysdetect: update tests

2022-06-03  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/amd_gpu.c: sysdetect: fix warning in amd
	  gpu probe

2022-04-20  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/amd_gpu.c,
	  src/components/sysdetect/amd_gpu.h, src/components/sysdetect/cpu.c,
	  src/components/sysdetect/cpu.h,
	  src/components/sysdetect/cpu_utils.c,
	  src/components/sysdetect/cpu_utils.h,
	  src/components/sysdetect/linux_cpu_utils.c,
	  src/components/sysdetect/nvidia_gpu.c,
	  src/components/sysdetect/nvidia_gpu.h,
	  src/components/sysdetect/powerpc_cpu_utils.c,
	  src/components/sysdetect/shm.c,
	  src/components/sysdetect/sysdetect.c,
	  src/components/sysdetect/sysdetect.h,
	  src/components/sysdetect/x86_cpu_utils.c, src/configure,
	  src/configure.in, src/papi.c, src/papi.h, src/papi_internal.c,
	  src/papi_internal.h, src/utils/papi_hardware_avail.c: sysdetect:
	  extend PAPI with system detection and querying APIs  Queries
	  performed by accessing PAPI internal data structures are not easily
	  maintainable. Once an internal data structure is exposed to users,
	  they rely on it not being changed, which ties our hands from an
	  implementation standpoint. This patch introduces a new set of APIs
	  that can be used to query system attributes for different devices
	  through the system detection component. The APIs are generic enough
	  to allow extending the capabilities with new hardware devices, as
	  they become available, and allow separating user interfaces from
	  the implementation of the underlying functionality.  Because the
	  new APIs have to be always functional we configure the sysdetect
	  component by default and initialize it lazily, i.e., only when the
	  corresponding APIs are called by the user.

2022-08-30  Daniel Barry <dbarry@vols.utk.edu>

	* src/papi_events.csv: papi_avail: add presets for Intel Ice Lake SP
	  Define preset events for the Intel Ice Lake SP processor. These
	  presets have been verified using the Counter Analysis Toolkit
	  benchmarks.  These changes have been tested on the Intel Ice Lake
	  architecture.

2022-08-25  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/vec_fma_dp.c,
	  src/counter_analysis_toolkit/vec_fma_hp.c,
	  src/counter_analysis_toolkit/vec_fma_sp.c,
	  src/counter_analysis_toolkit/vec_nonfma_hp.c,
	  src/counter_analysis_toolkit/vec_scalar_verify.c: cat: remove
	  unused code from vector benchmark  There were several outdated and
	  unused lines of code in the CAT vector FLOPs benchmark.  These
	  changes have been tested on the Intel Ice Lake (ICX) architecture.
	* src/counter_analysis_toolkit/vec_fma_dp.c,
	  src/counter_analysis_toolkit/vec_fma_hp.c,
	  src/counter_analysis_toolkit/vec_fma_sp.c,
	  src/counter_analysis_toolkit/vec_nonfma_dp.c,
	  src/counter_analysis_toolkit/vec_nonfma_hp.c,
	  src/counter_analysis_toolkit/vec_nonfma_sp.c,
	  src/counter_analysis_toolkit/vec_scalar_verify.c: cat: format
	  changes to vector benchmark comments  Make comment style consistent
	  across the various source files.  These changes have been tested on
	  the Intel Ice Lake (ICX) architecture.

2022-08-24  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/Makefile,
	  src/counter_analysis_toolkit/vec.c,
	  src/counter_analysis_toolkit/vec_arch.h,
	  src/counter_analysis_toolkit/vec_fma.h,
	  src/counter_analysis_toolkit/vec_fma_dp.c,
	  src/counter_analysis_toolkit/vec_fma_hp.c,
	  src/counter_analysis_toolkit/vec_fma_sp.c,
	  src/counter_analysis_toolkit/vec_nonfma.h,
	  src/counter_analysis_toolkit/vec_nonfma_dp.c,
	  src/counter_analysis_toolkit/vec_nonfma_hp.c,
	  src/counter_analysis_toolkit/vec_nonfma_sp.c,
	  src/counter_analysis_toolkit/vec_scalar_verify.c,
	  src/counter_analysis_toolkit/vec_scalar_verify.h: cat: extend
	  vector FLOPs benchmark to 128-bit and 512-bit intrinsics
	  Previously, kernels within the vector FLOPs benchmark of the
	  Counter Analysis Toolkit used only 256-bit intrinsics. But to
	  accurately identify native events for 128-bit and 512-bit vector
	  widths, the corresponding intrinsics need to be included in the
	  benchmark.  These changes have been tested on the Intel Ice Lake
	  (ICX) architecture.

2022-06-03  Daniel Barry <dbarry@vols.utk.edu>

	* src/configure, src/configure.in: sysdetect: modify configure.in
	  logic to parse '--with-CPU=x86'  When the configuration is invoked
	  with "--with-CPU=x86", it should add 'x86_cpuid_info.c' to
	  MISCSRCS.  When the configuration is invoked on an architecture of
	  the x86_64 family, and "--with-CPU=x86" is not specified, the build
	  will proceed normally. However, the build will fail on x86_64 when
	  "--with-CPU=x86" is specified because this flag bypasses the check
	  for the x86_64 family but does not add 'x86_cpuid_info.c' to
	  MISCSRCS. Thus, if "--with-CPU=x86" is specified, "x86" must be
	  included in the list of CPUs for which 'x86_cpuid_info.c' is added
	  as a source file.  These changes have been tested on the Intel
	  Cascade Lake and Skylake architectures.

2022-08-22  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/libpfm4/lib/events/power10_events.h,
	  src/libpfm4/lib/pfmlib_power10.c: libpfm4: fix broken update  This
	  patch fixes an error in the previous libpfm4 update (c340321).
	  The update in question was missing the expected power10 files.
	  This patch adds the missing libpfm4 commit including those files.

Wed Jul 27 03:58:13 2022 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/README, src/libpfm4/include/perfmon/pfmlib.h,
	  src/libpfm4/lib/Makefile,
	  src/libpfm4/lib/events/intel_icl_events.h,
	  src/libpfm4/lib/events/intel_spr_events.h,
	  src/libpfm4/lib/pfmlib_common.c,
	  src/libpfm4/lib/pfmlib_power_priv.h, src/libpfm4/lib/pfmlib_priv.h,
	  src/libpfm4/tests/validate_power.c: libpfm4: update libpfm4 to
	  commit 5140ce5  Original commits:  commit
	  5140ce5fe28a7d595eb0a3a906445d0deeb2c53c  Add IBM Power10 core PMU
	  support  Adds support for IBM Power 10 core PMU.  Documentation on
	  the PMU events for Power10 can be found in Appendix E of the
	  Power10 Users Manual.  The Power10 manual is at:
	  https://ibm.ent.box.com/v/power10usermanual  This and other PowerPC
	  related documents can be found at:
	  https://www-50.ibm.com/systems/power/openpower/   commit
	  c88fd465519ae6e96105efe19a06f64b3daa16af  More Intel SapphireRapids
	  updates  Based on download.01.org: sapphire_rapids_core_v1.04.json
	  commit 77711b23c5c2124c45d35f61a4b7edce7824ba53  Update Intel
	  SapphireRapids event table  Based on official event table at
	  download.01.org:  sapphirerapids_core_v1.04.json  Event RS_EMPTY
	  deprecated in favor of RS event. Updated OCR  umasks.   commit
	  391d20ec0a7d53bf5d7b39888734ba6fa716df3f  Update Icelake and
	  IcelakeX event tables  Based on official event tables from
	  download.01.org: icelakex_core_v1.15.json icelake_core_v1.14.json
	  Mostly updating the OCR events.   Tested: Power10: No, Sapphire
	  Rapids: No, Icelake: No, IcelakeX: No

2022-05-26  John Rodgers <john.rodgers@hpe.com>

	* src/components/cuda/linux-cuda.c: CUDA: Add compile/runtime version
	  debug msgs  In `linux-cuda.c::_cuda_linkCudaLibraries`, added debug
	  messages to report the compile/runtime versions for the driver,
	  runtime API, and CUPTI API.

2022-07-12  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm/rocm.c: rocm: fix assign eventset to component
	  PAPI_assign_eventset_component() assigns an eventset to a component
	  with a given index. The function relies on component information to
	  allocate data structures for the component. One such parameter is
	  num_mpx_cntrs. The component was setting this to -1 in
	  rocm_init_component() causing any malloc to fail. Additionally,
	  when rocm_init_private() is finally called, it also resets
	  num_mpx_cntrs to the number of native events detected for the
	  device. This is wrong as the framework relies on this parameter
	  when freeing allocated data structures, e.g., EventInfoArray.

2022-06-11  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm_smi/linux-rocm-smi.c: rsmi: ignore not yet
	  implemented function  Calls to rsmi_dev_pci_bandwidth_get()
	  currently return a RSMI_STATUS_NOT_YET_IMPLEMENTED error. This
	  results in the rocm_smi component being disabled. To avoid this, we
	  allow the component to keep working even without a functioning
	  rsmi_dev_pci_bandwidth_get() function.

2022-05-30  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm_smi/tests/force_init.h,
	  src/components/rocm_smi/tests/power_monitor_rocm.cpp,
	  src/components/rocm_smi/tests/rocm_smi_writeTests.cpp: rocm_smi:
	  account for PAPI_EDELAY_INIT in tests

2022-05-28  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/nvml/tests/force_init.h,
	  src/components/nvml/tests/nvml_power_limit_read_test.cu,
	  src/components/nvml/tests/nvml_power_limiting_test.cu: nvml:
	  account for PAPI_EDELAY_INIT in tests  Currently, nvml tests check
	  for the disabled state of the component. However, the tests do not
	  allow for the PAPI_EDELAY_INIT error, introduced in commit 1f44a36.
	  Thus, tests fail spuriously. This patch adds a force_nvml_init
	  function that accesses the nvml events, forcing the component to
	  init.

2022-06-02  Daniel Barry <dbarry@vols.utk.edu>

	* src/components/powercap/linux-powercap.c: powercap: add wrapper
	  function to map event-set entry to counter  Created a wrapper
	  function to map the event-set index to the appropriate counter
	  index.  These changes have been tested on the Intel Cascade Lake
	  architecture.

2022-05-17  Daniel Barry <dbarry@vols.utk.edu>

	* src/components/powercap/linux-powercap.c: powercap: fix event
	  lookup in _powercap_read()  The function read_powercap_value()
	  should be given the index of the powercap event, not the position
	  of that event in the event set. The event-set index maps to the
	  powercap-event index via the 'which_counter' array, which is
	  already used in the function _powercap_write().  These changes were
	  tested on the Intel Skylake and Cascade Lake architectures.
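	  The index translation described in this entry can be sketched
	  as below. Only the 'which_counter' name comes from the entry;
	  the struct layout, the event count, and the helper functions
	  are hypothetical simplifications and do not reproduce
	  linux-powercap.c.

```c
#define SKETCH_NUM_EVENTS 4   /* hypothetical number of powercap events */

/* Hypothetical, reduced control state of an event set. */
typedef struct {
    int which_counter[SKETCH_NUM_EVENTS]; /* event-set position -> event index */
    int active_counters;                  /* number of events in the event set */
} sketch_control_state_t;

/* Stand-in for read_powercap_value(): takes the powercap EVENT index. */
long long sketch_read_value(const long long *raw, int event_index)
{
    return raw[event_index];
}

/* Read every event in the set, translating positions to event indices first. */
void sketch_read_event_set(const sketch_control_state_t *ctl,
                           const long long *raw, long long *out)
{
    for (int i = 0; i < ctl->active_counters; i++) {
        int event_index = ctl->which_counter[i]; /* not i itself */
        out[i] = sketch_read_value(raw, event_index);
    }
}
```

	  Passing the position i directly would only be correct when
	  the event set happens to contain the events in their native
	  order starting from index 0.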

2022-07-06  Daniel Barry <dbarry@vols.utk.edu>

	* src/components/sysdetect/arm_cpu_utils.c: sysdetect: add support
	  for Fujitsu A64FX  This enables 'papi_hardware_avail' utility
	  support for the A64FX processor. TLB and cache information were
	  obtained from the A64FX Microarchitecture Manual.
	  (https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.6.pdf)
	  These changes were tested on the A64FX and ThunderX2
	  processors.

2022-06-03  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/linux-memory.c: papi_mem_info: modify aarch64_get_memory_info
	  by hw_info  Reference hw_info->vendor and hw_info->cpuid_model and
	  modify the aarch64_get_memory_info function to determine the
	  processor by the combination of Implementer and PartNum.  These
	  changes were tested on the ThunderX2 and Fujitsu A64FX
	  architectures.

2022-05-17  Daniel Barry <dbarry@vols.utk.edu>

	* src/linux-memory.c: papi_mem_info: add back support for ARM64
	  processors  Cache information for ARM64 processors other than the
	  Fujitsu A64FX is available in the /sys/ directory. Therefore, these
	  changes utilize the generic_get_memory_info() function for
	  non-A64FX ARM64 processors.  These changes were tested on the
	  ThunderX2 and Fujitsu A64FX architectures.

2022-04-22  Daniel Barry <dbarry@vols.utk.edu>

	* src/linux-memory.c, src/papi.h: papi_mem_info: add support for
	  Fujitsu A64FX  This enables 'papi_mem_info' utility support for the
	  A64FX processor. TLB and cache information were obtained from the
	  A64FX Microarchitecture Manual.
	  (https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.6.pdf)
	  These changes were tested on the Fujitsu A64FX, IBM POWER9, AMD
	  Zen 2, and Intel Haswell architectures.

2022-06-11  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm/rocp.c: rocm: rocp_pool_close double free
	  comment
	* src/components/rocm/rocp.c: rocm: remove useless comments
	* src/components/rocm/rocp.c: rocm: adjust for 5.2.0 change of
	  directory structure
	* src/components/rocm/rocp.c: rocm: check config files are regular

2022-06-10  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm/rocp.c: rocm: rename dispatch counter functions
	  increment/decrement_dispatch_counter also return the value of the
	  counter after increment/decrement. Rename them to make clearer
	  what the functions do.
	* src/components/rocm/rocp.c: rocm: fix bug in sampling read
	* src/components/rocm/rocp.c: rocm: cleanup function signature
	* src/components/rocm/rocp.c: rocm: wrap sampling/intercept_ctx_init

2022-06-11  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/pcp/README.md: pcp: add how to run on Summit in
	  readme

2022-06-10  John Rodgers <john.rodgers@hpe.com>

	* .../cuda/tests/cupti_multi_kernel_launch_monitoring.cu: CUDA:
	  Update Multi-Kernel Launch Test  PR 298 toggled back the CUDA
	  profiling API support ranges to include CC 7.0 when built against
	  CUDA11+. As a result, the `cupti_multi_kernel_launch_monitoring.cu`
	  test needs to be updated to account for this new support range
	  behavior.

2022-06-09  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sensors_ppc/tests/sensors_ppc_basic.c: sensors_ppc:
	  fix test  sensors_ppc_basic does not call PAPI_stop before cleaning
	  up and destroying the eventset. This causes the test to return a
	  PAPI_EISRUN error. Replace PAPI_read with PAPI_stop as a fix.

Fri Jun 3 06:15:02 2022 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/lib/events/intel_spr_events.h,
	  src/libpfm4/lib/pfmlib_perf_event_pmu.c,
	  src/libpfm4/tests/validate_x86.c: libpfm4: update libpfm4 to commit
	  322e66c  Original commits:  commit
	  322e66c6463d6ff4035a751843dbce2ee83b6663  fix validate for
	  CPU_CLK_UNHALTED:REF_DISTRIBUTED for SapphireRapids  Was using the
	  bogus event code following the change in:  44a62a52e4e5 ("fix
	  CPU_CLK_UNHALTED.REF_DISTRIBUTED encoding for Intel
	  SapphireRapids")  Correct event code is 0x3c.   commit
	  a7b26272d8327ad1c001456a18518a0ac65dc2bb  avoid GCC-12
	  use-after-free warnings  gcc-12 seems to complain about bogus
	  use-after-free situations in the libpfm4 code:
	    p = realloc(q, ...)
	    if (!p) return NULL
	    s = p + (q - z)
	  It complains because of the use of q after realloc in this case.
	  Yet q - z is just pointer arithmetic and is not dereferencing any
	  memory through the pointer q which may have been freed by realloc.
	  Fix is to pre-compute the delta before realloc to avoid using the
	  pointer after the call.  Reported-by:
	  Vitaly Chikunov <vt@altlinux.org>  commit
	  44a62a52e4e554cad7971b79770e03ae880336ce  fix
	  CPU_CLK_UNHALTED.REF_DISTRIBUTED encoding for Intel SapphireRapids
	  Was using 0x8ec instead of 0x83c.   commit
	  b28625959098b3889f5ffe1d209b5da196b959e1  update Intel
	  SapphireRapids core PMU events  Based on
	  download.01.org/perfmon/SPR/sapphire_rapids_v1.02.json Mostly
	  updates to the OCR event.   Testing: SapphireRapids commits
	  untested due to lack of hardware.
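	  The use-after-free fix described in commit a7b2627 above
	  follows a common pattern: compute the pointer delta before
	  calling realloc, then apply it to the new block. The sketch
	  below illustrates that pattern only; the function and
	  parameter names are hypothetical and it is not the libpfm4
	  code.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: grow the buffer at *base to new_size bytes and
 * return the cursor relocated into the (possibly moved) buffer.
 * Returns NULL on allocation failure, in which case *base is unchanged
 * and still valid.
 */
char *sketch_grow_and_relocate(char **base, char *cursor, size_t new_size)
{
    /* Take the offset while the old pointer is still valid. */
    ptrdiff_t delta = cursor - *base;

    char *p = realloc(*base, new_size);
    if (!p)
        return NULL;

    *base = p;
    /* Only the new pointer and the saved offset are used from here on. */
    return p + delta;
}
```

	  Because the old pointer is never touched after the realloc
	  call, the compiler has nothing to warn about, and the code
	  no longer relies on using a pointer value whose storage may
	  have been released.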

2022-06-02  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm_smi/tests/Makefile: rocm_smi: remove duplicate
	  include Makefile_comp_tests.target
	* src/components/rocm/rocp.c: rocm: fix rocprofiler load logic  The
	  load_rocp_sym function in rocp.c should not use PAPI_ROCM_ROOT to
	  calculate the pathname of the rocprofiler library. init_rocp_env
	  already sets up HSA_TOOLS_LIB to point to the right pathname based
	  on PAPI_ROCM_ROOT (or the pathname explicitly defined by users).
	  Thus, instead, load_rocp_sym should simply use HSA_TOOLS_LIB.

2022-06-03  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* doc/Makefile: sysdetect: add papi_hardware_avail to man pages

2022-06-02  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/x86_cpu_utils.c: sysdetect: fix a 'may be
	  used uninitialized' warning
	* src/components/cuda/linux-cuda.c: cuda: allow for CC 7.0 to use the
	  profiler API  Currently, the cuda component enforces the cupti event
	  API to be used if the compute capability (CC) of the device is 7.0.
	  This patch allows users to select the cupti profiler API instead by
	  using a cuda11 installation and exposing this to PAPI through the
	  PAPI_CUDA_ROOT environment variable.

Tue May 31 05:47:23 2022 -0700  Thomas Richter <tmricht@linux.ibm.com>

	* src/libpfm4/lib/events/s390x_cpumf_events.h,
	  src/libpfm4/lib/pfmlib_s390x_cpumf.c: libpfm4: update libpfm4 to
	  commit b03a81e  Original Commit:   s390: Update counter definition
	  for IBM z16  This patch updates the libpfm4 s390 counter
	  definitions to the latest documentation:  SA23-2261-07: The
	  CPU-Measurement Facility Extended Counters Definition for z10,
	  z196/z114, zEC12/zBC12, z13/z13s, z14, z15 and z16 April 29, 2022
	  https://www.ibm.com/support/pages/cpu-measurement-facility-extended-counters-definition-z10-z196z114-zec12zbc12-z13z13s-z14-z15-and-z16  This
	  includes updated counter description for existing counters and the
	  complete counter definition for IBM z16.  Acked-by: Sumanth
	  Korikkar <sumanthk@linux.ibm.com>  Testing: not tested

2022-05-26  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/x86_cpu_utils.c: sysdetect: fix warning in
	  cpu probe

2022-01-12  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm/README.md, src/components/rocm/Rules.rocm,
	  src/components/rocm/common.h, src/components/rocm/linux-rocm.c,
	  src/components/rocm/rocm.c, src/components/rocm/rocm_IncDirs.awk,
	  src/components/rocm/rocp.c, src/components/rocm/rocp.h,
	  src/components/rocm/tests/Makefile,
	  src/components/rocm/tests/ROCM_SA_Makefile,
	  src/components/rocm/tests/common.h,
	  .../tests/hl_intercept_multi_thread_monitoring.cpp,
	  .../hl_intercept_single_kernel_monitoring.cpp,
	  .../hl_intercept_single_thread_monitoring.cpp,
	  .../tests/hl_sample_single_kernel_monitoring.cpp,
	  .../tests/hl_sample_single_thread_monitoring.cpp,
	  .../tests/intercept_multi_kernel_monitoring.cpp,
	  .../tests/intercept_multi_thread_monitoring.cpp,
	  .../tests/intercept_single_kernel_monitoring.cpp,
	  .../tests/intercept_single_thread_monitoring.cpp,
	  src/components/rocm/tests/matmul.cpp,
	  src/components/rocm/tests/matmul.h,
	  .../rocm/tests/multi_kernel_monitoring.cpp,
	  .../rocm/tests/multi_kernel_monitoring.h,
	  .../rocm/tests/multi_thread_monitoring.cpp,
	  .../rocm/tests/multi_thread_monitoring.h,
	  src/components/rocm/tests/rocm_all.cpp,
	  src/components/rocm/tests/rocm_command_line.c,
	  src/components/rocm/tests/rocm_example.cpp,
	  src/components/rocm/tests/rocm_failure_demo.cpp,
	  src/components/rocm/tests/rocm_standalone.cpp,
	  src/components/rocm/tests/run_papi.sh,
	  .../rocm/tests/sample_multi_kernel_monitoring.cpp,
	  .../rocm/tests/sample_multi_thread_monitoring.cpp,
	  .../rocm/tests/sample_overflow_monitoring.cpp,
	  .../rocm/tests/sample_single_kernel_monitoring.cpp,
	  .../rocm/tests/sample_single_thread_monitoring.cpp,
	  .../rocm/tests/single_thread_monitoring.cpp,
	  .../rocm/tests/single_thread_monitoring.h: rocm: component rewrite
	  The new rocm component implementation supports rocprofiler sampling
	  as well as intercepting mode. In sampling mode the new rocm
	  component assigns each eventset one or more GPU devices (depending
	  on the events requested by the PAPI user). Two separate threads can
	  thus create two eventsets and have them monitor a separate device
	  (N to N), or one thread can create a single eventset and have it
	  monitor all devices (1 to N). Sampling mode is concerned with whatever
	  happens at the device level, not the kernel level. Thus, a section
	  of code instrumented with PAPI_start and PAPI_stop might measure
	  the activity of whatever kernel the current thread has launched on
	  a device, plus whatever kernels other threads may have launched on
	  the same device at the same time.  In intercepting mode the new
	  rocm component assigns each eventset one kernel at a time
	  (kernels are serialized by rocm). Intercepting mode is concerned with
	  whatever happens at the kernel level (inside the device). Thus, a
	  section of code instrumented with PAPI_start and PAPI_stop might
	  measure the activity of whatever kernel the current thread launched
	  on a device. If the instrumented section contains multiple kernel
	  launches the component will accumulate the counters of those into a
	  single counter value.  The component also supports software-emulated
	  counter sampling through PAPI_overflow.

2022-04-18  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/high-level/papi_hl.c: high-level: use _papi_getpid instead of
	  getpid
	* src/threads.c, src/threads.h: thread: add _papi_getpid() function

2022-01-27  Anthony <adanalis@icl.utk.edu>

	* src/atomic_ops.h, src/atomic_ops/ao_version.h,
	  src/atomic_ops/generalize-arithm.h,
	  src/atomic_ops/generalize-arithm.template,
	  src/atomic_ops/generalize-small.h,
	  src/atomic_ops/generalize-small.template,
	  src/atomic_ops/generalize.h,
	  src/atomic_ops/sysdeps/README,
	  .../sysdeps/all_acquire_release_volatile.h,
	  .../sysdeps/all_aligned_atomic_load_store.h,
	  src/atomic_ops/sysdeps/all_atomic_load_store.h,
	  src/atomic_ops/sysdeps/all_atomic_only_load.h,
	  src/atomic_ops/sysdeps/ao_t_is_int.h,
	  src/atomic_ops/sysdeps/ao_t_is_int.template,
	  src/atomic_ops/sysdeps/armcc/arm_v6.h,
	  src/atomic_ops/sysdeps/emul_cas.h,
	  src/atomic_ops/sysdeps/gcc/aarch64.h,
	  src/atomic_ops/sysdeps/gcc/alpha.h,
	  src/atomic_ops/sysdeps/gcc/arm.h,
	  src/atomic_ops/sysdeps/gcc/avr32.h,
	  src/atomic_ops/sysdeps/gcc/cris.h,
	  src/atomic_ops/sysdeps/gcc/e2k.h,
	  src/atomic_ops/sysdeps/gcc/generic-arithm.h,
	  src/atomic_ops/sysdeps/gcc/generic-arithm.template,
	  src/atomic_ops/sysdeps/gcc/generic-small.h,
	  src/atomic_ops/sysdeps/gcc/generic-small.template,
	  src/atomic_ops/sysdeps/gcc/generic.h,
	  src/atomic_ops/sysdeps/gcc/hexagon.h,
	  src/atomic_ops/sysdeps/gcc/hppa.h,
	  src/atomic_ops/sysdeps/gcc/ia64.h,
	  src/atomic_ops/sysdeps/gcc/m68k.h,
	  src/atomic_ops/sysdeps/gcc/mips.h,
	  src/atomic_ops/sysdeps/gcc/powerpc.h,
	  src/atomic_ops/sysdeps/gcc/riscv.h,
	  src/atomic_ops/sysdeps/gcc/s390.h, src/atomic_ops/sysdeps/gcc/sh.h,
	  src/atomic_ops/sysdeps/gcc/sparc.h,
	  src/atomic_ops/sysdeps/gcc/tile.h,
	  src/atomic_ops/sysdeps/gcc/x86.h,
	  src/atomic_ops/sysdeps/generic_pthread.h,
	  src/atomic_ops/sysdeps/hpc/hppa.h,
	  src/atomic_ops/sysdeps/hpc/ia64.h,
	  src/atomic_ops/sysdeps/ibmc/powerpc.h,
	  src/atomic_ops/sysdeps/icc/ia64.h,
	  .../sysdeps/loadstore/acquire_release_volatile.h,
	  .../loadstore/acquire_release_volatile.template,
	  src/atomic_ops/sysdeps/loadstore/atomic_load.h,
	  .../sysdeps/loadstore/atomic_load.template,
	  src/atomic_ops/sysdeps/loadstore/atomic_store.h,
	  .../sysdeps/loadstore/atomic_store.template,
	  .../loadstore/char_acquire_release_volatile.h,
	  .../sysdeps/loadstore/char_atomic_load.h,
	  .../sysdeps/loadstore/char_atomic_store.h,
	  .../sysdeps/loadstore/double_atomic_load_store.h,
	  .../loadstore/int_acquire_release_volatile.h,
	  src/atomic_ops/sysdeps/loadstore/int_atomic_load.h,
	  .../sysdeps/loadstore/int_atomic_store.h,
	  .../sysdeps/loadstore/ordered_loads_only.h,
	  .../sysdeps/loadstore/ordered_loads_only.template,
	  .../sysdeps/loadstore/ordered_stores_only.h,
	  .../sysdeps/loadstore/ordered_stores_only.template,
	  .../loadstore/short_acquire_release_volatile.h,
	  .../sysdeps/loadstore/short_atomic_load.h,
	  .../sysdeps/loadstore/short_atomic_store.h,
	  src/atomic_ops/sysdeps/msftc/arm.h,
	  src/atomic_ops/sysdeps/msftc/arm64.h,
	  src/atomic_ops/sysdeps/msftc/common32_defs.h,
	  src/atomic_ops/sysdeps/msftc/x86.h,
	  src/atomic_ops/sysdeps/msftc/x86_64.h,
	  src/atomic_ops/sysdeps/ordered.h,
	  src/atomic_ops/sysdeps/ordered_except_wr.h,
	  src/atomic_ops/sysdeps/read_ordered.h,
	  src/atomic_ops/sysdeps/standard_ao_double_t.h,
	  src/atomic_ops/sysdeps/sunc/sparc.S,
	  src/atomic_ops/sysdeps/sunc/sparc.h,
	  src/atomic_ops/sysdeps/sunc/x86.h,
	  src/atomic_ops/sysdeps/test_and_set_t_is_ao_t.h,
	  src/atomic_ops/sysdeps/test_and_set_t_is_char.h, src/linux-
	  common.c, src/linux-lock.h: Integrate the atomic operations of the
	  libatomic_ops library into PAPI.

2022-05-23  John Rodgers <john.rodgers@hpe.com>

	* src/components/cuda/linux-cuda.c: CUDA11 Start Variable Update  In
	  `_cuda11_start`, change `userContext` to `userCtx` to be consistent
	  with rest of the component.
	* src/components/cuda/linux-cuda.c: CUDA11 Profiler Active Context
	  Sensitivity  The CUPTI11 portion of the `cuda` component has shown
	  sensitivities for a calling thread's active context when using
	  CUDA11 versions < 11.2. Specifically, it was found that not pushing
	  the session context onto the stack prior to calling
	  `cuptiProfilerSetConfig` would result in a failure.  This change
	  set addresses this issue by leveraging the same mechanics that are
	  used when calling other CUPTI routines, namely pushing the session
	  context onto the stack and popping it off once we are done with it.

2022-05-05  John Rodgers <john.rodgers@hpe.com>

	* src/components/cuda/linux-cuda.c: Allow context creation for CUDA11
	  For the CUPTI11 portion of `cuda` component, adopt logic from the
	  legacy version of the component to allow creation of contexts
	  should one not exist.  Update enables simple single GPU, as well as
	  target offload codes to be profiled.  Note: Update also resolves
	  issue with `papi_command_line` (issue 92)
	* src/components/cuda/README.md, src/components/cuda/linux-cuda.c,
	  src/components/cuda/tests/HelloWorld_CUPTI11.cu,
	  src/components/cuda/tests/simpleMultiGPU.cu,
	  .../cuda/tests/simpleMultiGPU_CUPTI11.cu: Issue102: Remove CUDA11
	  callback subscriber  The CUPTI callback subscriber introduced to
	  monitor contexts created a problem for packages that use PAPI for
	  CUDA performance counter collection.  Specifically, it prevented
	  registering of custom profiling/tracing callbacks, resulting in
	  `CUPTI_ERROR_MULTIPLE_SUBSCRIBERS_NOT_SUPPORTED` when attempting to
	  register one.  This changeset is effectively a targeted reversion
	  of the callback subscriber logic from the commit that introduced
	  it.  Commit that introduced subscriber:
	  9ff1d73dae9a7b297a54a77fac5fdb3957041452  With this update in
	  place, the `cuda` component context capturing behavior is
	  consistent between the legacy and updated CUPTI11 versions of the
	  code, which requires that applications create and set the context
	  used to run the kernels prior to calling `PAPI_add_events()`.

2022-05-16  John Rodgers <john.rodgers@hpe.com>

	* src/components/cuda/linux-cuda.c,
	  src/components/cuda/tests/Makefile,
	  .../tests/cupti_multi_kernel_launch_monitoring.cu: Issue 105:
	  CUDA11 Multi Read Error  The `cuda` component generated erroneous
	  values when multiple `PAPI_read` operations were called.  Testing
	  via direct usage of the CUPTI API revealed that the CUDA11
	  profiling image (`cuda11_CounterDataImage`) needed to be reset
	  after each read operation to prevent this behavior.  To enable
	  resetting, the initialization parameters for the profiling image
	  and scratch buffer are now stored along with the other profiling
	  parameters in `cuda_device_desc_t`. These newly stored parameters
	  are then used in each read operation to re-initialize the profiling
	  images after the counter results have been resolved.  A new test,
	  `cupti_multi_kernel_launch_monitoring`, has been introduced to the
	  `components/cuda/tests/` directory and was used in the validation
	  of this changeset.

2022-04-29  John Rodgers <john.rodgers@hpe.com>

	* src/components/cuda/linux-cuda.c: Remove unnecessary calls to
	  `cuptiProfiler{Enable,Disable}Profiling` in CUDA11 `PAPI_read`
	  operations.  Starting and stopping of profiling session now handled
	  in appropriate CUDA11 `PAPI_{start,stop}` operations.

2022-05-16  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: cuda: remove leftover cupti11
	  switching logic  For cupti 11 devices, i.e. devices that are
	  compatible with cupti 11 profiler interface, the component
	  overrides the vector function pointers with cupti 11 variants.
	  However, the _cuda_update_control_state function still contains a
	  switch for cupti 11 version. This should not happen and is
	  therefore removed by this patch.

2022-05-10  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/cuda/linux-cuda.c, src/configure, src/configure.in:
	  cuda: replace cupti_profiler with cupti_api_version  The cupti.h
	  header exports a CUPTI_API_VERSION macro that can be used for
	  checking whether the profiler API is supported or not. The macro
	  can assume the following values (associated to the corresponding
	  CUDA version, and compute capability):

	  CUPTI_API_VERSION   | CUDA_VERSION   | COMPUTE CAPABILITY
	  --------------------+----------------+-------------------
	  V1                  | 4.0            |
	  V2                  | 4.1            |
	  V3                  | 5.0            |
	  V4                  | 5.5            |
	  V5                  | 6.0            |
	  V6                  | 6.5            |
	  V7                  | 6.5            |
	  V8                  | 7.0            |
	  V9                  | 8.0            |
	  V10                 | 9.0            |
	  V11                 | 9.1            |
	  V12                 | 10.0,10.1,10.2 | <  7.5
	  V13                 | 11.0           | >= 7.0
	  V14                 | 11.1           |

	  This patch replaces the CUPTI_PROFILER preprocessor flag,
	  previously set in configure if the cupti_profiler_target.h header
	  was found in PAPI_CUDA_ROOT, with the CUPTI_API_VERSION in
	  linux-cuda.c

2022-04-28  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: cuda: pre-cupti11 backward
	  compatibility fix  Cuda devices with compute capability <= 7.0
	  should be able to use the event API provided by cuda toolkits with
	  version >= 11. The cuda component selection logic however causes
	  cuda devices with such compute capabilities to fail when cuda
	  toolkits >= 11 are used. This patch fixes the selection logic.

2022-05-23  AnustuvICL <anustuv@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: cuda: Added debug messages to
	  indicate the locations of loaded dynamic CUDA libraries.

2022-03-21  Anustuv Pal <anustuv@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: cuda: Add support for CUDA
	  version >11.2  - CUDA 11.0 deprecated the use of
	  NVPA_RawMetricsConfig_Create and replaced it with
	  NVPW_CUDA_RawMetricsConfig_Create. This patch replaces the
	  deprecated functions with the NVIDIA recommended substitute.

2022-05-10  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/Rules.sysdetect: sysdetect: fix include
	  path of nvidia GPUs

2022-04-30  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/amd_gpu.c: sysdetect: fix amd gpu product
	  name  The HSA_AMD_AGENT_INFO_PRODUCT_NAME attribute in
	  hsa_agent_get_info seems to be broken in the latest version of ROCm.
	  Replace this with the HSA attribute HSA_AGENT_INFO_NAME instead.
	  This reports the device compute architecture rather than the
	  product name.

2022-05-18  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/perf_event/perf_event.c: perf_event: fix typo in
	  error code handling  _pe_libpfm4_init() returns PAPI_ECMP error and
	  not PAPI_ENOCMP like currently handled by the caller (i.e.
	  _pe_init_component). Change PAPI_ENOCMP into PAPI_ECMP.
	* src/components/perf_event/pe_libpfm4_events.c: perf_event: do not
	  set disable string in _pe_libpfm4_init  _pe_libpfm4_init() returns
	  an error code that is used by the caller (i.e.
	  _pe_init_component()) to set the disabled_reason string to the
	  appropriate error message, overwriting whatever was set by
	  _pe_libpfm4_init().

Thu Apr 21 15:01:07 2022 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/lib/events/amd64_events_fam19h_zen3.h: libpfm4: update
	  libpfm4 to commit c779846  Original commit:  commit
	  c7798469063288ca5829ab96c7c174dad5a08e74  Rename OP_QUEUE_EMPTY to
	  UOPS_QUEUE_EMPTY on AMD Zen3  To be compatible with AMD Zen2.

2022-04-07  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/amd_gpu.c: sysdetect: use PAPI_ROCM_ROOT
	  for rocmsmi dlopen path

2022-04-19  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/run_tests_exclude.txt: intel_gpu: remove test/readme.txt from
	  test list  Currently run_tests_exclude.txt does not list the
	  intel_gpu readme.txt file in the tests directory. This causes the
	  run_test.sh script to try to execute that file. Blacklist the file
	  to skip execution.

Wed Apr 20 19:56:03 2022 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/docs/man3/libpfm_intel_spr.3,
	  src/libpfm4/lib/events/amd64_events_fam19h_zen3.h,
	  .../lib/events/amd64_events_fam19h_zen3_l3.h,
	  src/libpfm4/lib/events/intel_spr_events.h,
	  src/libpfm4/lib/pfmlib_amd64.c, src/libpfm4/tests/validate_x86.c:
	  Update libpfm4 Current with commit
	  9580a003d83900569db3f2c7bc41e0e2ea7b88ef  Fix amd64 duplicate event
	  detection logic  Must check flags as well as code; otherwise false
	  positive duplicates are detected on AMD Fam10h Barcelona, where
	  some events appear as duplicates when in fact they are for
	  different revisions of the CPU.

2022-04-14  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/sde_lib/sde_lib.h: Refactored unlocking to the
	  end of each function, and replaced tabs with spaces.

2022-04-13  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/sde_lib/sde_lib.h: Cleaning up error messages.
	* src/components/sde/sde.c, src/components/sde/sde_lib/sde_lib.h,
	  src/components/sde/tests/Makefile: Counting Set introduced to
	  sde_lib. Both C and C++ APIs along with tests.

2022-04-12  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/high-level/papi_hl.c: high-level: replace flock with fcntl
	  flock is not POSIX compliant. Replace it with fcntl instead.
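
	  As an illustration of the POSIX alternative, the following is a
	  minimal sketch of whole-file locking with fcntl(). The helper
	  names (set_whole_file_lock, lock_whole_file, unlock_whole_file)
	  are hypothetical and are not taken from papi_hl.c; the sketch
	  only assumes the standard `struct flock` interface.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not from papi_hl.c): apply or release an advisory
 * lock covering the whole file, using POSIX fcntl() instead of flock().
 * Returns 0 on success and -1 on failure. */
int set_whole_file_lock(int fd, short type)
{
    struct flock fl = {0};

    fl.l_type = type;        /* F_WRLCK, F_RDLCK or F_UNLCK */
    fl.l_whence = SEEK_SET;  /* offsets are relative to the file start */
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "up to the end of the file" */

    /* F_SETLKW blocks until the lock can be granted. */
    return (fcntl(fd, F_SETLKW, &fl) == -1) ? -1 : 0;
}

int lock_whole_file(int fd)   { return set_whole_file_lock(fd, F_WRLCK); }
int unlock_whole_file(int fd) { return set_whole_file_lock(fd, F_UNLCK); }
```

	  Note that fcntl() locks are owned by the process, whereas flock()
	  locks are tied to the open file description, so the two are not
	  interchangeable in every situation.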

2022-04-05  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/high-level/papi_hl.c: high-level: use variable to select
	  between single and multi-thread mode  Currently, the HL API always
	  assumes multi-thread mode. This means that multiple threads in the
	  program can create and manage PAPI event sets. This is not a valid
	  assumption as the behavior is different for single-thread
	  monitoring programs. This patch introduces a new environment
	  variable named PAPI_HL_THREAD_MULTIPLE that allows selecting single
	  or multi-thread mode explicitly in the HL API. To avoid affecting
	  existing applications and tests the default is multi-thread
	  monitoring. If the variable is set to "0" single-thread mode is
	  selected instead. For explicitly setting multi-thread monitoring
	  the variable has to be set to "1".
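
	  The selection rule described above can be sketched as follows.
	  The function names are hypothetical, and treating any value
	  other than "0" as multi-thread mode is an assumption of this
	  sketch rather than documented behavior.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from papi_hl.c): map the value of
 * PAPI_HL_THREAD_MULTIPLE to a mode flag.
 * Returns 0 for single-thread mode and 1 for multi-thread mode. */
int hl_thread_multiple_from_value(const char *value)
{
    if (value != NULL && strcmp(value, "0") == 0)
        return 0;  /* explicitly selected single-thread mode */

    /* Unset or "1" selects the multi-thread default. Any other value is
     * also mapped to the default here, which is an assumption. */
    return 1;
}

/* Reads the variable from the environment of the calling process. */
int hl_thread_multiple(void)
{
    return hl_thread_multiple_from_value(getenv("PAPI_HL_THREAD_MULTIPLE"));
}
```

	  Running a program with PAPI_HL_THREAD_MULTIPLE=0 in its
	  environment would make hl_thread_multiple() return 0 under
	  these assumptions.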

Mon Apr 11 15:19:40 2022 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/README, src/libpfm4/docs/Makefile,
	  src/libpfm4/include/perfmon/pfmlib.h, src/libpfm4/lib/Makefile,
	  src/libpfm4/lib/events/intel_icl_events.h,
	  src/libpfm4/lib/events/intel_spr_events.h,
	  src/libpfm4/lib/pfmlib_common.c,
	  src/libpfm4/lib/pfmlib_intel_spr.c, src/libpfm4/lib/pfmlib_priv.h,
	  src/libpfm4/tests/validate_x86.c: Update libpfm4 Current with
	  commit eca0a1f2d274ba26e6c24231fdf61b1407e3ed03  add Intel
	  SapphireRapid core PMU support  This patch adds Intel SapphireRapid
	  core PMU support to libpfm4. It is based on the public event list
	  from:
	  https://download.01.org/perfmon/SPR/sapphirerapids_core_v1.00.json

2022-04-13  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/vec_fma_dp.c,
	  src/counter_analysis_toolkit/vec_fma_hp.c,
	  src/counter_analysis_toolkit/vec_fma_sp.c,
	  src/counter_analysis_toolkit/vec_nonfma_dp.c,
	  src/counter_analysis_toolkit/vec_nonfma_hp.c,
	  src/counter_analysis_toolkit/vec_nonfma_sp.c,
	  src/counter_analysis_toolkit/vec_scalar_verify.c,
	  src/counter_analysis_toolkit/vec_scalar_verify.h: CAT: Rename
	  printing functions in vector benchmark  The previous names for the
	  functions which print to a file the results of the CAT vector
	  benchmark did not indicate their purpose. These new function names
	  better describe what they do.  These changes were tested on the
	  Fujitsu A64FX architecture.

2022-04-08  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/vec_arch.h,
	  src/counter_analysis_toolkit/vec_fma_dp.c,
	  src/counter_analysis_toolkit/vec_fma_hp.c,
	  src/counter_analysis_toolkit/vec_fma_sp.c,
	  src/counter_analysis_toolkit/vec_nonfma_dp.c,
	  src/counter_analysis_toolkit/vec_nonfma_hp.c,
	  src/counter_analysis_toolkit/vec_nonfma_sp.c,
	  src/counter_analysis_toolkit/vec_scalar_verify.c: CAT: Added scalar
	  intrinsics to validate half-precision vector benchmark accuracy
	  Previously, the half-precision kernels' numerical results were not
	  checked against results computed using only scalar quantities. The
	  GCC documentation states that "The __fp16 type may only be used as
	  an argument to intrinsics defined in <arm_fp16.h>, or as a storage
	  format." (https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html)
	  Thus, these intrinsics are now in use to verify the accuracy of the
	  vector benchmarks.  These changes have been tested on the Fujitsu
	  A64FX architecture.

2022-03-31  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/sde_lib/sde_lib.h: Fixed a potential deadlock in
	  the SDE component.

2022-03-30  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/sde.c: Fixed bug in terminating a string in the
	  SDE component.

2022-03-30  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/Makefile,
	  src/counter_analysis_toolkit/README,
	  src/counter_analysis_toolkit/driver.h,
	  src/counter_analysis_toolkit/main.c,
	  src/counter_analysis_toolkit/vec.c,
	  src/counter_analysis_toolkit/vec.h,
	  src/counter_analysis_toolkit/vec_arch.h,
	  src/counter_analysis_toolkit/vec_fma.h,
	  src/counter_analysis_toolkit/vec_fma_dp.c,
	  src/counter_analysis_toolkit/vec_fma_hp.c,
	  src/counter_analysis_toolkit/vec_fma_sp.c,
	  src/counter_analysis_toolkit/vec_nonfma.h,
	  src/counter_analysis_toolkit/vec_nonfma_dp.c,
	  src/counter_analysis_toolkit/vec_nonfma_hp.c,
	  src/counter_analysis_toolkit/vec_nonfma_sp.c,
	  src/counter_analysis_toolkit/vec_scalar_verify.c,
	  src/counter_analysis_toolkit/vec_scalar_verify.h,
	  src/counter_analysis_toolkit/weak_symbols.c: CAT: Added vector
	  FLOPs benchmarks to identify related events  Hardware events in
	  certain architectures account for floating-point operations
	  incurred by vector instructions. This new benchmark category allows
	  for these events to be more easily identified by using the vector
	  intrinsics available on a given architecture. This benchmark
	  includes kernels for fused multiply-add (FMA) vector instructions.
	  These changes have been tested on the IBM POWER9, Fujitsu A64FX
	  (ARM), and AMD Zen2 architectures.

2022-03-22  Anthony Danalis <adanalis@icl.utk.edu>

	* src/counter_analysis_toolkit/timing_kernels.c,
	  src/counter_analysis_toolkit/timing_kernels.h: Updated the data
	  cache write benchmark to make it cause one read and one write more
	  reliably.

2022-01-23  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/configure, src/configure.in, src/papi_lock.h: configure: add
	  one lock per component  Currently components do not have dedicated
	  locks. This patch adds support for one lock per component so that
	  two components do not have to share the same lock.

2022-03-28  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/components/sysdetect/arm_cpu_utils.c: sysdetect: improve
	  processor name for ARM processors  On ARM processors, Raspbian OS
	  can get the processor name from "model name" in /proc/cpuinfo. On
	  non-Raspbian OS, the processor name cannot be retrieved from
	  /proc/cpuinfo. Therefore, the processor name is generated based on
	  the information of "CPU implementer" and "CPU part" in
	  /proc/cpuinfo.
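
	  A minimal sketch of the fallback described above, assuming
	  /proc/cpuinfo lines of the form "CPU implementer : 0x46". The
	  helper names and the format of the generated name are
	  hypothetical; the actual string produced by arm_cpu_utils.c is
	  not stated in this entry.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if `line` starts with `key`, parse the number that
 * follows the ':' separator. Returns 1 on success and 0 otherwise. */
int cpuinfo_field(const char *line, const char *key, int *value)
{
    size_t n = strlen(key);
    const char *colon;

    if (strncmp(line, key, n) != 0)
        return 0;

    colon = strchr(line + n, ':');
    if (colon == NULL)
        return 0;

    /* %i accepts both decimal and 0x-prefixed hexadecimal values. */
    return sscanf(colon + 1, "%i", value) == 1;
}

/* Hypothetical name format built from the two identifiers. */
int arm_fallback_name(char *buf, size_t len, int implementer, int part)
{
    return snprintf(buf, len, "ARM implementer 0x%02x part 0x%03x",
                    (unsigned int)implementer, (unsigned int)part);
}
```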

2022-03-15  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/arm_cpu_utils.c,
	  src/components/sysdetect/linux_cpu_utils.c: sysdetect: fix vendor
	  codes for ARM

Fri Mar 18 12:25:26 2022 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/lib/events/intel_icl_events.h,
	  src/libpfm4/tests/validate_x86.c: Update libpfm4 Current with
	  commit ad5c64e1ac2f177e2166bedfd7b679e49017cb55  fix Intel Icelake
	  TOPDOWN.SLOTS_P encoding  Was using the 0x00 (fixed counter)
	  encoding instead of 0xa4 for the generic counter.  Add validation
	  tests for SLOTS AND SLOTS_P.

2022-03-14  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/linux_cpu_utils.c: sysdetect: system
	  information queries treat all ARM CPUs the same  When querying for
	  system information all ARM CPUs are treated the same regardless of
	  the vendor.

2022-03-17  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/components/sysdetect/README.md: sysdetect: configure --with-CPU
	  required.  To enable sysdetect, use the following command:
	  `./configure --with-CPU=$CPU --with-components="sysdetect"` $CPU
	  can have the following values: x86, POWER5, POWER5+, POWER6,
	  POWER7, PPC970, arm

2022-03-16  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/components/sysdetect/cpu_utils.c: sysdetect: Adding Macro
	  Definitions for arm64  On ARM processors, for arm32, the macro
	  definition for compilation is "defined (__arm__)", for arm64, the
	  macro definition for compilation is "defined (__aarch64__)".
	  Therefore, on ARM processors, you need to enable both arm32 and
	  arm64, so you need to set "defined (__arm__) || defined
	  (__aarch64__)".
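
	  The combined guard can be sketched as follows. The macro
	  BUILD_TARGETS_ARM and the function build_targets_arm are
	  hypothetical names used only for illustration; __arm__ and
	  __aarch64__ are the compiler-defined macros named above.

```c
/* Hypothetical flag: 1 when compiling for 32-bit or 64-bit ARM. */
#if defined(__arm__) || defined(__aarch64__)
#define BUILD_TARGETS_ARM 1
#else
#define BUILD_TARGETS_ARM 0
#endif

/* Reports at run time which branch was selected at compile time. */
int build_targets_arm(void)
{
    return BUILD_TARGETS_ARM;
}
```

	  Checking only __arm__ would leave the ARM code path disabled on
	  arm64 builds, which is the case this entry addresses.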

2022-02-20  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/arm_cpu_utils.c: sysdetect: patch arm to
	  convert vendor id into vendor string
	* src/components/sysdetect/linux_cpu_utils.c: sysdetect: get rid of
	  VENDOR_ARM

2022-02-16  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/cpu.c,
	  src/components/sysdetect/cpu_utils.h,
	  src/components/sysdetect/linux_cpu_utils.c,
	  src/components/sysdetect/x86_cpu_utils.c, src/papi.h,
	  src/utils/papi_hardware_avail.c: sysdetect: add vendor id field for
	  ARM processors

2022-02-17  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/linux_cpu_utils.c: sysdetect: update
	  vendor id codes in cpu probe

2022-02-20  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/utils/papi_hardware_avail.c: papi_hardware_avail: only print
	  numa node memory if greater than 0
	* src/components/sysdetect/linux_cpu_utils.c: sysdetect: replace
	  assignment to atoi with sscanf  Assigning a model number using atoi
	  and assignment operator might be ineffective. If the string is
	  expressed in hex atoi will fail the conversion. Instead replace the
	  atoi assignment with sscanf.
	* src/components/sysdetect/cpu.c: sysdetect: fix typo in cpu probe
	* src/components/sysdetect/linux_cpu_utils.c: sysdetect: there is at
	  least one numa node in SMPs
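
	  To illustrate the atoi replacement mentioned in this entry: the
	  sketch below parses a model number with sscanf. The helper name
	  parse_model_number and the choice of the "%i" conversion are
	  assumptions; the entry does not say which format string
	  linux_cpu_utils.c uses.

```c
#include <stdio.h>

/* Hypothetical helper: parse a model number that may be written in
 * decimal or in 0x-prefixed hexadecimal.
 * Returns 0 and stores the value on success, -1 if nothing was parsed. */
int parse_model_number(const char *text, int *model)
{
    /* Unlike atoi, sscanf reports whether a conversion took place, and
     * %i understands the 0x prefix (a leading 0 is read as octal). */
    return (sscanf(text, "%i", model) == 1) ? 0 : -1;
}
```

	  With this helper "0x1f" parses as 31, whereas atoi("0x1f") stops
	  at the 'x' and yields 0.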

2022-03-04  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/papi_events.csv: Add PAPI idle-related preset events for a64fx
	  For a64fx, add four PAPI idle-related preset events
	  (PAPI_BRU_IDL/PAPI_FXU_IDL/PAPI_FPU_IDL/PAPI_LSU_IDL).
	  PAPI_BRU_IDL = BR_COMP_WAIT PAPI_FXU_IDL = EU_COMP_WAIT -
	  FL_COMP_WAIT PAPI_FPU_IDL = FL_COMP_WAIT PAPI_LSU_IDL =
	  LD_COMP_WAIT  The specifications of BR_COMP_WAIT, EU_COMP_WAIT,
	  FL_COMP_WAIT, and LD_COMP_WAIT can be found in section "14.4. Cycle
	  Accounting" of A64FX_Microarchitecture_Manual_en_1.5.pdf at the
	  following URL: https://github.com/fujitsu/A64FX/blob/master/doc
	* src/components/perf_event/pe_libpfm4_events.c,
	  src/components/perf_event/perf_event.c, src/linux-common.c,
	  src/papi.h: PAPI_get_hardware_info: improve PAPI_hw_info_t for ARM
	  processors  Currently, it is not possible to determine which
	  company designed an ARM processor from the PAPI_hw_info_t returned
	  by PAPI_get_hardware_info(). For ARM processors, improve the vendor
	  and vendor_string entries in PAPI_hw_info_t so that they identify
	  the company that designed the processor.

2022-02-20  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/Rules.sysdetect: sysdetect/amd: add
	  CPPFLAGS to Rules file

2022-02-19  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/amd_gpu.c: sysdetect/amd: use snprintf
	  instead of assignment operator
	* src/components/sysdetect/amd_gpu.c: sysdetect/amd: fix
	  hsa_error_string definition and usage
	* src/utils/papi_component_avail.c: papi_component_avail: rename
	  force_lazy_init to force_cmp_init

2022-02-18  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/utils/papi_component_avail.c: papi_component_avail: force
	  component init only when necessary
	* src/utils/papi_native_avail.c: papi_native_avail: allow for delayed
	  init components

Mon Feb 28 11:55:13 2022 -0800  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/README, src/libpfm4/docs/Makefile,
	  src/libpfm4/include/perfmon/pfmlib.h, src/libpfm4/lib/Makefile,
	  .../lib/events/arm_hisilicon_kunpeng_events.h,
	  .../lib/events/arm_hisilicon_kunpeng_unc_events.h,
	  src/libpfm4/lib/events/arm_neoverse_n1_events.h,
	  src/libpfm4/lib/events/arm_neoverse_n2_events.h,
	  src/libpfm4/lib/events/intel_icl_events.h,
	  src/libpfm4/lib/events/intel_skl_events.h,
	  src/libpfm4/lib/events/perf_events.h, src/libpfm4/lib/pfmlib_arm.c,
	  src/libpfm4/lib/pfmlib_arm_armv8.c,
	  src/libpfm4/lib/pfmlib_arm_priv.h, src/libpfm4/lib/pfmlib_common.c,
	  src/libpfm4/lib/pfmlib_intel_rapl.c,
	  src/libpfm4/lib/pfmlib_kunpeng_unc_perf_event.c,
	  src/libpfm4/lib/pfmlib_perf_event_pmu.c,
	  src/libpfm4/lib/pfmlib_perf_event_raw.c,
	  src/libpfm4/lib/pfmlib_priv.h, src/libpfm4/tests/validate_arm64.c,
	  src/libpfm4/tests/validate_x86.c: Update libpfm4. Tested on
	  orbitty.icl.utk.edu, ARMv8 Processor rev 1 (v8l) Tested on
	  methane.icl.utk.edu, Intel Skylake Xeon(R) Gold 6140 CPU Tested on
	  dopamine.icl.utk.edu, AMD Zen3 EPYC 7413 CPU Current with commit
	  58efe1f26fe1ca82f8b25b83c1089c5f9eac0f1b  add Intel SapphireRapid
	  RAPL support  Add CPU model number for SapphireRapid based on Linux
	  kernel information.

2022-02-20  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/linux_cpu_utils.c: sysdetect: fix infinite
	  loop bug  When there is no node in /sys/devices/system/cpu/
	  sysdetect will loop indefinitely looking for the node affine to the
	  thread specified. Instead we should look for existence of node0 for
	  cpu0. If this is present then other nodes are likely to be in the
	  file system tree. Otherwise there is no point looking further and
	  we just assume there is only one numa node.

2022-02-19  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: cuda: fix overcounting of cuda
	  devices  Cuda device counting is done by scanning the file system
	  for the related device number information. There are two places in
	  the file system where this information can be found: in /sys and in
	  /proc. The second being a fallback in case the first does not
	  contain the desired information. The /proc based device counting
	  goes through all the directories in the /proc/driver/nvidia/gpus,
	  including '.' and '..', thus overcounting the number of devices.
	  This patch fixes the problem by filtering such directories.
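
	  The filtering described above can be sketched as follows. The
	  helper names is_dot_entry and count_dir_entries are hypothetical
	  and are not the functions used in linux-cuda.c; the sketch only
	  assumes the POSIX opendir/readdir interface.

```c
#include <dirent.h>
#include <string.h>

/* Returns 1 for the "." and ".." entries that readdir() reports. */
int is_dot_entry(const char *name)
{
    return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}

/* Hypothetical helper: count the entries of `path`, skipping "." and
 * "..". Returns -1 if the directory cannot be opened. */
int count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *entry;
    int count = 0;

    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        if (!is_dot_entry(entry->d_name))
            count++;
    }

    closedir(dir);
    return count;
}
```

	  Without the is_dot_entry() check, a directory holding one device
	  subdirectory would be reported as holding three.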

2022-02-16  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm_smi/linux-rocm-smi.c: rocm_smi: change
	  disabled_reason for delayed init
	* src/components/nvml/linux-nvml.c: nvml: change disabled_reason for
	  delayed init
	* src/components/rocm/linux-rocm.c: rocm: change disabled_reason for
	  delayed init
	* src/components/cuda/linux-cuda.c: cuda: change disabled_reason for
	  delayed init

2022-02-15  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/papi_internal.c: papi_errno: add delay init string error
	* src/cpus.c, src/papi.c, src/papi_internal.c, src/threads.c: papi:
	  handle delay init for GPU components  The PAPI_EDELAY_INIT error
	  code is handled by PAPI as if the component was enabled. This
	  allows PAPI to kick off delayed init by calling any of the internal
	  component functions that access its events.
	* src/papi.h: papi_errno: add PAPI_EDELAY_INIT error for delayed init
	  components  Delayed initialization components need a way to
	  distinguish delayed initialization from disabled state.

2021-12-12  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/papi.c: papi_get_component_info: remove init_private support
	* src/papi_vector.c, src/papi_vector.h: papi_vector: remove
	  init_private support
	* src/utils/papi_component_avail.c: papi_component_avail: add lazy
	  init code for components  Previously we had an init_private()
	  function added to papi_vector and implemented by those components
	  that needed delayed (lazy) initialization, such as rocm, cuda,
	  rocm_smi, nvml.  Such init_private() delayed initialization was
	  mainly used by papi_component_avail to read the number of events
	  and hardware counters for reporting purposes to the user of the
	  utility. Applications are free to ignore init_private() and prevent
	  PAPI from calling the GPU runtime init functions.  The same result
	  can be achieved by forcing lazy init in papi_component_avail by
	  accessing the events in the component. If no such access happens
	  the init_component now disables the components mentioned above and
	  sets the disabled status to the following message:  "Not
	  initialized, call PAPI_enum_cmp_event or any other component event
	  access function to force lazy init".
	* src/components/nvml/linux-nvml.c: nvml: remove init_private for
	  lazy init
	* src/components/rocm_smi/linux-rocm-smi.c: rocm_smi: remove
	  init_private for lazy init
	* src/components/cuda/linux-cuda.c: cuda: remove init_private for
	  lazy init
	* src/components/rocm/linux-rocm.c: rocm: remove init_private for
	  lazy init  init_private is a hack that causes inconsistency in the
	  component interface. Such inconsistency can cause bugs. This patch
	  removes the init_private interface.

2022-02-03  Anthony Danalis <adanalis@dopamine.icl.utk.edu>

	* src/counter_analysis_toolkit/dcache.c,
	  src/counter_analysis_toolkit/dcache.h,
	  src/counter_analysis_toolkit/timing_kernels.c: Improved error
	  handling and reporting.

2022-01-31  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: cuda: rename device count routine
	  looking into /sys  Rename _cuda_count_nvidia_devices to
	  _cuda_count_dev_sys to reflect the source of the information. This
	  is to distinguish this routine from the other counting routine
	  _cuda_count_dev_proc, which looks into the /proc file system
	  instead.

2022-01-28  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: cuda: extend device counting with
	  /proc filesystem  /sys based device counting for cuda relies on the
	  Linux Direct Rendering Manager populating the corresponding
	  entries in the filesystem. This is not always the case and depends
	  on the specific Linux configuration. Thus, this method might cause
	  the component to wrongly detect no cuda devices in a system that
	  has some. Another source of information for cuda devices is the
	  /proc file system. This patch extends the current /sys
	  functionality with /proc information.
	* src/high-level/scripts/papi_hl_output_writer.py: high-level: make
	  output writer script python2/3 compatible  The
	  papi_hl_output_writer.py script uses 'long' which is not supported
	  in python3. Replace 'long' with 'int'.

2022-01-27  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/shm.c: sysdetect: fix load_mpi_sym
	  signature

2022-01-22  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: cuda: fix bug in shut down
	  sequence  When the cuda component is configured but not used, there
	  is no dlopen() of nvidia libraries. Still, when the component is
	  shutdown, dlclose() of these libraries is unconditionally called,
	  causing a segmentation fault. This patch adds guards around
	  dlclose() so that every dlopen() is always paired with a dlclose().

2022-01-17  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/sysdetect/Rules.sysdetect,
	  src/components/sysdetect/amd_gpu.c: sysdetect/rocm: explicitly look
	  for rocm in PAPI_ROCM_ROOT  Similarly to the ROCm component,
	  sysdetect also requires PAPI_ROCM_ROOT to be defined so that the
	  user is explicitly forced to define where in the file system tree
	  the ROCm installation to be used is located. This prevents
	  situations in which the user believes they are using a certain ROCm
	  version while in reality the component is picking up the one
	  defined by the system environment.
	* src/components/sysdetect/amd_gpu.c: sysdetect/rocm: fix
	  hsa_status_string arguments  hsa_status_string takes an
	  hsa_status_t code and returns a pointer to const char with the
	  error message associated to the status code. In the code we were
	  passing a pointer to a status array of chars rather than a pointer
	  to a const char. This is fixed by this patch.

2021-12-08  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/utils/papi_xml_event_info.c: Improve the papi_xml_event_info
	  command.  Modify the papi_xml_event_info command as follows: -
	  Test only the event name even if the event has a unit mask. - Test
	  other unit masks in the event even if there is an error in one unit
	  mask in the event.

2021-12-03  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm_smi/linux-rocm-smi.c: rocm_smi: fix bug in
	  event reporting while running papi_component_avail  This fix is
	  similar to fixes: - e646d570 for cuda component & - 8e2f725  for
	  rocm component

2021-11-08  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/papi_vector.c: init_private: fix indentation in order to
	  silence compiler warning  Most recent versions of gcc complain when
	  if statements, not followed by curly braces, are not indented using
	  tabs. Fix the warning by replacing white spaces with tabs.

2021-07-29  Giuseppe Congiu <gcongiu@localhost.localdomain>

	* src/components/sysdetect/README.md,
	  src/components/sysdetect/Rules.sysdetect,
	  src/components/sysdetect/amd_gpu.c,
	  src/components/sysdetect/amd_gpu.h,
	  src/components/sysdetect/arm_cpu_utils.c,
	  src/components/sysdetect/arm_cpu_utils.h,
	  src/components/sysdetect/cpu.c, src/components/sysdetect/cpu.h,
	  src/components/sysdetect/cpu_utils.c,
	  src/components/sysdetect/cpu_utils.h,
	  src/components/sysdetect/linux_cpu_utils.c,
	  src/components/sysdetect/linux_cpu_utils.h,
	  src/components/sysdetect/nvidia_gpu.c,
	  src/components/sysdetect/nvidia_gpu.h,
	  src/components/sysdetect/os_cpu_utils.c,
	  src/components/sysdetect/os_cpu_utils.h,
	  src/components/sysdetect/powerpc_cpu_utils.c,
	  src/components/sysdetect/powerpc_cpu_utils.h,
	  src/components/sysdetect/shm.c, src/components/sysdetect/shm.h,
	  src/components/sysdetect/sysdetect.c,
	  src/components/sysdetect/sysdetect.h,
	  src/components/sysdetect/tests/Makefile,
	  src/components/sysdetect/tests/query_device_mpi.c,
	  .../sysdetect/tests/query_device_simple.c,
	  src/components/sysdetect/x86_cpu_utils.c,
	  src/components/sysdetect/x86_cpu_utils.h, src/configure,
	  src/configure.in, src/papi.h, src/utils/Makefile,
	  src/utils/papi_hardware_avail.c: Sysdetect: system information
	  detection component  The SYSDETECT component allows PAPI users to
	  query comprehensive system information. The information is gathered
	  at PAPI_library_init() time and presented to the user through
	  appropriate APIs. The component works similarly to other
	  components, which means that hardware information for a specific
	  device might not be available at runtime if, e.g., the device
	  runtime software is not installed.  At the moment the
	  infrastructure defines the following device types:  -
	  PAPI_DEV_TYPE_ID__CPU        : for all CPU devices from any vendor
	  - PAPI_DEV_TYPE_ID__NVIDIA_GPU : for all GPU devices from NVIDIA -
	  PAPI_DEV_TYPE_ID__AMD_GPU    : for all GPU devices from AMD  Every
	  device is scanned to gather information when the component is
	  initialized. If no installed hardware is found for the considered
	  device type, the corresponding information is filled
	  with zeros.  This patch also adds a new utility program, called
	  papi_hardware_avail, that prints out to the command line what
	  hardware is installed for each type and the specifications of each
	  device.

2021-10-28  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/papi.c, src/papi_vector.c: init: make init_private exposed by
	  every component  Lack of uniformity across components burdens
	  front-end code with additional checks. One example is
	  init_private(). This function is implemented only by those
	  components that need delayed initialization due to the high cost of
	  parsing a large number of events from the hardware (e.g. rocm and
	  cuda components). However, this also means that front-end code has
	  to check whether such init_private() function is implemented by
	  other components in order to avoid dereferencing NULL function
	  pointers. A better solution is to implement init_private() in every
	  component and simply make the function return PAPI_OK if the
	  component does not need delayed initialization.
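
	  The idea above can be sketched as follows. This is only a
	  minimal illustration, not the code in papi_vector.c: the
	  struct layout, the SKETCH_OK constant (standing in for
	  PAPI_OK) and all sketch_* names are hypothetical.

```c
#include <stddef.h>

#define SKETCH_OK 0   /* stands in for PAPI_OK (assumption) */

/* Hypothetical, trimmed-down component vector; not PAPI's real layout. */
typedef struct {
    const char *name;
    int (*init_private)(void);
} sketch_vector_t;

/* Default hook for components that need no delayed initialization. */
static int sketch_default_init_private(void)
{
    return SKETCH_OK;
}

/* Fill in a missing hook once, when the component is registered. */
static void sketch_fill_defaults(sketch_vector_t *v)
{
    if (v->init_private == NULL)
        v->init_private = sketch_default_init_private;
}

/* Front-end code can now call the hook without a NULL check. */
static int sketch_frontend_init(sketch_vector_t *v)
{
    return v->init_private();
}
```

	  With the default in place, the NULL check lives in one spot
	  instead of at every call site in the front-end.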

2021-10-18  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/papi.c: papi_lock: fix bug in PAPI_lock and PAPI_unlock

2021-10-16  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/components/rocm/linux-rocm.c: RocmCmp: fix bug in event
	  reporting while running papi_component_avail  In the
	  papi_component_avail() utility program the component private init
	  function is called multiple times. The first time the component
	  events are initialized as expected and the disable flag is set to
	  the error code returned while performing the process, along with
	  the reason the component might be disabled. The second time the
	  private init function of the component is called is when trying
	  to list the supported events. In this case the init does not
	  remember the error code previously returned and sets the disabled
	  flag to PAPI_OK instead. This same bug was already fixed by Tony
	  Castaldo in patch e646d570 for the cuda component.
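
	  A rough sketch of the pattern behind this fix, under the
	  assumption that the component keeps a small state record. The
	  sketch_* names, the error value and the hw_present parameter
	  are hypothetical and do not come from linux-rocm.c.

```c
#define SKETCH_OK     0
#define SKETCH_ECMP (-4)   /* hypothetical "component error" code */

typedef struct {
    int initialized;   /* non-zero once the expensive init has run */
    int disabled;      /* outcome of the first init, SKETCH_OK if usable */
    int init_calls;    /* how often the expensive path actually ran */
} sketch_cmp_state_t;

/* Stand-in for the costly event enumeration; fails without hardware. */
static int sketch_expensive_init(int hw_present)
{
    return hw_present ? SKETCH_OK : SKETCH_ECMP;
}

/* Safe to call repeatedly: later calls return the remembered outcome
 * instead of resetting the disabled flag to SKETCH_OK. */
static int sketch_init_private(sketch_cmp_state_t *s, int hw_present)
{
    if (s->initialized)
        return s->disabled;

    s->init_calls++;
    s->disabled = sketch_expensive_init(hw_present);
    s->initialized = 1;
    return s->disabled;
}
```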

2021-10-13  Anthony Danalis <adanalis@icl.utk.edu>

	* src/components/rocm/linux-rocm.c: Added code to set the environment
	  variable "ROCP_HSA_INTERCEPT" which is needed since rocm-4.1, and
	  removed spurious whitespace.

2021-10-03  Giuseppe Congiu <gcongiu@icl.utk.edu>

	* src/papi.c: PAPI_accum: documentation bug fix

2021-09-01  Daniel Barry <dbarry@vols.utk.edu>

	* src/components/pcp/linux-pcp.c: Modified the PCP component to use
	  the local host as the PMAPI context. These changes are compatible
	  with both RHEL 7 and 8.  These changes were tested on the IBM
	  POWER9 architecture.

2021-08-30  Vince Weaver <vincent.weaver@maine.edu>

	* src/linux-memory.c: linux-memory: change cache parsing so it works
	  on ARM servers  On Linux we parse files under
	  /sys/devices/system/cpu/ to determine the various cache settings.
	  The old code assumes certain files, such as associativity and
	  linesize are always there (because they are on x86).  Instead of
	  exiting with an error if the files don't exist, the updated code
	  sets the values to 0.  This allows the cache values to be returned on
	  ARM systems such as Ampere servers.  This could potentially break
	  user code if they are taking the cache values and doing division
	  (such as taking the cache size and dividing by the linesize: if
	  linesize is zero they could get a divide-by-zero error). I'm not
	  sure if there's a way around this without redesigning how the
	  meminfo structure works.
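
	  User code that divides by the line size may want a guard along
	  these lines. This is a sketch only: the struct is a
	  hypothetical stand-in for the per-level cache information, and
	  returning -1 for "unknown" is an assumption of the example.

```c
/* Hypothetical subset of the per-level cache information. */
typedef struct {
    int size;        /* total size in bytes, 0 if unknown */
    int line_size;   /* line size in bytes, 0 if sysfs did not report it */
} sketch_cache_info_t;

/* Number of cache lines, or -1 when it cannot be derived. */
static int sketch_num_lines(const sketch_cache_info_t *c)
{
    if (c->line_size <= 0)
        return -1;               /* avoid dividing by zero */
    return c->size / c->line_size;
}
```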

2021-08-24  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Added CUPTI_PROFILER=-1 for no
	  PAPI_CUDA_ROOT set at ./configure time. Disables the CUDA component
	  with the message "Environment variable \ PAPI_CUDA_ROOT must be
	  specified before ./configure is executed."
	* src/components/cuda/linux-cuda.c, src/configure, src/configure.in:
	  Changes to ensure PAPI_CUDA_ROOT was set BEFORE ./configure was
	  run, to ensure we distinguish between CUPTI11, Legacy, and
	  misconfigured.

2021-08-24  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/papi_events.csv: Fix the PAPI_FUL_CCY setting for a64fx  In
	  a64fx, the maximum number of instruction commits is 4, so the
	  following setting was incorrect: PAPI_FUL_CCY=CPU_CYCLES-
	  0INST_COMMIT-1INST_COMMIT-2INST_COMMIT-3INST_COMMIT-4INST_COMMIT
	  The correct setting is: PAPI_FUL_CCY=CPU_CYCLES-0INST_COMMIT-
	  1INST_COMMIT-2INST_COMMIT-3INST_COMMIT
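
	  The corrected formula amounts to the following arithmetic. The
	  function name and the array argument are hypothetical; the
	  array is assumed to hold the 0INST_COMMIT to 3INST_COMMIT
	  counts in that order.

```c
/* Cycles in which the maximum of 4 instructions were committed:
 * all cycles minus those with 0, 1, 2 or 3 commits. */
static long long sketch_ful_ccy(long long cpu_cycles,
                                const long long n_inst_commit[4])
{
    long long full = cpu_cycles;
    for (int i = 0; i < 4; i++)
        full -= n_inst_commit[i];
    return full;
}
```

	  Subtracting 4INST_COMMIT as well, as the old setting did,
	  would remove exactly the cycles the preset is meant to count.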

2021-08-23  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Removed extraneous comments.
	* src/components/cuda/linux-cuda.c: Changed the error messages about
	  the Legacy/Cupti11 failures to better distinguish the exact cause
	  of failure, and updated several possible exits of the
	  initialization that might cause an empty "Disabled" message on
	  papi_component_avail.

2021-08-20  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/dcache.c: Fixed a bug regarding race
	  conditions in a parallel construct in the CAT data cache
	  benchmarks.  These changes were tested on the Fujitsu A64FX
	  architecture.

2021-08-19  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/sde_lib/sde_lib.h,
	  .../Created_Counter/Lib_With_Created_Counter++.cpp,
	  src/components/sde/tests/Simple2/Simple2_Lib++.cpp,
	  src/components/sde/tests/Simple2/Simple2_Lib.c: Updates to the C++
	  API based on early-adopter feedback.

2021-08-12  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm/linux-rocm.c, src/components/rocm_smi/linux-
	  rocm-smi.c: Corrected shutdown code to work correctly if delayed
	  init never executes (due to shutdown without using the component).

2021-08-11  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: In scanning system devices to
	  find GPUs, added Giuseppe's recommendation to also check device
	  class to filter out all but Display Controllers, which are GPUs.

2021-08-10  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Added checks for Nvidia devices
	  up front by scanning /sys/class/drm/card files. This is necessary
	  to avoid cuInit() which is needed to run cuDeviceGetCount(). Also
	  corrected a bug in delayed init, in case it was called more than
	  once after already being disabled.

2021-08-09  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Second change, needed proper
	  printf format code.
	* src/components/cuda/linux-cuda.c: Fixed a compile bug in CUDA that
	  only shows with later modules.

2021-08-07  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Changes necessary to sort out at
	  runtime what to do if we were compiled with one cuda module loaded,
	  but run with a different cuda module loaded.  Also had a compile
	  error to fix running with an old cuda module.

2021-08-06  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Change to reject if compiled-in
	  headers have structures of different sizes than the version of the
	  cuda_runtime library we found. Also rejecting libraries <11.0; they
	  don't contain CounterAvailability functions that we currently must
	  use in setting up events, i.e. the 10.x API differs slightly from
	  the 11.x API.

Mon Jul 26 16:22:25 2021 +0200  Thomas Richter <tmricht@linux.ibm.com>

	* src/libpfm4/README, src/libpfm4/docs/Makefile,
	  src/libpfm4/docs/man3/libpfm_arm_neoverse_n2.3,
	  src/libpfm4/include/perfmon/pfmlib.h, src/libpfm4/lib/Makefile,
	  src/libpfm4/lib/events/arm_neoverse_n2_events.h,
	  src/libpfm4/lib/pfmlib_arm_armv8.c,
	  src/libpfm4/lib/pfmlib_arm_perf_event.c,
	  src/libpfm4/lib/pfmlib_common.c, src/libpfm4/lib/pfmlib_priv.h,
	  src/libpfm4/lib/pfmlib_s390x_cpumf.c,
	  src/libpfm4/tests/validate_arm64.c: Update libpfm4, to be current
	  with the following commit. Tested on orbitty.icl.utk.edu, ARMv8
	  Processor rev 1 (v8l). commit
	  790451411d481492b6a3b94077b543c3e68c6d2b  do not set certain config
	  bits in pfm_arm_get_perf_encoding()  By default (raw encoding) on
	  ARM, the library was setting the PL1, USR, HYP control bits in the
	  config in the encoded value.  With Linux perf_events, these bits
	  are under the control of the kernel. Any of these bits set by the
	  user is overridden by the kernel based on the settings of the
	  perf_event_attr.exclude_* fields. Recent versions of the perf tool
	  started checking that the config field is not setting bits which
	  are ignored by the kernel. To avoid the perf tool warning, this
	  patch removes the setting of these bits when encoding for Linux
	  perf_events.   commit 0c3efc889fadc8cd9a632f5a10462d37c508c56a  add
	  support for ARM Neoverse N2 core PMU  This patch adds support for
	  ARM Neoverse N2 core PMU based on the ARM TRM version 0. The new
	  PMU is called arm_n2.   commit
	  e166a8869f64cd3a47b2b42a3022e4cceecea799  Support cycles:u modifier
	  for s390  The function invocation of
	  pfm_get_perf_event_encoding("cycles:u", ...) fails on s390. However
	  the modifier :u is supported on s390, whereas modifiers :h and :k
	  are not supported.  Fix this by adding the supported_plm field and
	  setting it properly. This setting causes function
	  pfm_perf_perf_validate_pattrs() to accept modifier :u as valid.
	  Test code: .... memset(&attr, 0, sizeof(attr)); attr.size =
	  sizeof(attr); ret = pfm_get_perf_event_encoding(evname,
	  PFM_PLM0|PFM_PLM3, &attr, NULL, NULL); txt = pfm_strerror(ret);
	  printf("TEST %s ret:%d(%s) config:%#lx type:%d\n", evname, ret,
	  txt, attr.config, attr.type);  Output before: TEST cycles:u
	  ret:-8(invalid event attribute) config:0 type:0  Output after:
	  TEST cycles:u ret:0(success) config:0 type:0  Acked-by: Sumanth
	  Korikkar <sumanthk@linux.ibm.com>

2021-08-05  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/dcache.c,
	  src/counter_analysis_toolkit/dcache.h,
	  src/counter_analysis_toolkit/main.c: Added feature to CAT data
	  cache benchmark to print a header line in the output files. This
	  header shows the ID of the CPU core to which each thread was
	  pinned. This provides more detail of the hardware context for
	  reproducibility.  These changes were tested on the Fujitsu A64FX
	  architecture.

2021-07-30  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/sde_lib/sde_lib.h: Cleanup.
	* src/components/sde/sde_lib/sde_lib.h,
	  .../Created_Counter/Created_Counter_Driver++.cpp,
	  .../Created_Counter/Lib_With_Created_Counter++.cpp,
	  src/components/sde/tests/Makefile,
	  .../sde/tests/Minimal/Minimal_Test++.cpp,
	  src/components/sde/tests/README.txt,
	  .../sde/tests/Recorder/Lib_With_Recorder++.cpp,
	  .../sde/tests/Recorder/Recorder_Driver++.cpp,
	  src/components/sde/tests/Simple/Simple_Driver.c,
	  .../sde/tests/Simple2/Simple2_Driver++.cpp,
	  src/components/sde/tests/Simple2/Simple2_Driver.c,
	  src/components/sde/tests/Simple2/Simple2_Lib++.cpp,
	  src/components/sde/tests/Simple2/Simple2_Lib.c: C++ interface for
	  the library-side API of papi-sde and examples that demonstrate its
	  usage.

2021-07-29  Heike Jagode <jagode@icl.utk.edu>

	* src/papi.c: Added missing changes for 'delayed init' feature to
	  ensure that our PAPI utilities still report the correct number of
	  native events and counters.

2021-07-28  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Minor changes to Macros so merge
	  is not confused.

2021-07-29  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/caches.h,
	  src/counter_analysis_toolkit/dcache.c,
	  src/counter_analysis_toolkit/dcache.h,
	  src/counter_analysis_toolkit/timing_kernels.c: Added feature to CAT
	  data cache benchmark to measure data read latencies for each worker
	  thread. This allows us to observe additional data-access nuance for
	  each core in the socket.  These changes were tested on the Fujitsu
	  A64FX architecture.

2021-07-28  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/Rules.cuda,
	  src/components/cuda/tests/HelloWorld.cu,
	  src/components/cuda/tests/Makefile,
	  src/components/cuda/tests/simpleMultiGPU.cu,
	  .../cuda/tests/simpleMultiGPU_CUPTI11.cu, src/configure: Moved test
	  for CUpti 11 to configure, out of Rules.cuda. Modified HelloWorld
	  and simpleMultiGPU.cu to work properly with Legacy CUpti. Added
	  simpleMultiGPU_CUPTI11.cu, because CUcontext monitoring requires a
	  different protocol for managing CUcontexts. Adjusted Makefile
	  accordingly.

2021-07-28  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/caches.h,
	  src/counter_analysis_toolkit/dcache.c,
	  src/counter_analysis_toolkit/dcache.h,
	  src/counter_analysis_toolkit/driver.h,
	  src/counter_analysis_toolkit/main.c,
	  src/counter_analysis_toolkit/timing_kernels.c,
	  src/counter_analysis_toolkit/timing_kernels.h: Added feature to CAT
	  data cache benchmark to measure event occurrences for each worker
	  thread. This allows us to accurately measure a chip's region-
	  specific events.  These changes were tested on the Fujitsu A64FX
	  architecture.

2021-07-27  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Changes to compile clean on
	  Legacy Cupti.

2021-07-27  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/Rules.sde, src/components/sde/sde_internal.h,
	  src/components/sde/sde_lib/papi_sde_interface.h,
	  src/components/sde/sde_lib/sde_common.c,
	  src/components/sde/sde_lib/sde_common.h,
	  src/components/sde/sde_lib/sde_lib.c,
	  src/components/sde/sde_lib/sde_lib.h,
	  src/components/sde/sde_lib/weak_symbols.c,
	  .../sde/tests/Advanced_C+FORTRAN/Gamum.c,
	  .../sde/tests/Advanced_C+FORTRAN/sde_symbols.c,
	  .../Created_Counter/Lib_With_Created_Counter.c,
	  src/components/sde/tests/Makefile,
	  src/components/sde/tests/Minimal/Minimal_Test.c,
	  .../sde/tests/Recorder/Lib_With_Recorder.c,
	  src/components/sde/tests/Simple/Simple_Driver.c,
	  src/components/sde/tests/Simple/Simple_Lib.c,
	  src/components/sde/tests/Simple2/Simple2_Driver.c,
	  src/components/sde/tests/Simple2/Simple2_Lib.c, src/configure,
	  src/configure.in, src/utils/Makefile,
	  src/utils/papi_native_avail.c: Converted libsde into a header-only
	  library to ease integration into third-party software. Now the only
	  thing a third party code needs in order to export SDEs is to
	  #include "sde_lib.h". This change also simplified the integration
	  into the PAPI utility papi_native_avail so "linking tricks" and
	  weak symbols are not needed anymore.

2021-07-26  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/README.md, src/components/cuda/Rules.cuda,
	  src/components/cuda/linux-cuda.c,
	  src/components/cuda/tests/HelloWorld.cu,
	  src/components/cuda/tests/HelloWorld_CUPTI11.cu,
	  src/components/cuda/tests/simpleMultiGPU.cu: Changes to make the
	  test code work as expected in CUpti 11, with the CUpti callback
	  monitoring of CUcontext activity.

2021-07-15  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: On A100 CC 8.0, details on some
	  events fail, which caused debug errors to print that should have
	  been suppressed. Corrected this.
	* src/components/cuda/linux-cuda.c: disabled some debug messages.
	* src/components/cuda/linux-cuda.c: Legacy CUPTI was failing if PAPI
	  user already had a context set.
	* src/components/cuda/linux-cuda.c: Clean up some debug code, and
	  extraneous code in cuda_shutdown that belonged in cuda11_shutdown.

2021-07-12  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c,
	  src/components/cuda/tests/HelloWorld_CUPTI11.cu: Retested with
	  valgrind for memory leaks; removed redundant code in the
	  cuda11_read(), and corrected HelloWorld_CUPTI11.cu to use a single
	  pass event, instead of my test 2-pass event.

2021-07-09  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c,
	  src/components/cuda/tests/HelloWorld_CUPTI11.cu: Corrected a bug in
	  description formation; had HelloWorld_CUPTI11.cu report in both
	  decimal and hexadecimal.
	* src/components/cuda/linux-cuda.c: Corrected problems with correctly
	  choosing Legacy or CUpti-11, and issues about what to
	  include/exclude to be compatible with previous Nvidia distributions
	  without profile headers and libraries.
	* src/components/cuda/README.md, src/components/cuda/Rules.cuda,
	  src/components/cuda/linux-cuda.c: Updates in commentary and
	  documentation.

2021-07-07  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/README.md, src/components/cuda/linux-cuda.c:
	  All CUPTI11 code works, legacy cupti still works, on xsdk. However,
	  configure and build still needs work, it will fail without a
	  PerfWorks directory.

2021-07-08  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/timing_kernels.c: Increased the
	  minimum number of pointer chain accesses in the CAT data cache
	  benchmark. This yields more stable measurements when using smaller
	  buffer sizes.  These changes were tested on the Fujitsu A64FX
	  architecture.

2021-06-23  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c,
	  src/components/cuda/tests/HelloWorld_CUPTI11.cu,
	  src/components/cuda/tests/Makefile: Extensive changes to linux-cuda.c
	  to use CUpti-11 when CC >= 7.0; this has been tested and seems to
	  work on ICL xsdk, but is not optimized. Adding a changed Makefile
	  and HelloWorld_CUPTI11.cu to better test various scenarios of
	  cuda_context arrangements.

Sat Jun 5 15:07:53 2021 -0700  Thomas Richter <tmricht@linux.ibm.com>

	* src/libpfm4/lib/events/intel_skl_events.h,
	  src/libpfm4/lib/pfmlib_amd64.c,
	  src/libpfm4/lib/pfmlib_amd64_fam11h.c,
	  src/libpfm4/lib/pfmlib_amd64_fam12h.c,
	  src/libpfm4/lib/pfmlib_amd64_fam15h.c,
	  src/libpfm4/lib/pfmlib_amd64_fam16h.c,
	  src/libpfm4/lib/pfmlib_amd64_fam17h.c,
	  src/libpfm4/lib/pfmlib_amd64_fam19h.c,
	  src/libpfm4/lib/pfmlib_amd64_perf_event.c,
	  src/libpfm4/lib/pfmlib_amd64_priv.h,
	  src/libpfm4/lib/pfmlib_intel_x86_perf_event.c,
	  src/libpfm4/lib/pfmlib_perf_event.c,
	  src/libpfm4/lib/pfmlib_s390x_perf_event.c: Update libpfm4, to be
	  current with the following commit. Tested on icl.utk.edu machines
	  xsdk (Intel Xeon Gold 6254), morphine (AMD EPYC 7301), histamine,
	  dopamine (AMD EPYC 7402), guyot (AMD EPYC 7742).  commit
	  d0b85fb5813dbd73e408fa21dceaf204623609cc  AMD64 encoding and debug
	  cleanup  This patch fixes and updates the way the guest vs. host
	  encoding is handled. The guest vs. host hardware filtering is
	  available since Fam10h onward except for Fam11h. This is now
	  handled with proper pmu_rev encoding for each PMU and a new helper
	  function pfm_amd64_supports_virt().  Also fixes the verbose output
	  to handle guest vs. host correctly.   commit
	  e3ae4bd86b9f37cbdc31625dd23b80ef66da5df7  fix typo in
	  OFFCORE_RESPONSE umask on Intel SkylakeX
	  L3_MISS_MISS_REMOTE_HOP1_DRAM -> L3_MISS_REMOTE_HOP1_DRAM
	  Reported-by: Ian Rogers <irogers@google.com>  commit
	  0106e839a8bade2abda66512b8b4be2338fc3729  make verbose print more
	  explicit in pfmlib_perf_event_encode()  Spell out the field names
	  better to make them easier to understand.   commit
	  e0fcc38251cf680fcdd0c18b4c13327737f3ebb8  do not set certain config
	  bits in pfm_amd64_get_perf_encoding()  By default on Intel X86, the
	  library was setting the EN and INT bits for each core PMU event.
	  But when encoding for perf_events, these bits are ignored by the
	  interface and reprogrammed by the kernel. Similarly, the
	  USR/OS/GUEST/HOST bits are controlled by the
	  perf_event_attr.exclude_* field not the config field.  Recent
	  versions of the Linux perf tool warn when bits which are ignored
	  are set in the config field which is useful.  This patch clears all
	  the config bits under the control of the perf_events interface.
	  The encoding for raw PMU mode is unchanged.   commit
	  5be1e849a25c7d02bdeb04678bfe204783b8b5ff  do not set certain config
	  bits in pfm_intel_x86_get_perf_encoding()  By default on Intel X86,
	  the library was setting the EN and INT bits for each core PMU
	  event. But when encoding for perf_events, these bits are ignored
	  by the interface and reprogrammed by the kernel. Similarly, the
	  USR/OS bits are controlled by the perf_event_attr.exclude_* field
	  not the config field.  Recent versions of the Linux perf tool warn
	  when bits which are ignored are set in the config field which is
	  useful.  This patch clears all the config bits under the control of
	  the perf_events interface.  The encoding for raw PMU mode is
	  unchanged.   commit 3833ff527012a33131f9af2530fe1447f6984ebf
	  search perf attr.type event number for s390  Commit 30adc677603b
	  ("lib/pfmlib_s390x_perf_event.c: Fix perf attr.type event number
	  for s390") fixes the dynamic PMU type assignment by the kernel when
	  s390x PMU device drivers are loaded at boot time. However s390x has
	  several PMU device drivers. Therefore find the correct one first
	  and then return the type number read out from a sysfs file. Once
	  the PMU type number is determined, it does not change until the
	  next reboot. It is ok to cache it. Also add a check if the PMU
	  really exists and return an error if not.  Fixes: 30adc677603b
	  ("lib/pfmlib_s390x_perf_event.c: Fix perf attr.type event number
	  for s390")

2021-06-15  William Cohen <wcohen@redhat.com>

	* src/validation_tests/instructions_testcode.c: Use numeric local
	  labels to allow compilation with LTO enabled  Some assembly
	  snippets in instructions_testcode.c used regular label names.
	  Unfortunately, when multiple copies of the snippets are inlined in
	  different places with LTO enabled the multiple copies of a label by
	  the same name cause the build to fail because of the redefinition
	  of the label.  To avoid this problem all those labels have been
	  converted to numeric local labels to allow multiple copies to
	  peacefully coexist in the LTO enabled code.
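
	  A small sketch of the technique, assuming GCC-style inline
	  assembly on x86; on other targets it falls back to plain C.
	  The sketch_* functions are hypothetical and are not taken
	  from instructions_testcode.c.

```c
#include <assert.h>

/* Counts n down to zero; n must be at least 1. */
static inline int sketch_count_down(int n)
{
    assert(n >= 1);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile(
        "1:\n\t"        /* numeric local label: may be defined many times */
        "decl %0\n\t"
        "jnz 1b\n\t"    /* "1b" = nearest label 1 searching backward */
        : "+r"(n)
        :
        : "cc");
#else
    while (--n != 0) { /* portable fallback with the same result */ }
#endif
    return n;
}

/* Two inlined copies of the snippet can share one object file because
 * each "1:" only binds to the references closest to it. */
static int sketch_two_copies(void)
{
    return sketch_count_down(3) + sketch_count_down(7);
}
```

	  With a regular label name in place of "1:", two inlined copies
	  would define the same symbol twice and the assembler would
	  reject the result.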

2021-06-10  Heike Jagode <jagode@icl.utk.edu>

	* src/Rules.pfm4_pe: Rebase to remove commit 1f48bb7 since there
	  appear to be issues with this.

Sun May 9 15:45:18 2021 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/docs/Makefile,
	  src/libpfm4/docs/man3/libpfm_intel_icl.3,
	  src/libpfm4/docs/man3/libpfm_intel_icx.3,
	  src/libpfm4/docs/man3/pfm_get_os_event_encoding.3,
	  src/libpfm4/include/perfmon/pfmlib.h,
	  src/libpfm4/lib/events/arm_neoverse_n1_events.h,
	  src/libpfm4/lib/events/intel_icl_events.h,
	  src/libpfm4/lib/pfmlib_amd64.c, src/libpfm4/lib/pfmlib_common.c,
	  src/libpfm4/lib/pfmlib_intel_icl.c, src/libpfm4/lib/pfmlib_priv.h,
	  src/libpfm4/tests/validate_x86.c: Update libpfm4, to be current
	  with the following commit: Tested on Histamine (Zen2), Dopamine
	  (Zen2), Morphine (Zen1), XSDK (Intel).  commit
	  74b79969f2f752df3be404d9c23f9709d738062f  fix buffer overrun in
	  Intel IcelakeX model table  The following commit introduced a bug:
	  12aeb9f69438 enable Intel IcelakeX core PMU support  By forgetting
	  a NULL termination to the icx_models[] table.   commit
	  e2bd6b5b573b124d5c07670cfc9f0923b6223288  fix Intel Icelake man
	  page date  No Icelake in 2015!   commit
	  12aeb9f694382bbf82061ac0b28abb5d2178fe8d  enable Intel IcelakeX
	  core PMU support  This patch adds Intel IcelakeX (Icelake for
	  servers) core PMU support. This is the same core PMU as for the
	  client Icelake with the addition of events to cover remote and PMM
	  accesses.  Based on Intel's icelakex_core_v1.04.json from 01.org.
	  commit 9c3e9c025efc06f4ac4422d5e87a05d9776cbb94  fix detection of
	  AMD64 Zen1 vs. Zen2  This patch fixes the test checking the model
	  number for AMD64 Fam17h processors. There was a bug where it would
	  detect some Zen1 processors as Zen2. Zen2 processors start at model
	  number 48 and up.   commit dee24f6323023573f22dc68882cea44859c0b7ac
	  add ARM SPE events for Neoverse N1 core PMU  This patch adds the
	  four Statistical Profiling Extension (SPE) related core PMU events:
	  - SAMPLE_POP - SAMPLE_FEED - SAMPLE_FILTRATE - SAMPLE_COLLISON
	  commit 21787c7cca3b8b4d02e5608bfef9bdfa7acd7d8e  fix
	  pfm_get_os_event_encoding man page typos  There is no PERF_OS_EVENT
	  enum, should be PFM_OS_PERF_EVENT.
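
	  The Zen1 vs. Zen2 rule quoted in this entry (Fam17h, Zen2 from
	  model 48 up) could be written as below. This is a sketch of
	  that single rule only; the enum and function names are
	  hypothetical and the actual libpfm4 detection code may check
	  more than this.

```c
#define SKETCH_AMD_FAM17H 0x17   /* family 17h */

typedef enum {
    SKETCH_NOT_FAM17H,
    SKETCH_ZEN1,
    SKETCH_ZEN2
} sketch_zen_t;

/* Classify a Fam17h processor by its model number. */
static sketch_zen_t sketch_classify_fam17h(unsigned family, unsigned model)
{
    if (family != SKETCH_AMD_FAM17H)
        return SKETCH_NOT_FAM17H;
    return (model >= 48) ? SKETCH_ZEN2 : SKETCH_ZEN1;
}
```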

2021-05-22  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c, src/components/nvml/linux-nvml.c,
	  src/components/rocm/linux-rocm.c, src/components/rocm_smi/linux-
	  rocm-smi.c, src/papi.h, src/papi_vector.h: Reposting changes made
	  by Damien Genet, with bug corrections, to delay component
	  initialization until necessary. For CUDA, NVML, ROCM and ROCM_SMI
	  components. CUDA and NVML components tested on XSDK, ROCM and
	  ROCM_SMI components tested on Caffeine.

2021-05-17  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/dcache.c,
	  src/counter_analysis_toolkit/main.c,
	  src/counter_analysis_toolkit/timing_kernels.c: Added feature to CAT
	  to collect latency data for the entire parameter sweep used in the
	  data cache reading benchmark.  Also fixed an overflow error in the
	  number of pointer-chain accesses by storing this value as a 'long'
	  instead of an 'int'.

2021-05-18  Swarup Sahoo <swarup-chandra.sahoo@amd.com>

	* src/papi_events.csv: Added AMD Zen3 preset events. Refer to section
	  2.1.17.2 of PPR for AMD family 19h model 01h,
	  https://www.amd.com/system/files/TechDocs/55898_pub.zip

2021-05-04  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Corrected an error discovered by
	  Tristan Konolige; pushing the retained context when it is identical
	  to the current context causes an error. Also updated all error
	  exits to properly restore user context.

Sun May 2 23:43:17 2021 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/docs/man3/pfm_get_os_event_encoding.3,
	  src/libpfm4/include/perfmon/perf_event.h,
	  src/libpfm4/lib/events/perf_events.h,
	  src/libpfm4/lib/pfmlib_amd64_rapl.c: Update libpfm4, to be current
	  with the following commit: The ZEN3 modification cannot be tested;
	  we have no ZEN3 machine. The other changes are not machine
	  specific; we did a smoke test (compile and execute
	  papi_component_avail, papi_native_avail) on ICL's xsdk machine.
	  commit 06197c0543476d40fad1c94d240e46a5d114f887  enable RAPL for
	  AMD64 Fam19h Zen3 processor  As per AMD64 PPR for Fam19h model 01h,
	  RAPL Package is supported, so enable it.   commit
	  be0dd1e0f63cb3d0915bc368baebe778792b6955  Add cgroup-switches
	  software event  Linux v5.13 added the 'cgroup-switches' event so it
	  should be supported by libpfm4 as well.   commit
	  d624a97b8e2143e1b890ac1a892b4620acb736f5  fix arg type in
	  pfm_get_os_event_encoding() man page  This patch replaces
	  references to pfm_raw_pmu_encode_t with pfm_pmu_encode_t to reflect
	  the actual data type used in the code.  Thanks to Claudio Parra for
	  reporting the issue.

2021-05-03  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Correcting a typo that can cause
	  a segfault.

2021-04-29  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm/linux-rocm.c: Using macros (like papi_debug.h)
	  instead of if (0).

2021-04-28  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Additional context cleanup in
	  _cuda_update_control_state() to accommodate issues with non-primary
	  contexts.

2021-04-23  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm/linux-rocm.c: Deleted an extraneous paranoid
	  line of code.

2021-04-22  Anthony Michael Castaldo <coe0234@tulip.cm.cluster>

	* src/components/rocm/README.md, src/components/rocm/Rules.rocm,
	  src/components/rocm/linux-rocm.c,
	  src/components/rocm/rocm_IncDirs.awk: Improved automatic detection
	  of ROCM root directory, so exporting PAPI_ROCM_ROOT is not always
	  necessary on systems that load modules. We recognize environment
	  variables ROCM_PATH, ROCM_DIR, and ROCMDIR. At compile time, we
	  have code in Rules.rocm that can examine the LD_LIBRARY_PATH
	  variable and extract possible -Iinclude_paths for the compile. This
	  uses 'awk', but if 'awk' is not present on the system it won't
	  cause an error message. We will also still use PAPI_ROCM_ROOT at
	  compile time, preferentially, when specified. README.md has been
	  updated to reflect these changes.
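
	  A lookup over the environment variables named above might look
	  like this. It is a sketch under assumptions: the function name
	  is hypothetical, PAPI_ROCM_ROOT is tried first here, and the
	  order among ROCM_PATH, ROCM_DIR and ROCMDIR is a guess since
	  the entry does not state one.

```c
#include <stdlib.h>

/* Returns the first non-empty candidate for the ROCm root directory,
 * or NULL if none of the variables is set. */
static const char *sketch_find_rocm_root(void)
{
    static const char *const names[] = {
        "PAPI_ROCM_ROOT", "ROCM_PATH", "ROCM_DIR", "ROCMDIR"
    };

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        const char *value = getenv(names[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return NULL;
}
```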

2021-04-22  William Cohen <wcohen@redhat.com>

	* src/Makefile.inc: Correct warning message to 'make dist-targz'.

2021-04-20  William Cohen <wcohen@redhat.com>

	* src/utils/papi_multiplex_cost.c: Check to ensure that mallocs
	  allocated memory in papi_multiplex_cost.c  The malloc function can
	  return NULL if the function is unable to allocate memory.
	  papi_multiplex_cost.c needs checks like papi_command_line.c has and
	  exit the program with an error if any of the malloc operations
	  fail.
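
	  One way to express such a check is a small wrapper. The helper
	  name and the message text are hypothetical; they are not
	  copied from papi_multiplex_cost.c or papi_command_line.c.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate nbytes or exit the program with an error message.
 * Note: malloc(0) may legitimately return NULL, so callers of this
 * sketch should request at least one byte. */
static void *sketch_checked_malloc(size_t nbytes, const char *what)
{
    void *p = malloc(nbytes);
    if (p == NULL) {
        fprintf(stderr, "Error allocating memory for %s\n", what);
        exit(EXIT_FAILURE);
    }
    return p;
}
```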

2021-04-13  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c,
	  src/components/cuda/tests/HelloWorld_NP_Ctx.cu,
	  src/components/cuda/tests/Makefile: This code corrects an oversight
	  and works if the application has already created a non-primary
	  context before calling PAPI_library_init(). A modification of
	  HelloWorld.cu, HelloWorld_NP_Ctx.cu, will test if the code works
	  with a non-primary context created; HelloWorld.cu tests without
	  creating a non-primary context. This was tested on XSDK with two
	  Titan V GPUs.

2021-04-12  Anthony <adanalis@icl.utk.edu>

	* src/configure, src/configure.in: Changes to the configure script to
	  accommodate the (upcoming) intel_gpu component.

Fri Apr 2 12:38:56 2021 -0700  Stephane Eranian <eranian@google.com>

	* src/libpfm4/lib/pfmlib_amd64_fam19h_l3.c,
	  src/libpfm4/lib/pfmlib_intel_snbep_unc.c,
	  src/libpfm4/tests/validate_x86.c: The following fixes are for AMD
	  Zen3 CPUs, untested by ICL; we have no access to Zen3 processors at
	  this time.  Update libpfm4, to be current with the following
	  commit:  commit 6864dad7cf85fac9fff04bd814026e2fbc160175  Fix AMD64
	  Fam19h L3 PMU support  The PMU perf_events type was not correctly
	  encoded because the .perf_name field was not initialized and
	  therefore it defaulted to using the core PMU. The correct perf_name
	  is "amd_l3". With that in place, the library now picks up the
	  correct PMU type and associated programming restrictions, e.g.,
	  per-cpu mode only and code such as perf_examples/self should not be
	  allowed to succeed at perf_event_open().  Reported-by: Steve Kaufmann
	  <steven.kaufmann@hpe.com>  commit
	  99975b4738cf7f2550922f0761f2776159842c00  fix grpid handling for
	  Intel X86 uncore  On SkylakeX the umask grpid field is overloaded
	  to contain two subfields: the actual grpid and the required grpid
	  (at offset 8). The encoding code has a bug where it would not use
	  the accessor function get_grpid() to extract the group id from the
	  field. Given that the grpid is used in statements such as: u = 1 <<
	  pe[e->event].umasks[a->idx].grpid; The code could run the risk of
	  exceeding the max shift for a 16-bit value. The fix is to use the
	  accessor function to extract the grpid.  The patch also adds a
	  validation test to ensure events which would cause a large grpid
	  are properly encoded.
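	  The grpid fix can be pictured with the sketch below. The names
	  sketch_get_grpid(), sketch_get_req_grpid() and sketch_grpid_bit()
	  are hypothetical, and the 8-bit subfield width is an assumption
	  based on the "offset 8" wording above; this is not the libpfm4
	  source.

```c
#include <stdint.h>

/* Assumed layout, following the description above: the low byte of the
 * 16-bit field holds the actual group id, and the high byte (offset 8)
 * holds the required group id. */
#define SKETCH_GRPID_MASK 0xffu

static inline unsigned int sketch_get_grpid(uint16_t field)
{
    return field & SKETCH_GRPID_MASK;
}

static inline unsigned int sketch_get_req_grpid(uint16_t field)
{
    return (field >> 8) & SKETCH_GRPID_MASK;
}

/* Build the group bitmask from the extracted id rather than from the raw
 * field. Returns 0 if the id would not fit in a 16-bit mask. */
static inline uint16_t sketch_grpid_bit(uint16_t field)
{
    unsigned int id = sketch_get_grpid(field);
    return id < 16 ? (uint16_t)(1u << id) : 0;
}
```

	  With a raw field of 0x0302, shifting by the whole field would use
	  a shift count of 770; extracting the low byte first gives a shift
	  count of 2.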

2021-04-06  Anthony <adanalis@icl.utk.edu>

	* src/counter_analysis_toolkit/.cat_cfg,
	  src/counter_analysis_toolkit/main.c: Adjust cache levels based on
	  information in config file, and make the default config file empty.

2021-04-05  Anthony <adanalis@icl.utk.edu>

	* src/counter_analysis_toolkit/.cat_cfg,
	  src/counter_analysis_toolkit/Makefile,
	  src/counter_analysis_toolkit/branch.c,
	  src/counter_analysis_toolkit/branch.h,
	  src/counter_analysis_toolkit/dcache.c,
	  src/counter_analysis_toolkit/dcache.h,
	  src/counter_analysis_toolkit/driver.h,
	  src/counter_analysis_toolkit/event_list.txt,
	  src/counter_analysis_toolkit/flops.c,
	  src/counter_analysis_toolkit/flops.h,
	  src/counter_analysis_toolkit/hw_desc.h,
	  src/counter_analysis_toolkit/icache.c,
	  src/counter_analysis_toolkit/icache.h,
	  src/counter_analysis_toolkit/main.c,
	  src/counter_analysis_toolkit/prepareArray.c,
	  src/counter_analysis_toolkit/prepareArray.h,
	  src/counter_analysis_toolkit/timing_kernels.c,
	  src/counter_analysis_toolkit/timing_kernels.h: Changed CAT code to
	  enable dynamic discovery of cache sizes, and also user provided
	  values (through .cat_cfg file).

Wed Jan 27 20:12:59 2021 +0900  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/libpfm4/README, src/libpfm4/docs/Makefile,
	  src/libpfm4/docs/man3/libpfm_amd64_fam19h_zen3.3,
	  .../docs/man3/libpfm_amd64_fam19h_zen3_l3.3,
	  src/libpfm4/docs/man3/libpfm_arm_a64fx.3,
	  src/libpfm4/include/perfmon/pfmlib.h, src/libpfm4/lib/Makefile,
	  src/libpfm4/lib/events/amd64_events_fam19h_zen3.h,
	  .../lib/events/amd64_events_fam19h_zen3_l3.h,
	  src/libpfm4/lib/events/perf_events.h,
	  src/libpfm4/lib/pfmlib_amd64.c,
	  src/libpfm4/lib/pfmlib_amd64_fam19h.c,
	  src/libpfm4/lib/pfmlib_amd64_fam19h_l3.c,
	  src/libpfm4/lib/pfmlib_amd64_priv.h,
	  src/libpfm4/lib/pfmlib_amd64_rapl.c,
	  src/libpfm4/lib/pfmlib_arm_armv8.c,
	  src/libpfm4/lib/pfmlib_common.c,
	  src/libpfm4/lib/pfmlib_intel_icl.c,
	  src/libpfm4/lib/pfmlib_intel_nhm_unc.c,
	  src/libpfm4/lib/pfmlib_perf_event_pmu.c,
	  src/libpfm4/lib/pfmlib_priv.h,
	  src/libpfm4/perf_examples/notify_group.c,
	  src/libpfm4/perf_examples/perf_util.c,
	  src/libpfm4/tests/validate_x86.c: This affects AMD Zen2 processors,
	  which we tested on. It also affects the following processors, which
	  we cannot test on: A64FX (Fujitsu ARM), AMD Zen3, Intel TigerLake,
	  and RocketLake.  Update libpfm4, to be current with the following
	  commit:  commit c132ab4948a828334a8fef00303a4b47f59bb4d9  Add
	  prefix to AMD Fam19h Zen3 L3 events  To avoid potential conflict
	  with other core PMU events and make it more explicit these are
	  uncore L3 events following the model of Intel uncore PMUs.   commit
	  a97908e8e6b6a28ae369dfbc9af97b52fe932273  Enable Intel Tigerlake
	  and Rocketlake core PMU support  They are equivalent to Intel
	  Icelake, so reuse the same event table.   commit
	  315941fc05f5a487e4eb5efd36ea10438336944b  add AMD64 Fam19h Zen3 L3
	  PMU support  This patch adds the AMD Fam19h (Zen3) L3 PMU support
	  consisting of 3 published events.  new PMU model:
	  amd64_fam19h_zen3_l3  Based on the public specifications PPR
	  (#55898) Rev 0.35 - Feb 5, 2021. Available at:
	  https://www.amd.com/system/files/TechDocs/55898_pub.zip   commit
	  e2afb6186dab2419a4b6f79a6adf7cd9bb0f2340  Add AMD64 Fam17h Zen2
	  RAPL support  This patch adds RAPL support for AMD64 Fam17h Zen2
	  processors. On Zen2, only the RAPL_ENERGY_PKGS event is supported.
	  commit cc4ba27e55440f87359bee5176380db1ba4ef8af  Add AMD64 Fam19h
	  Zen3 core PMU support  The patch adds a core PMU support for AMD
	  Fam19h Zen3.  new PMU model: amd64_fam19h_zen3  Based on the public
	  specifications PPR (#55898) Rev 0.35 - Feb 5, 2021. Available at:
	  https://www.amd.com/system/files/TechDocs/55898_pub.zip   commit
	  5333f3245954b038100530a17675bbbafdae3061  Fix casting issues
	  reported by PGI compiler  The PGI compiler does not like: struct {
	  unsigned long field; };  struct.field = -1,  So clean this up and
	  various other casting issues reported by Carl Ponder on the bugs.
	  commit f6500e77563e606c8510ff26f57d321328bd8157  Changing the
	  number of PMU counters and deleting the ARM(32-bit) mode for A64FX
	  The current libpfm4 implementation treats PMCR_EL0.N = 0x6 like
	  other ARM Reference processors. On an A64FX, PMCR_EL0.N = 0x8 (The
	  number of PMU counters is 8.). Therefore, only 6 counters are
	  available in the current implementation. The A64FX core also
	  supports the AArch64 state and the A64 Instruction set. The AArch32
	  state and the A32, T32 Instruction set are not supported and cannot
	  be transitioned to this Execution state. Currently, the libpfm
	  manual(docs/man3/libpfm_arm_a64fx.3) states that A32/A64 can be
	  used, but A32 cannot be used.  I have created a patch with the
	  above fixes, so please review and merge it.  Originally, the
	  specification of the A64FX which Fujitsu published should have
	  described the above two points, but the description was omitted.
	  A64FX Specification HPC Extension v1.1 will add:
	  - On an A64FX, PMCR_EL0.N = 0x8 (the number of PMU counters is 8).
	  - A64FX does not support the AArch32 state and the A32, T32
	    Instruction set and cannot transition to this Execution state.
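	  For reference, a small sketch of how the N field can be decoded
	  from a raw PMCR_EL0 value. The helper name is hypothetical, and
	  how the raw value is obtained (reading the register needs suitable
	  privileges) is outside the sketch; only the N field position,
	  bits [15:11], comes from the ARMv8 architecture.

```c
#include <stdint.h>

/* PMCR_EL0.N occupies bits [15:11] of the register. */
#define SKETCH_PMCR_N_SHIFT 11
#define SKETCH_PMCR_N_MASK  0x1fu

/* Hypothetical helper: decode the number of event counters from a raw
 * PMCR_EL0 value that was obtained elsewhere. */
static inline unsigned int sketch_pmcr_num_counters(uint64_t pmcr)
{
    return (unsigned int)((pmcr >> SKETCH_PMCR_N_SHIFT) & SKETCH_PMCR_N_MASK);
}
```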

2021-03-11  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/papi_hl.c: Improved randomization of rank id.

2021-03-10  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/papi_hl.c: Added more hardware information in hl
	  performance output.

2021-03-09  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/papi_hl.c: Improved hl performance output for
	  parallel programs.  If the system does not provide the rank id, a
	  unique file is created per rank. This implementation avoids race
	  conditions.

2021-02-24  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Corrects a sequence error in the
	  use of cuda context that was causing an issue on Summit.
	* src/components/cuda/linux-cuda.c: interim commit for merge

2021-02-22  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/papi_hl.c,
	  src/high-level/scripts/papi_hl_output_writer.py: Improved hl output.

2021-02-22  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm/tests/rocm_example.cpp,
	  src/components/rocm_smi/tests/rocmsmi_example.cpp: Modifications to
	  commentary in instructional example code, for accuracy and clarity.

2021-02-19  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm/tests/ROCM_Makefile: Deleting obsolete
	  ROCM_Makefile.
	* src/components/rocm/tests/ROCM_Makefile: Re-adding
	  components/rocm/tests/ROCM_Makefile to resolve merge conflict. It
	  is obsolete, and will be deleted in a future update.
	* src/components/rocm/tests/ROCM_Makefile: ROCM_Makefile is obsolete;
	  incorporated into Makefile.
	* src/components/rocm/tests/ROCM_Makefile,
	  src/components/rocm_smi/linux-rocm-smi.c: Restoring ROCM_Makefile to
	  deal with merge conflict. Adding sensor 0-relative, 1-relative fix.

2021-02-19  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/papi_hl.c,
	  src/high-level/scripts/papi_hl_output_writer.py: Revised hl output.

2021-02-19  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm/linux-rocm.c,
	  src/components/rocm_smi/linux-rocm-smi.c: Clean up code and library
	  search for both components. For ROCM, automatically set rocprofiler
	  environment variables if missing.

2021-02-18  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/papi_hl.c: Fixed raw output.
	* src/high-level/papi_hl.c: Added component name to event
	  definitions.

2021-02-16  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/papi_events.csv: remove PAPI_L1_TCA and PAPI_L1_TCH for a64fx.
	  PAPI_L1_TCA and PAPI_L1_TCH for a64fx measure L1D_CACHE just like
	  PAPI_L1_DCA and PAPI_L1_DCH, so I delete (comment out) PAPI_L1_TCA
	  and PAPI_L1_TCH for a64fx from the papi_events.csv file.

2021-02-15  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/dcache.c,
	  src/counter_analysis_toolkit/timing_kernels.c: Modified the multi-
	  threaded CAT data cache benchmark so that each thread's memory
	  buffer is allocated in separate threads.  Allocating all buffers in
	  a single thread means they exist in the same NUMA region. This
	  change prevents an imbalance of memory accesses to just a single
	  NUMA region.  This change was tested on the IBM POWER9
	  architecture.

2021-02-14  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/papi_hl.c: Modified recording of regions.
	  - All regions have a unique region ID.
	  - Added hierarchy for nested regions.
	  - List regions that have the same name separately in the JSON
	    output.

2021-02-12  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/papi_events.csv: remove PAPI_L1_DCA and PAPI_L1_DCH for a64fx.
	  There seems to be a problem with PAPI_L1_DCA and PAPI_L1_DCH for
	  a64fx: prefetches appear to be overcounted. I delete (comment out)
	  PAPI_L1_DCA and PAPI_L1_DCH for a64fx from the papi_events.csv file.
	  I will issue the pull request again once I have identified how to
	  handle the overcount.

2021-02-11  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/Makefile,
	  src/counter_analysis_toolkit/dcache.c,
	  src/counter_analysis_toolkit/dcache.h,
	  src/counter_analysis_toolkit/timing_kernels.c,
	  src/counter_analysis_toolkit/timing_kernels.h: Implemented a multi-
	  threaded version of the CAT data cache benchmarks.  This is
	  necessary for full utilization of the hardware in the memory
	  hierarchy, which provides more stable benchmark results.  These
	  changes were tested on the IBM POWER9 architecture.

2021-02-10  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/dcache.c,
	  src/counter_analysis_toolkit/dcache.h: Removed the CAT data cache
	  benchmarks from running in a separate, stand-alone thread.  This is
	  a necessary step to implement truly multi-threaded versions of the
	  benchmarks.  These changes were tested on the IBM POWER9
	  architecture.

2021-02-04  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm_smi/tests/rocmsmi_example.cpp: Minor
	  modifications to comments and report code.

2021-02-03  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm/tests/Makefile,
	  src/components/rocm/tests/ROCM_Makefile,
	  src/components/rocm/tests/rocm_all.cpp,
	  src/components/rocm/tests/rocm_example.cpp,
	  src/components/rocm_smi/tests/Makefile,
	  src/components/rocm_smi/tests/ROCM_SMI_Makefile,
	  .../rocm_smi/tests/power_monitor_rocm.cpp,
	  .../rocm_smi/tests/rocm_command_line.cpp,
	  src/components/rocm_smi/tests/rocm_smi_all.cpp,
	  .../rocm_smi/tests/rocm_smi_writeTests.cpp,
	  src/components/rocm_smi/tests/rocmsmi_example.cpp: In ROCM and
	  ROCM_SMI, deleted specialty Makefiles and incorporated all makes
	  into .../tests/Makefile. This required minor mods to existing files
	  to get a clean compile without warnings. Added two files,
	  rocm_example.cpp and rocmsmi_example.cpp, that are coding tutorials
	  with heavy commenting for programmers new to PAPI; these will
	  also be used for video tutorials by AMD.

2021-01-29  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/tests/lib/.gitignore: Added the directory lib
	  under sde/tests
	* src/components/sde/Rules.sde,
	  src/components/sde/interface/papi_sde_interface.c,
	  src/components/sde/interface/papi_sde_interface.h,
	  src/components/sde/sde_lib/Makefile,
	  src/components/sde/sde_lib/sde_common.c,
	  src/components/sde/sde_lib/sde_common.h,
	  src/components/sde/sde_lib/sde_lib.c,
	  src/components/sde/sde_lib/weak_symbols.c,
	  src/components/sde/tests/Makefile, src/configure, src/configure.in,
	  src/utils/Makefile, src/utils/Makefile.target.in: Cleaned up the
	  stand alone sde code. Now it does not need to be built into a
	  separate library, the sources/objects can be integrated into third
	  party libraries directly.

2021-01-20  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/interface/papi_sde_interface.c,
	  src/components/sde/sde.c, src/components/sde/sde_internal.h,
	  src/components/sde/sde_lib/Makefile,
	  src/components/sde/sde_lib/sde_common.c,
	  src/components/sde/sde_lib/sde_common.h,
	  src/components/sde/sde_lib/sde_lib.c,
	  src/components/sde/sde_lib/weak_symbols.c,
	  .../tests/Created_Counter/Created_Counter_Driver.c,
	  .../Created_Counter/Lib_With_Created_Counter.c,
	  .../sde/tests/Created_Counter/Overflow_Driver.c,
	  src/components/sde/tests/Minimal/Minimal_Test.c,
	  .../sde/tests/Recorder/Lib_With_Recorder.c,
	  .../sde/tests/Recorder/Recorder_Driver.c,
	  src/components/sde/tests/Simple/Simple_Driver.c,
	  src/components/sde/tests/Simple/Simple_Lib.c,
	  src/components/sde/tests/Simple2/Simple2_Driver.c,
	  src/components/sde/tests/Simple2/Simple2_Lib.c,
	  .../sde/tests/Simple2/Simple2_NoPAPI_Driver.c: Removed trailing
	  white spaces.

2020-09-03  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/Rules.sde, src/components/sde/sde.c,
	  src/components/sde/sde_common.c, src/components/sde/sde_common.h,
	  src/components/sde/sde_internal.h, src/components/sde/sde_lib.c,
	  src/components/sde/sde_lib/Makefile,
	  src/components/sde/sde_lib/papi_sde_interface.h,
	  src/components/sde/sde_lib/sde_common.c,
	  src/components/sde/sde_lib/sde_common.h,
	  src/components/sde/sde_lib/sde_lib.c,
	  src/components/sde/sde_lib/weak_symbols.c,
	  .../sde/tests/Advanced_C+FORTRAN/sde_test_f08.F90,
	  .../tests/Created_Counter/Created_Counter_Driver.c,
	  .../Created_Counter/Lib_With_Created_Counter.c,
	  .../sde/tests/Created_Counter/Overflow_Driver.c,
	  src/components/sde/tests/Makefile,
	  src/components/sde/tests/Minimal/Minimal_Test.c,
	  src/components/sde/tests/README.txt,
	  .../sde/tests/Recorder/Lib_With_Recorder.c,
	  .../sde/tests/Recorder/Recorder_Driver.c,
	  src/components/sde/tests/Simple/Simple_Driver.c,
	  src/components/sde/tests/Simple2/Simple2_Driver.c,
	  src/components/sde/tests/Simple2/Simple2_Lib.c,
	  .../sde/tests/Simple2/Simple2_NoPAPI_Driver.c, src/run_tests.sh,
	  src/run_tests_exclude.txt, src/utils/Makefile,
	  src/utils/papi_native_avail.c, src/utils/papi_sde_interface.c:
	  + Major restructuring of the libsde code, so that it can be used
	    more easily by external projects.
	  + Changes to the tests so they conform to the rest of PAPI's
	    testing infrastructure.

2020-05-30  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/README.md, src/components/sde/Rules.sde,
	  src/components/sde/sde.c, src/components/sde/sde_common.h: Fixed a
	  problem occurring in non-debug builds and updated the README file.

2020-05-29  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/sde.c, src/components/sde/sde_common.c,
	  src/components/sde/sde_common.h, src/components/sde/sde_internal.h,
	  src/components/sde/sde_lib.c: Better header organization.
	* src/components/sde/tests/Makefile,
	  .../sde/tests/Simple2/Simple2_NoPAPI_Driver.c: New test with no
	  libpapi.so linkage was added.
	* src/components/sde/sde.c, src/components/sde/sde_internal.h,
	  src/components/sde/sde_lib.c: More complete support for
	  overflowing. Now, case r5 is supported.
	* src/components/sde/sde.c, src/components/sde/sde_common.h,
	  src/components/sde/sde_internal.h, src/components/sde/sde_lib.c:
	  Support for overflow for the case of created counters as well as
	  r[1-4].

2020-05-28  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/Rules.sde, src/components/sde/sde.c,
	  src/components/sde/sde_common.c, src/components/sde/sde_common.h,
	  src/components/sde/sde_internal.h, src/components/sde/sde_lib.c,
	  src/components/sde/tests/Makefile,
	  src/components/sde/tests/Simple2/Simple2_Driver.c: Pushing the
	  library interface of SDEs into a stand-alone library (libsde.so).

2021-01-26  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/components/mx/linux-mx.c: Add string length check before
	  strncpy() and strcat() calls in _mx_init_component()  Myrinet
	  Express-related component MX modules are initialized with the
	  _mx_init_component() function, which is called from the
	  PAPI_library_init() function. The popen(3) call runs a loadable
	  module called "mx_counters", and if the loadable module does not
	  exist, it attempts to run a loadable module called
	  "./components/mx/utils/fake_mx_counters". In an environment where
	  there are no "mx_counters" and
	  "./components/mx/utils/fake_mx_counters" loadable modules, popen(3)
	  will be called twice uselessly. popen(3) internally calls pipe(2)
	  once, fork(2) twice and exec(2) once.  The size of the user space
	  of the application calling the PAPI_library_init() function affects
	  the performance of fork(2), which is called as an extension of
	  popen(3). As a result, the performance of the PAPI_library_init()
	  function is affected by the amount of user space in the application
	  that called the PAPI_library_init() function.  In the
	  _mx_init_component() function, the MX module only needs to be able
	  to verify that a load module named "mx_counters" exists. We
	  improved the _mx_init_component() function to call fopen(3) instead
	  of popen(3). We also add string length checks before the strncpy()
	  and strcat() calls in the _mx_init_component() function.
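	  A minimal sketch of the two ideas in this entry: probing for a
	  file with fopen(3) instead of spawning it with popen(3), and
	  checking string lengths before copying. The helper names and the
	  path layout are hypothetical and are not the code in linux-mx.c.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `path` can be opened for reading,
 * 0 otherwise. No child process is created, unlike popen(3). */
static int sketch_file_exists(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}

/* Hypothetical helper: write "<dir>/<name>" into dst only if the result,
 * including the terminating NUL, fits in dst_size bytes.
 * Returns 0 on success and -1 if the buffer is too small. */
static int sketch_join_path(char *dst, size_t dst_size,
                            const char *dir, const char *name)
{
    size_t need = strlen(dir) + 1 + strlen(name) + 1;
    if (need > dst_size)
        return -1;
    strcpy(dst, dir);   /* safe: total length was checked above */
    strcat(dst, "/");
    strcat(dst, name);
    return 0;
}
```

	  Checking the combined length once, before any copy, avoids a
	  partially written buffer when the path does not fit.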

2021-01-21  William Cohen <wcohen@redhat.com>

	* src/Rules.pfm4_pe: Only check for libpfm.a if static libraries are
	  being used.  Even when static libraries were not being used, papi
	  was checking for libpfm.a; this would cause a failure if libpfm.a
	  wasn't installed. Exclude checking for libpfm.a if no static libpfm
	  library is needed.

2021-01-22  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/scripts/papi_hl_output_writer.py: Improved
	  performance report script.

2021-01-18  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/papi_hl.c,
	  src/high-level/scripts/papi_hl_output_writer.py: Fixed real time
	  measurement.

2021-01-07  Damien Genet <dgenet@icl.utk.edu>

	* src/Rules.perfnec, src/components/perfnec/Rules.perfnec,
	  src/components/perfnec/perfmon.c, src/components/perfnec/perfnec.h,
	  src/configure, src/configure.in, src/libperfnec/COPYRIGHT,
	  src/libperfnec/ChangeLog, src/libperfnec/Makefile,
	  src/libperfnec/README, src/libperfnec/TODO,
	  src/libperfnec/config.mk, src/libperfnec/docs/Makefile,
	  src/libperfnec/docs/man3/libpfm.3,
	  src/libperfnec/docs/man3/libpfm_amd64.3,
	  src/libperfnec/docs/man3/libpfm_atom.3,
	  src/libperfnec/docs/man3/libpfm_core.3,
	  src/libperfnec/docs/man3/libpfm_itanium.3,
	  src/libperfnec/docs/man3/libpfm_itanium2.3,
	  src/libperfnec/docs/man3/libpfm_montecito.3,
	  src/libperfnec/docs/man3/libpfm_nehalem.3,
	  src/libperfnec/docs/man3/libpfm_p6.3,
	  src/libperfnec/docs/man3/libpfm_powerpc.3,
	  src/libperfnec/docs/man3/libpfm_westmere.3,
	  src/libperfnec/docs/man3/pfm_dispatch_events.3,
	  src/libperfnec/docs/man3/pfm_find_event.3,
	  src/libperfnec/docs/man3/pfm_find_event_bycode.3,
	  .../docs/man3/pfm_find_event_bycode_next.3,
	  src/libperfnec/docs/man3/pfm_find_event_mask.3,
	  src/libperfnec/docs/man3/pfm_find_full_event.3,
	  src/libperfnec/docs/man3/pfm_force_pmu.3,
	  src/libperfnec/docs/man3/pfm_get_cycle_event.3,
	  src/libperfnec/docs/man3/pfm_get_event_code.3,
	  .../docs/man3/pfm_get_event_code_counter.3,
	  src/libperfnec/docs/man3/pfm_get_event_counters.3,
	  .../docs/man3/pfm_get_event_description.3,
	  src/libperfnec/docs/man3/pfm_get_event_mask_code.3,
	  .../docs/man3/pfm_get_event_mask_description.3,
	  src/libperfnec/docs/man3/pfm_get_event_mask_name.3,
	  src/libperfnec/docs/man3/pfm_get_event_name.3,
	  src/libperfnec/docs/man3/pfm_get_full_event_name.3,
	  .../docs/man3/pfm_get_hw_counter_width.3,
	  src/libperfnec/docs/man3/pfm_get_impl_counters.3,
	  src/libperfnec/docs/man3/pfm_get_impl_pmcs.3,
	  src/libperfnec/docs/man3/pfm_get_impl_pmds.3,
	  src/libperfnec/docs/man3/pfm_get_inst_retired.3,
	  .../docs/man3/pfm_get_max_event_name_len.3,
	  src/libperfnec/docs/man3/pfm_get_num_counters.3,
	  src/libperfnec/docs/man3/pfm_get_num_events.3,
	  src/libperfnec/docs/man3/pfm_get_num_pmcs.3,
	  src/libperfnec/docs/man3/pfm_get_num_pmds.3,
	  src/libperfnec/docs/man3/pfm_get_pmu_name.3,
	  src/libperfnec/docs/man3/pfm_get_pmu_name_bytype.3,
	  src/libperfnec/docs/man3/pfm_get_pmu_type.3,
	  src/libperfnec/docs/man3/pfm_get_version.3,
	  src/libperfnec/docs/man3/pfm_initialize.3,
	  src/libperfnec/docs/man3/pfm_list_supported_pmus.3,
	  src/libperfnec/docs/man3/pfm_pmu_is_supported.3,
	  src/libperfnec/docs/man3/pfm_regmask_and.3,
	  src/libperfnec/docs/man3/pfm_regmask_clr.3,
	  src/libperfnec/docs/man3/pfm_regmask_copy.3,
	  src/libperfnec/docs/man3/pfm_regmask_eq.3,
	  src/libperfnec/docs/man3/pfm_regmask_isset.3,
	  src/libperfnec/docs/man3/pfm_regmask_or.3,
	  src/libperfnec/docs/man3/pfm_regmask_set.3,
	  src/libperfnec/docs/man3/pfm_regmask_weight.3,
	  src/libperfnec/docs/man3/pfm_set_options.3,
	  src/libperfnec/docs/man3/pfm_strerror.3,
	  src/libperfnec/include/Makefile,
	  src/libperfnec/include/perfmon/perfmon.h,
	  src/libperfnec/include/perfmon/perfmon_compat.h,
	  src/libperfnec/include/perfmon/perfmon_crayx2.h,
	  .../include/perfmon/perfmon_default_smpl.h,
	  src/libperfnec/include/perfmon/perfmon_dfl_smpl.h,
	  src/libperfnec/include/perfmon/perfmon_i386.h,
	  src/libperfnec/include/perfmon/perfmon_ia64.h,
	  src/libperfnec/include/perfmon/perfmon_mips64.h,
	  src/libperfnec/include/perfmon/perfmon_nec.h,
	  .../include/perfmon/perfmon_pebs_core_smpl.h,
	  .../include/perfmon/perfmon_pebs_p4_smpl.h,
	  src/libperfnec/include/perfmon/perfmon_pebs_smpl.h,
	  src/libperfnec/include/perfmon/perfmon_powerpc.h,
	  src/libperfnec/include/perfmon/perfmon_sparc.h,
	  src/libperfnec/include/perfmon/perfmon_v2.h,
	  src/libperfnec/include/perfmon/perfmon_x86_64.h,
	  src/libperfnec/include/perfmon/pfmlib.h,
	  src/libperfnec/include/perfmon/pfmlib_amd64.h,
	  src/libperfnec/include/perfmon/pfmlib_cell.h,
	  src/libperfnec/include/perfmon/pfmlib_comp.h,
	  .../include/perfmon/pfmlib_comp_crayx2.h,
	  src/libperfnec/include/perfmon/pfmlib_comp_i386.h,
	  src/libperfnec/include/perfmon/pfmlib_comp_ia64.h,
	  .../include/perfmon/pfmlib_comp_mips64.h,
	  .../include/perfmon/pfmlib_comp_powerpc.h,
	  src/libperfnec/include/perfmon/pfmlib_comp_sparc.h,
	  .../include/perfmon/pfmlib_comp_x86_64.h,
	  src/libperfnec/include/perfmon/pfmlib_core.h,
	  src/libperfnec/include/perfmon/pfmlib_coreduo.h,
	  src/libperfnec/include/perfmon/pfmlib_crayx2.h,
	  src/libperfnec/include/perfmon/pfmlib_gen_ia32.h,
	  src/libperfnec/include/perfmon/pfmlib_gen_ia64.h,
	  src/libperfnec/include/perfmon/pfmlib_gen_mips64.h,
	  src/libperfnec/include/perfmon/pfmlib_i386_p6.h,
	  src/libperfnec/include/perfmon/pfmlib_intel_atom.h,
	  src/libperfnec/include/perfmon/pfmlib_intel_nhm.h,
	  src/libperfnec/include/perfmon/pfmlib_itanium.h,
	  src/libperfnec/include/perfmon/pfmlib_itanium2.h,
	  src/libperfnec/include/perfmon/pfmlib_montecito.h,
	  src/libperfnec/include/perfmon/pfmlib_os.h,
	  src/libperfnec/include/perfmon/pfmlib_os_crayx2.h,
	  src/libperfnec/include/perfmon/pfmlib_os_i386.h,
	  src/libperfnec/include/perfmon/pfmlib_os_ia64.h,
	  src/libperfnec/include/perfmon/pfmlib_os_mips64.h,
	  src/libperfnec/include/perfmon/pfmlib_os_powerpc.h,
	  src/libperfnec/include/perfmon/pfmlib_os_sparc.h,
	  src/libperfnec/include/perfmon/pfmlib_os_x86_64.h,
	  src/libperfnec/include/perfmon/pfmlib_pentium4.h,
	  src/libperfnec/include/perfmon/pfmlib_powerpc.h,
	  src/libperfnec/include/perfmon/pfmlib_sicortex.h,
	  src/libperfnec/include/perfmon/pfmlib_sparc.h,
	  src/libperfnec/lib/Makefile, src/libperfnec/lib/amd64_events.h,
	  src/libperfnec/lib/amd64_events_fam10h.h,
	  src/libperfnec/lib/amd64_events_fam15h.h,
	  src/libperfnec/lib/amd64_events_k7.h,
	  src/libperfnec/lib/amd64_events_k8.h,
	  src/libperfnec/lib/cell_events.h, src/libperfnec/lib/core_events.h,
	  src/libperfnec/lib/coreduo_events.h,
	  src/libperfnec/lib/crayx2_events.h,
	  src/libperfnec/lib/gen_ia32_events.h,
	  src/libperfnec/lib/gen_mips64_events.h,
	  src/libperfnec/lib/i386_p6_events.h,
	  src/libperfnec/lib/intel_atom_events.h,
	  src/libperfnec/lib/intel_corei7_events.h,
	  src/libperfnec/lib/intel_corei7_unc_events.h,
	  src/libperfnec/lib/intel_wsm_events.h,
	  src/libperfnec/lib/intel_wsm_unc_events.h,
	  src/libperfnec/lib/itanium2_events.h,
	  src/libperfnec/lib/itanium_events.h, src/libperfnec/lib/libpfm.a,
	  src/libperfnec/lib/montecito_events.h,
	  src/libperfnec/lib/niagara1_events.h,
	  src/libperfnec/lib/niagara2_events.h,
	  src/libperfnec/lib/pentium4_events.h,
	  src/libperfnec/lib/pfmlib_amd64.c,
	  src/libperfnec/lib/pfmlib_amd64_priv.h,
	  src/libperfnec/lib/pfmlib_cell.c,
	  src/libperfnec/lib/pfmlib_cell_priv.h,
	  src/libperfnec/lib/pfmlib_common.c,
	  src/libperfnec/lib/pfmlib_core.c,
	  src/libperfnec/lib/pfmlib_core_priv.h,
	  src/libperfnec/lib/pfmlib_coreduo.c,
	  src/libperfnec/lib/pfmlib_coreduo_priv.h,
	  src/libperfnec/lib/pfmlib_crayx2.c,
	  src/libperfnec/lib/pfmlib_crayx2_priv.h,
	  src/libperfnec/lib/pfmlib_gen_ia32.c,
	  src/libperfnec/lib/pfmlib_gen_ia32_priv.h,
	  src/libperfnec/lib/pfmlib_gen_ia64.c,
	  src/libperfnec/lib/pfmlib_gen_mips64.c,
	  src/libperfnec/lib/pfmlib_gen_mips64_priv.h,
	  src/libperfnec/lib/pfmlib_gen_powerpc.c,
	  src/libperfnec/lib/pfmlib_i386_p6.c,
	  src/libperfnec/lib/pfmlib_i386_p6_priv.h,
	  src/libperfnec/lib/pfmlib_intel_atom.c,
	  src/libperfnec/lib/pfmlib_intel_atom_priv.h,
	  src/libperfnec/lib/pfmlib_intel_nhm.c,
	  src/libperfnec/lib/pfmlib_intel_nhm_priv.h,
	  src/libperfnec/lib/pfmlib_itanium.c,
	  src/libperfnec/lib/pfmlib_itanium2.c,
	  src/libperfnec/lib/pfmlib_itanium2_priv.h,
	  src/libperfnec/lib/pfmlib_itanium_priv.h,
	  src/libperfnec/lib/pfmlib_montecito.c,
	  src/libperfnec/lib/pfmlib_montecito_priv.h,
	  src/libperfnec/lib/pfmlib_os_linux.c,
	  src/libperfnec/lib/pfmlib_os_linux_v2.c,
	  src/libperfnec/lib/pfmlib_os_linux_v3.c,
	  src/libperfnec/lib/pfmlib_os_macos.c,
	  src/libperfnec/lib/pfmlib_pentium4.c,
	  src/libperfnec/lib/pfmlib_pentium4_priv.h,
	  src/libperfnec/lib/pfmlib_power4_priv.h,
	  src/libperfnec/lib/pfmlib_power5+_priv.h,
	  src/libperfnec/lib/pfmlib_power5_priv.h,
	  src/libperfnec/lib/pfmlib_power6_priv.h,
	  src/libperfnec/lib/pfmlib_power7_priv.h,
	  src/libperfnec/lib/pfmlib_power_priv.h,
	  src/libperfnec/lib/pfmlib_powerpc_priv.h,
	  src/libperfnec/lib/pfmlib_ppc970_priv.h,
	  src/libperfnec/lib/pfmlib_ppc970mp_priv.h,
	  src/libperfnec/lib/pfmlib_priv.c, src/libperfnec/lib/pfmlib_priv.h,
	  src/libperfnec/lib/pfmlib_priv_comp.h,
	  src/libperfnec/lib/pfmlib_priv_comp_ia64.h,
	  src/libperfnec/lib/pfmlib_priv_ia64.h,
	  src/libperfnec/lib/pfmlib_sicortex.c,
	  src/libperfnec/lib/pfmlib_sicortex_priv.h,
	  src/libperfnec/lib/pfmlib_sparc.c,
	  src/libperfnec/lib/pfmlib_sparc_priv.h,
	  src/libperfnec/lib/power4_events.h,
	  src/libperfnec/lib/power5+_events.h,
	  src/libperfnec/lib/power5_events.h,
	  src/libperfnec/lib/power6_events.h,
	  src/libperfnec/lib/power7_events.h,
	  src/libperfnec/lib/powerpc_events.h,
	  src/libperfnec/lib/powerpc_reg.h,
	  src/libperfnec/lib/ppc970_events.h,
	  src/libperfnec/lib/ppc970mp_events.h,
	  src/libperfnec/lib/ultra12_events.h,
	  src/libperfnec/lib/ultra3_events.h,
	  src/libperfnec/lib/ultra3i_events.h,
	  src/libperfnec/lib/ultra3plus_events.h,
	  src/libperfnec/lib/ultra4plus_events.h,
	  src/libperfnec/libpfms/Makefile,
	  src/libperfnec/libpfms/include/libpfms.h,
	  src/libperfnec/libpfms/lib/Makefile,
	  src/libperfnec/libpfms/lib/libpfms.c,
	  src/libperfnec/libpfms/syst_smp.c, src/libperfnec/python/Makefile,
	  src/libperfnec/python/README, src/libperfnec/python/self.py,
	  src/libperfnec/python/setup.py,
	  src/libperfnec/python/src/__init__.py,
	  src/libperfnec/python/src/perfmon_int.i,
	  src/libperfnec/python/src/pmu.py,
	  src/libperfnec/python/src/session.py, src/libperfnec/python/sys.py,
	  src/libperfnec/rules.mk, src/linux-common.h, src/linux-context.h,
	  src/linux-lock.h, src/linux-timer.c, src/mb.h,
	  src/utils/papi_native_avail.c: Merged in feature/pr_nec (pull
	  request #157)  * Adding lib and component

2021-01-05  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/components/perf_event/pe_libpfm4_events.c: Get model_string for
	  ARM processor from pfm_get_pmu_info() function.  On ARM processors,
	  the model_string does not appear in /proc/cpuinfo. Instead of
	  looking at the /proc/cpuinfo information, you can look at the lscpu
	  command information; see the following URLs:
	  https://github.com/google/cpu_features/issues/26
	  http://suihkulokki.blogspot.com/2018/02/making-sense-of-proccpuinfo-on-arm.html
	  The libpfm4 library identifies the ARM processor type from the
	  "CPU implementer" and "CPU part" fields in the /proc/cpuinfo
	  information. The papi library can use the pfm_get_pmu_info()
	  function from the libpfm4 library to obtain a string identifying
	  the ARM processor type.

2020-12-23  Peinan Zhang <peinan.zhang@intel.com>

	* .../intel_gpu/internal/inc/GPUMetricHandler.h,
	  .../intel_gpu/internal/src/GPUMetricHandler.cpp,
	  .../intel_gpu/internal/src/GPUMetricInterface.cpp: Changed
	  query-based data reads to use a timeout rather than blocking until
	  data is available.

2020-12-18  Peinan Zhang <peinan.zhang@intel.com>

	* src/components/intel_gpu/README,
	  src/components/intel_gpu/Rules.intel_gpu,
	  .../intel_gpu/internal/inc/GPUMetricHandler.h,
	  .../intel_gpu/internal/inc/GPUMetricInterface.h,
	  .../intel_gpu/internal/src/GPUMetricHandler.cpp,
	  .../intel_gpu/internal/src/GPUMetricInterface.cpp,
	  src/components/intel_gpu/internal/src/Makefile,
	  src/components/intel_gpu/linux_intel_gpu_metrics.c,
	  src/components/intel_gpu/linux_intel_gpu_metrics.h,
	  src/components/intel_gpu/tests/Makefile,
	  src/components/intel_gpu/tests/gemm.spv,
	  src/components/intel_gpu/tests/gpu_metric_list.c,
	  src/components/intel_gpu/tests/gpu_metric_read.c,
	  src/components/intel_gpu/tests/gpu_query_gemm.cc,
	  src/components/intel_gpu/tests/gpu_thread_read.c,
	  src/components/intel_gpu/tests/readme.txt: Add intel_gpu component
	  to collect Intel GPU performance metrics

2020-12-17  Heike Jagode <jagode@icl.utk.edu>

	* src/components/perf_event/perf_event.c: Deleting Tony's hard
	  failure for check_exclude_guest() if perf_event_open fails. There
	  shouldn't be a failure since exclude_guest_unsupported is already
	  set before perf_event_open is called. At this point, if
	  perf_event_open fails it should just return but not result in a
	  hard failure.  Hence, going back to the previous version of
	  check_exclude_guest(). And since the return value of this function
	  is not checked, we change it to void (instead of int).

2020-12-15  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/papi_events.csv: modify PAPI_FP_INS and PAPI_VEC_INS for A64FX
	  support

2020-12-14  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/papi_events.csv: Add or modify various A64FX support events,
	  including floating point events (PAPI_FP_OPS, PAPI_SP_OPS,
	  PAPI_DP_OPS).
	* src/papi_events.csv: Corrected typo for A64FX support (PAPI_L2_DCH
	  is a typo of PAPI_L2_DCA)

Wed Dec 9 19:48:23 2020 -0800  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/lib/pfmlib_intel_snbep_unc_perf_event.c,
	  src/libpfm4/lib/pfmlib_perf_event_pmu.c: Update libpfm4, to be
	  current with the following commit:
	  --------------------------------------------------------------
	  commit c96ebc0d19c6167b45e1694ea38719f230da254e  fix typos in
	  comments related to PERF_ATTR_HWS  AMD was mentioned in non-AMD
	  related files.  Reported-by:  Steve Kaufmann
	  <steven.kaufmann@hpe.com>

Thu Nov 12 17:46:47 2020 -0800  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/lib/events/intel_icl_events.h,
	  src/libpfm4/lib/pfmlib_amd64.c,
	  src/libpfm4/lib/pfmlib_amd64_perf_event.c,
	  src/libpfm4/lib/pfmlib_arm.c,
	  src/libpfm4/lib/pfmlib_arm_perf_event.c,
	  src/libpfm4/lib/pfmlib_intel_netburst.c,
	  src/libpfm4/lib/pfmlib_intel_netburst_perf_event.c,
	  src/libpfm4/lib/pfmlib_intel_snbep_unc.c,
	  .../lib/pfmlib_intel_snbep_unc_perf_event.c,
	  src/libpfm4/lib/pfmlib_mips.c,
	  src/libpfm4/lib/pfmlib_mips_perf_event.c,
	  src/libpfm4/lib/pfmlib_perf_event_pmu.c,
	  src/libpfm4/lib/pfmlib_perf_event_raw.c,
	  src/libpfm4/lib/pfmlib_powerpc.c,
	  src/libpfm4/lib/pfmlib_powerpc_perf_event.c,
	  src/libpfm4/lib/pfmlib_s390x_cpumf.c,
	  src/libpfm4/lib/pfmlib_s390x_perf_event.c,
	  src/libpfm4/lib/pfmlib_s390x_priv.h,
	  src/libpfm4/lib/pfmlib_sparc.c,
	  src/libpfm4/lib/pfmlib_sparc_perf_event.c,
	  src/libpfm4/tests/validate_x86.c: Update libpfm4, to be current
	  with the following commit:
	  --------------------------------------------------------------
	  commit fb6ddf78949eb1bc6921df5cfd0cf3e5ef2e752e  fix Intel Icelake
	  encodings for CPU_CLK_UNHALTED.*_DISTRIBUTED  The event code for
	  CPU_CLK_UNHALTED was wrong and the umasks DISTRIBUTED and
	  REF_DISTRIBUTED were wrong. For these, the event code is actually
	  0xec, so add the code override tag.   commit
	  6f687e42c62bc71766c5369d218cea9ca2e246cf  fix support of
	  PERF_ATTR_HWS  Remove the attribute for all PMUs that do not
	  support it, which is the majority. Without the patch, you would see
	  [hw_smpl] on Intel uncore PMUs, AMD64 Fam17h PMU, and much more.
	  The patch also fixes a few places where info->is_precise was not
	  cleared.   commit 02ab45abc160d1be754917524c40e268c490937d  Fix
	  MEM_TRANS_RETIRED for Intel Icelake  The umasks generated for Intel
	  Icelake MEM_TRANS_RETIRED were not setting ldlat properly. To use
	  the Load Latency feature with libpfm4, the ldlat= modifier must be
	  used either implicitly or explicitly. It cannot be encoded in the
	  umask code for now.

2020-11-30  William Cohen <wcohen@redhat.com>

	* src/components/appio/README.md: Remove mention of the removed
	  iozone test in the appio README.md.
	* src/components/appio/tests/Makefile,
	  src/components/appio/tests/iozone/Changes.txt,
	  src/components/appio/tests/iozone/Generate_Graphs,
	  src/components/appio/tests/iozone/Gnuplot.txt,
	  src/components/appio/tests/iozone/client_list,
	  src/components/appio/tests/iozone/fileop.c,
	  src/components/appio/tests/iozone/gengnuplot.sh,
	  src/components/appio/tests/iozone/gnu3d.dem,
	  src/components/appio/tests/iozone/gnuplot.dem,
	  src/components/appio/tests/iozone/gnuplotps.dem,
	  src/components/appio/tests/iozone/iozone.c,
	  .../appio/tests/iozone/iozone_visualizer.pl,
	  src/components/appio/tests/iozone/libasync.c,
	  src/components/appio/tests/iozone/libbif.c,
	  src/components/appio/tests/iozone/makefile,
	  src/components/appio/tests/iozone/pit_server.c,
	  src/components/appio/tests/iozone/read_telemetry,
	  src/components/appio/tests/iozone/report.pl,
	  src/components/appio/tests/iozone/spec.in,
	  src/components/appio/tests/iozone/write_telemetry: Remove bundled
	  iozone due to incompatible license.  A review of the PAPI sources
	  found some iozone code bundled in papi (rhbz1901077 - papi bundles
	  non-free iozone code ).  The upstream license for iozone does not
	  give permission to modify the source. There are some minor changes
	  in the PAPI version of the iozone files.

2020-11-25  Masahiko, Yamada <yamada.masahiko@fujitsu.com>

	* src/components/mx/linux-mx.c: fix for performance improvement of
	  _mx_init_component() function

2020-11-18  Gerald Ragghianti <ragghianti@icl.utk.edu>

	* src/components/rocm/README.md: Typo in library file name

2020-11-17  Damien Genet <dgenet@icl.utk.edu>

	* src/components/infiniband/linux-infiniband.c: Fix: location is not
	  stored, so mark that location is broken

2020-11-17  Anthony Castaldo <tonycastaldo@tellico-master0.local>

	* src/components/cuda/tests/nvlink_all.cu: Removed debug messages.
	* src/components/cuda/tests/nvlink_all.cu: Removing debug messages.
	* src/components/cuda/tests/nvlink_all.cu,
	  src/components/cuda/tests/nvlink_bandwidth.cu: Improved argument
	  handling in nvlink_all.cu and nvlink_bandwidth.cu.

2020-11-06  Anthony Michael Castaldo <coe0234@tulip.cm.cluster>

	* src/components/rocm/linux-rocm.c,
	  src/components/rocm/tests/ROCM_Makefile: Changed ROCM_Makefile to
	  require PAPI_ROCM_ROOT; and to cross-compile for all "Instinct"
	  GPUs.  Code in component to set needed environment variables if not
	  defined, or ensure definition meets expectations.

2020-11-05  Anthony Michael Castaldo <coe0234@tulip.cm.cluster>

	* src/components/rocm/README.md, src/components/rocm/linux-rocm.c,
	  src/components/rocm/tests/ROCM_Makefile: First draft of changes to
	  find and set environment variables automatically.

2020-11-03  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/main.c: Added checks for the return
	  values of calls to malloc(), calloc(), and realloc(). This way, the
	  user will know if there are issues with allocating memory.  These
	  changes were tested on the IBM POWER9 architecture.
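	  Illustrative sketch only (not taken from main.c): one way such
	  allocation checks can look in C. The helper names checked_calloc()
	  and checked_grow() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate zeroed memory and tell the user when the
 * allocation fails instead of silently returning NULL. */
static void *checked_calloc(size_t count, size_t size, const char *what)
{
    void *ptr = calloc(count, size);
    if (ptr == NULL) {
        fprintf(stderr, "Failed to allocate memory for %s.\n", what);
    }
    return ptr;
}

/* Hypothetical helper: grow a buffer with realloc() without losing the
 * original pointer when the call fails.
 * Returns 0 on success, -1 on failure (the old buffer stays valid). */
static int checked_grow(long long **buf, size_t new_count)
{
    assert(buf != NULL && new_count > 0);
    long long *tmp = realloc(*buf, new_count * sizeof(**buf));
    if (tmp == NULL) {
        fprintf(stderr, "Failed to grow buffer to %zu elements.\n", new_count);
        return -1;
    }
    *buf = tmp;
    return 0;
}
```

	  Assigning the realloc() result to a temporary first keeps the
	  original buffer reachable, so the caller can still free it on
	  failure.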

2020-11-02  Daniel Barry <dbarry@vols.utk.edu>

	* src/counter_analysis_toolkit/main.c: Increase buffer size for
	  larger input files to CAT. Since the number of qualifier counts is
	  equal to the number of event names in the input file, the size of
	  the buffer containing the qualifier counts should be equal to the
	  size of the buffer containing the event names. This change is
	  necessary to accommodate large input files. This change was tested
	  on the IBM POWER9 architecture.

2020-10-27  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Added explicit Compute Capability
	  retrieval and checking to disable component if CC>=7.5 and cannot
	  work with Legacy CUPTI.  Added a filter to exclude multipass
	  metrics. Provided a timing for that conditioned on #define
	  TIME_MULTIPASS_ELIM to measure how long that takes. On Saturn A04
	  V100 device, 95-98ms additional time in init_component(). On
	  Summit, 1-6 GPUs, about 73ms extra time per GPU in
	  init_component(); from 73.5 ms (1 GPU) to 437.3 ms (6 GPU).

2020-10-26  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Added explicit Compute Capability
	  retrieval and checking; also added a filter to exclude multipass
	  metrics; as well as timing (if a #define is made) of how long that
	  takes. On 1 V100 device, 95-98ms additional time in
	  init_component().

2020-10-23  Björn Dick <dick@hlrs.de>

	* src/components/sde/tests/Makefile: adapted setting FFLAGS in
	  src/components/sde/tests/Makefile in order to make it work with
	  flang (otherwise crashes due to unknown flag '-free')

2020-10-22  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Just the compute capability
	  check.

2020-10-20  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c,
	  .../cuda/tests/BlackScholes/BlackScholes.cu,
	  .../cuda/tests/BlackScholes/BlackScholes_gold.cpp,
	  .../tests/BlackScholes/BlackScholes_kernel.cuh,
	  src/components/cuda/tests/BlackScholes/Makefile,
	  .../cuda/tests/BlackScholes/NsightEclipse.xml,
	  .../cuda/tests/BlackScholes/README_SETUP.txt,
	  src/components/cuda/tests/BlackScholes/readme.txt,
	  .../cuda/tests/BlackScholes/testAllEvents.sh,
	  .../cuda/tests/BlackScholes/testSomeEvents.sh,
	  .../cuda/tests/BlackScholes/thr_BlackScholes.cu: Thread Safety is
	  added to the cuda component; protecting functions with PAPI locks:
	  _papi_hwi_lock(COMPONENT_LOCK) and
	  _papi_hwi_unlock(COMPONENT_LOCK).  The BlackScholes directory is
	  added; a slightly modified version of an Nvidia sample program,
	  used to exercise a great deal of computation.  This includes new
	  code, in particular thr_BlackScholes.cu uses pthreads and executes
	  the kernel from several threads, using PAPI_read() in each, to test
	  the above thread safety. Also tested with helgrind (tool within
	  valgrind to test for threading synchronization issues).

2020-10-19  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/net/linux-net.c, src/components/nvml/linux-nvml.c,
	  src/components/pcp/linux-pcp.c,
	  src/components/perf_event/perf_event.c,
	  .../perf_event_uncore/perf_event_uncore.c,
	  src/components/powercap/linux-powercap.c,
	  src/components/powercap_ppc/linux-powercap-ppc.c,
	  src/components/rapl/linux-rapl.c,
	  src/components/rocm/linux-rocm.c,
	  src/components/rocm_smi/linux-rocm-smi.c,
	  src/components/sensors_ppc/linux-sensors-ppc.c: Changes to
	  init_component() to properly set the component vector element
	  disabled_reason() if init fails. Also changes to eliminate compiler
	  warnings for failing to process return codes from string functions
	  (strcpy and snprintf and variants), and to check for alloc()
	  failures; but only in the init_component() functions and any
	  functions it invokes.  Testing was on Saturn A04 by default; ICL
	  Caffeine for (rocm, rocm_smi, powercap), Summit for PCP, Tellico
	  (IBM processor) for sensors_ppc and powercap_ppc.

Wed Sep 2 11:40:42 2020 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/config.mk, src/libpfm4/debian/changelog,
	  src/libpfm4/docs/Makefile,
	  src/libpfm4/lib/events/amd64_events_fam16h.h,
	  src/libpfm4/lib/events/amd64_events_fam17h_zen2.h,
	  src/libpfm4/lib/events/intel_bdx_unc_cbo_events.h,
	  src/libpfm4/lib/events/intel_bdx_unc_ha_events.h,
	  src/libpfm4/lib/events/intel_bdx_unc_imc_events.h,
	  src/libpfm4/lib/events/intel_bdx_unc_pcu_events.h,
	  src/libpfm4/lib/events/intel_bdx_unc_qpi_events.h,
	  .../lib/events/intel_bdx_unc_r3qpi_events.h,
	  src/libpfm4/lib/events/intel_icl_events.h,
	  src/libpfm4/lib/events/intel_knl_unc_cha_events.h,
	  src/libpfm4/lib/events/intel_skx_unc_cha_events.h,
	  src/libpfm4/lib/events/intel_skx_unc_imc_events.h,
	  .../lib/events/intel_skx_unc_m3upi_events.h,
	  src/libpfm4/lib/events/intel_skx_unc_pcu_events.h,
	  src/libpfm4/lib/events/intel_skx_unc_upi_events.h,
	  src/libpfm4/lib/events/mips_74k_events.h,
	  src/libpfm4/lib/events/power4_events.h,
	  src/libpfm4/lib/events/power5+_events.h,
	  src/libpfm4/lib/events/power5_events.h,
	  src/libpfm4/lib/events/power6_events.h,
	  src/libpfm4/lib/events/power7_events.h,
	  src/libpfm4/lib/events/power8_events.h,
	  src/libpfm4/lib/events/power9_events.h,
	  src/libpfm4/lib/events/ppc970_events.h,
	  src/libpfm4/lib/events/ppc970mp_events.h,
	  src/libpfm4/lib/events/s390x_cpumf_events.h,
	  src/libpfm4/lib/pfmlib_itanium2.c, src/libpfm4/lib/pfmlib_mips.c,
	  src/libpfm4/lib/pfmlib_montecito.c: Update libpfm4, to be current
	  with the following commits:
	  --------------------------------------------------------------
	  commit fa84c27b60572621a8e48e364de9f55bdff5237e  fix incorrect
	  strncpy() usage  gcc 9 failed on mips* with:
	  /usr/include/mips64el-linux-gnuabi64/bits/string_fortified.h:106:10:
	  error: ‘__builtin___strncpy_chk’ output truncated before
	  terminating nul copying as many bytes from a string as its length
	  [-Werror=stringop-truncation] pfmlib_mips.c: In function
	  ‘pfm_mips_detect’: pfmlib_mips.c:147:2: note: length computed
	  here 147 |  strncpy(pfm_mips_cfg.model,buffer,strlen(buffer));
	  strncpy(dest, src, strlen(src)) does *not* copy the terminating
	  '\0' strncpy(dest, src, strlen(src)+1) is identical to strcpy(dest,
	  src) but the third argument to strncpy() should rather be based on
	  the size of 'dest', not 'src'   commit
	  c3e97e0c9510f047623f6548cdef188eed0038cd  fix typos and normalize
	  spacing  most typos were found by Lintian   commit
	  de4beb0da7530bc1dcd2f19582dfeca2ecb1d185  update AMD Fam17h Zen2
	  event table  Based on PPR version 0.91 Sep 1, 2020.  Thanks to
	  Emmanuel for tracking the diffs.   commit
	  53797b096497dd278fa844c302ce93495b469754  update Intel Icelake
	  event table to 1.09  This patch updates the Icelake event table
	  based on the official JSON event file up to version 1.09.   commit
	  414e482ace00d334015341e032a8b325d80e92eb  update to version 4.11.1
	  Update to 4.11.1 revision to fix some minor issues with 11.0
	  release   commit dfe30a72c18dc64ea8e55c469a9adcfec9c09340  install
	  Fujitsu A64FX man page in ARM64 mode  This patch corrects the
	  documentation Makefile to install the libpfm_a64fx.3 man page when
	  building for ARM64. Otherwise the man page would only be installed
	  in ARM (32-bit) mode.  Reported-by: William Cohen <wcohen@redhat.com>
	  commit 3a7dbd35cfde80923dca3d7a02386fde6d859f93  update to version
	  4.11.0  Update to 4.11.0 revision to prepare for release
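	  Illustrative sketch only (not the upstream libpfm4 fix): the
	  strncpy() note in commit fa84c27b above amounts to bounding the
	  copy by the destination size and terminating explicitly. The
	  helper name copy_bounded() is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy src into dest, bounded by the size of dest
 * (not by the length of src), and always NUL-terminate.
 * Returns 0 if src fit completely, -1 if it was truncated. */
static int copy_bounded(char *dest, size_t dest_size, const char *src)
{
    assert(dest != NULL && src != NULL && dest_size > 0);
    /* Copy at most dest_size - 1 bytes, leaving room for the terminator. */
    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0';
    return strlen(src) < dest_size ? 0 : -1;
}
```

	  The return value lets the caller decide whether a truncated string
	  is acceptable.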

2020-10-12  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/scripts/papi_hl_output_writer.py: Fixed bug in
	  summary mode 2. Each region can have a different number of ranks
	  and threads.
	* src/high-level/scripts/papi_hl_output_writer.py: Fixed bug in
	  summary mode. Starting from the second region, all events were
	  ignored because of a wrongly indented break statement.

2020-10-09  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/scripts/papi_hl_output_writer.py: Fixed bug for IPC
	  metric.
	* src/high-level/scripts/papi_hl_output_writer.py: Revised
	  performance output.

2020-10-08  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/appio/appio.c,
	  src/components/coretemp/linux-coretemp.c,
	  src/components/cuda/linux-cuda.c,
	  src/components/example/example.c, src/components/io/linux-io.c,
	  src/components/libmsr/linux-libmsr.c: Got rid of unnecessary
	  PAPI_MAX_STR_LEN-2, replaced with PAPI_MAX_STR_LEN.

2020-10-08  Sebastian Mobo <smobo@vols.utk.edu>

	* src/papi_events.csv: Added instruction-cache preset events for the
	  Zen2.

2020-10-08  Heike Jagode <jagode@icl.utk.edu>

	* src/papi_events.csv: For zen2, since FP_OPS counts both single- and
	  double-prec operations correctly, we don't need to confuse the user
	  with additional DP_OPS and SP_OPS events. So, I'm taking them out.
	  Same applies for events counting FP instructions.

2020-10-08  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/appio/appio.c,
	  src/components/coretemp/linux-coretemp.c: Brought appio and
	  coretemp in line with other components for standardization.
	* src/components/appio/appio.c,
	  src/components/coretemp/linux-coretemp.c,
	  src/components/cuda/linux-cuda.c,
	  src/components/example/example.c, src/components/io/linux-io.c,
	  src/components/libmsr/linux-libmsr.c: In addition to
	  init_component() changes to ensure all possible failing return
	  paths set a disable_reason; added testing for string functions;
	  strncpy and snprintf, to avoid compiler warnings about
	  uninterpreted return values.

2020-10-08  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/scripts/papi_hl_output_writer.py: Added derived
	  events for summary report.

2020-10-06  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/appio/appio.c,
	  src/components/coretemp/linux-coretemp.c,
	  src/components/cuda/linux-cuda.c,
	  src/components/example/example.c, src/components/io/linux-io.c,
	  src/components/libmsr/linux-libmsr.c,
	  src/components/libmsr/tests/ICL_TESTING_NOTES.txt,
	  src/components/libmsr/tests/Makefile,
	  src/components/libmsr/tests/libmsr_basic.c: Most changes here
	  ensure that any exit from the init_component() function that
	  disables the component will give a sensible reason for the disable
	  in the component vector string, which is reported by
	  papi_component_avail.  Additional changes were made to prevent
	  compiler warnings on various issues; such as unused values or
	  incompatible formatting.

2020-10-04  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/scripts/papi_hl_output_writer.py: Several
	  improvements.

2020-10-02  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/scripts/papi_hl_output_writer.py: Started with
	  summary output format.

2020-10-01  Damien Genet <dgenet@icl.utk.edu>

	* src/components/infiniband/linux-infiniband.c: Fixing the infiniband
	  component for the 3 counters that are misplaced in the filesystem
	  and thus wrongfully listed as 32 bits

2020-09-24  Heike Jagode <jagode@icl.utk.edu>

	* src/papi_events.csv: Added missing 'PRESET' to csv file.
	* src/papi_events.csv: Added presets for floating-point instructions
	  (FP_INS, VEC_DP, VEC_SP) for AMD zen2.  For unoptimized code (like
	  native MMM), these events may include non-numeric floating-point
	  instructions, e.g. MOVSD: move or merge scalar double-precision
	  floating-point value instructions.  Tested with: 1) SSE double:
	  _mm_mul_pd / _mm_add_pd 2) SSE single: _mm_mul_ps / _mm_add_ps 3)
	  AVX double: _mm256_mul_pd / _mm256_add_pd 4) AVX single:
	  _mm256_mul_ps / _mm256_add_ps 5) FMA double: _mm256_macc_pd 6) FMA
	  single: _mm256_macc_ps
	* src/papi_events.csv: Added presets for floating-point operations
	  (FP_OPS, DP_OPS, SP_OPS) for AMD zen2.  PPR (under section
	  2.1.15.3. --
	  https://www.amd.com/system/files/TechDocs/54945_3.03_ppr_ZP_B2_pub.zip)
	  explains that FLOP events require MergeEvent
	  support, which was included in the 5.6 kernel.  ===>>> Hence, a
	  kernel version 5.6 or greater is required.  NOTE: without the
	  MergeEvent support in the kernel, there is no guarantee that the
	  SSE/AVX FLOP events produce any useful data whatsoever.  These
	  events have been tested and verified for scalar flops, SSE, AVX,
	  and FMA:  (1) for one AVX instruction (e.g. _mm256_add_pd()), the
	  RETIRED_SSE_AVX_FLOPS:ADD_SUB_FLOPS event returns a count of 4 (in
	  the case of double precision), and a count of 8 (in the case of
	  single precision).  (2) for one AVX FMA instruction (e.g.
	  _mm256_macc_pd()), the RETIRED_SSE_AVX_FLOPS:MAC_FLOPS event
	  returns a count of 8 (in the case of double precision), and a count
	  of 16 (in the case of single precision).  (3) for one SSE
	  instruction (e.g. _mm_mul_pd()), the
	  RETIRED_SSE_AVX_FLOPS:MULT_FLOPS event returns a count of 2 (in the
	  case of double precision), and a count of 4 (in the case of single
	  precision).
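	  Illustrative sketch only: the counts listed in (1)-(3) are
	  consistent with "lanes per vector, doubled for a fused
	  multiply-add". That reading is an assumption made for this
	  example, not a statement from the PPR, and the helper name
	  expected_flops() is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: expected FLOP count for one packed instruction.
 * lanes = vector width / element width; a fused multiply-add performs a
 * multiply and an add per lane, so it is counted twice. */
static unsigned expected_flops(unsigned vector_bits, unsigned element_bits,
                               int is_fma)
{
    assert(element_bits > 0 && vector_bits % element_bits == 0);
    unsigned lanes = vector_bits / element_bits;
    return is_fma ? 2 * lanes : lanes;
}
```

	  For example, a 256-bit AVX add on doubles has 256 / 64 = 4 lanes,
	  matching the count of 4 reported in (1).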

2020-09-18  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/papi_hl.c: Added min, avg, and max for instantaneous
	  events.

2020-09-17  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/high-level/papi_hl.c: Fixed bug for empty strings in
	  PAPI_EVENTS.

2020-09-11  Frank Winkler <frankbook@m016.zih.tu-dresden.de>

	* src/high-level/papi_hl.c: Improved coding style.
	* src/high-level/papi_hl.c: Simplified event definitions.

2020-09-09  Daniel Barry <dbarry@vols.utk.edu>

	* src/components/pcp/linux-pcp.c: Added __FILE__ and __LINE__ to the
	  error messages which are shown depending on the return value of
	  snprintf(). This way, the user will know from where the error
	  message originated.  These changes were tested on the IBM POWER9
	  architecture.

2020-09-09  Anthony Castaldo <tonycastaldo@tellico-master0.local>

	* src/components/cuda/linux-cuda.c: Uninitialized retcode variables
	  caused problems on Power 9.

2020-09-09  Frank Winkler <frankbook@m016.zih.tu-dresden.de>

	* src/high-level/papi_hl.c: Added event definitions to performance
	  report.

2020-09-08  Daniel Barry <dbarry@vols.utk.edu>

	* src/components/pcp/linux-pcp.c: Added if-statements to check
	  whether the number of characters intended to be written to the
	  destination buffer exceed the size of the buffer. This prevents GCC
	  9.1.0 from warning that the destination buffer may not be large
	  enough to store the contents of the source buffers.  These changes
	  were tested on the IBM POWER9 architecture.
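	  Illustrative sketch only (not taken from linux-pcp.c): checking
	  the return value of snprintf() against the destination size
	  detects truncation. The helper name join_names() and the message
	  text are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write "prefix.name" into dest and detect truncation
 * through snprintf()'s return value, which is the number of characters
 * that would have been written had the buffer been large enough.
 * Returns 0 on success, -1 if the output did not fit or on error. */
static int join_names(char *dest, size_t dest_size,
                      const char *prefix, const char *name)
{
    assert(dest != NULL && dest_size > 0);
    int needed = snprintf(dest, dest_size, "%s.%s", prefix, name);
    if (needed < 0 || (size_t)needed >= dest_size) {
        fprintf(stderr, "%s:%d: snprintf() output did not fit in %zu bytes\n",
                __FILE__, __LINE__, dest_size);
        return -1;
    }
    return 0;
}
```

	  Including __FILE__ and __LINE__ in the message shows where the
	  error originated.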

2020-09-08  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm/tests/ROCM_Makefile,
	  src/components/rocm/tests/rocm_standalone.cpp: Corrected a too-
	  specific reference in Makefile; and changed a #define from
	  CamelCase to all caps.

2020-08-28  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Capitalized #defines.

2020-08-27  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Moved structure typedef
	  definitions ahead of their use in the source file; and changed
	  references to structures within structures to use the typedef type.
	  Removed '#if 0' test code for the binary search function.

2020-08-26  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/papi.c, src/papi_internal.c, src/papi_internal.h: This modifies
	  PAPI_library_init() to initialize components in two classes,
	  separated by the initialization of the papi thread structure.  The
	  first class is those that need no thread structure, currently
	  everything but perf_event and perf_event_uncore. Following the init
	  of the threading structure, we init the second class (perf_event
	  and perf_event_uncore) that DOES need the thread structure to
	  successfully init_component().  This required a change to
	  _papi_hwi_init_global(), to add an argument to distinguish which
	  class it should initialize.

Thu Aug 13 01:51:28 2020 -0700  Ondrej Sykora <ondrasej@google.com>

	* src/libpfm4/lib/events/s390x_cpumf_events.h,
	  src/libpfm4/lib/pfmlib_common.c,
	  src/libpfm4/lib/pfmlib_s390x_perf_event.c,
	  src/libpfm4/perf_examples/Makefile: Update libpfm4, to be current
	  with the following commit:
	  --------------------------------------------------------------
	  commit 437628ebe58edd6cff3e493a7925f66e3a016b76  make rtop build
	  conditional on ncurses.h present  This avoids build problems on
	  systems where ncurses development package is not installed.
	  commit 2293ceb3ad9d2ed0c63f85fc07cc30b278ee4eda
	  lib/events/s390x_cpumf_events.h: Change counter name DFLT_CCERROR
	  on s390  Change the counter name DFLT_CCERROR to DFLT_CCFINISH on
	  IBM z15. This counter counts completed DEFLATE instructions with
	  exit code 0, 1 or 2. Since exit code 0 means success and exit code
	  1 or 2 indicate errors, change the counter name to avoid confusion.
	  This counter is incremented each time the DEFLATE instruction
	  completes, regardless of whether an error was detected.  This change
	  is in sync with kernel commit 3d3af181d370 ("s390/cpum_cf,perf:
	  change DFLT_CCERROR counter name")   commit
	  30adc677603b28c6d9eb311de7298fa4fea26eed
	  lib/pfmlib_s390x_perf_event.c: Fix perf attr.type event number for
	  s390  The s390 Performance Measurement counter facility does not
	  have a fixed type number anymore. This was caused by Linux kernel
	  commit 66d258c5b048 ("perf/core: Optimize perf_init_event()") and
	  its necessary follow-on commit 6a82e23f45fe ("s390/cpumf: Adjust
	  registration of s390 PMU device
	  drivers")  Now read out the current type number from a sysfs file
	  named /sys/devices/cpum_cf/type. If it does not exist there is no
	  CPU-MF counter facility installed or activated, which has been
	  checked before.   commit b1651ff3c5eed6289db9545d080d8d28bccfdbe4
	  Add a custom implementation of strsep().  This is required to build
	  the library on platforms where strsep() is not available, e.g. on
	  Windows via MinGW.
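	  Illustrative sketch only (not the libpfm4 implementation): a
	  replacement for strsep() can be built on strcspn(). The function
	  name fallback_strsep() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strsep(): returns the next token from *stringp,
 * overwrites the delimiter that ends it with '\0', and advances *stringp
 * past it (or sets *stringp to NULL once the string is exhausted). */
static char *fallback_strsep(char **stringp, const char *delim)
{
    assert(stringp != NULL && delim != NULL);
    char *start = *stringp;
    if (start == NULL) {
        return NULL;
    }
    /* Find the first delimiter character, or the end of the string. */
    char *end = start + strcspn(start, delim);
    if (*end != '\0') {
        *end = '\0';
        *stringp = end + 1;
    } else {
        *stringp = NULL;
    }
    return start;
}
```

	  Like strsep(), this sketch returns empty tokens for adjacent
	  delimiters and modifies the input string in place.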

2020-08-25  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Implement cumulative counters (as
	  per PAPI specification) for cuda events and metrics. It includes
	  two #define controlled diagnostics that may prove necessary on
	  other models of GPUs. 'Produce_Event_Report' will print (to stderr)
	  all the events discovered, and 'Expose_Unenumerated_Events' will
	  add as PAPI events (beginning 'cuda:::unenum_event:0x...') those
	  events used by cuda metrics that are not reported by nvidia's event
	  enumeration routines. These are explained further in code comments.

2020-08-21  William Cohen <wcohen@redhat.com>

	* src/Makefile.inc: Makefile to generate papi-x.y.z.tar.gz directly
	  from git repo  SystemTap has a make rule to generate a tarball
	  directly from the git repository.  This make rule has proved useful
	  for quickly producing Fedora rawhide rpms with a snapshot of what
	  is currently in the git repository.  This patch adds a similar make rule
	  to PAPI.

Wed Aug 12 15:23:27 2020 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/lib/events/amd64_events_fam17h_zen1.h: Update libpfm4,
	  to be current with the following commit:
	  --------------------------------------------------------------
	  commit e162519d26d313860a9e69889bcc67406f92edc9  fix duplicate
	  event code on AMD Fam17h Zen1  Removed
	  DISPATCH_RESOURCE_STALL_CYCLES_0 which is not an AMD Fam17h event
	  but rather a Zen2 event with the same event code.  Reported-by:
	  Kaufmann, Steve <steven.kaufmann@hpe.com> Tested on Zen1, Castaldo.

Fri Jun 19 15:07:01 2020 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/README, src/libpfm4/docs/Makefile,
	  src/libpfm4/docs/man3/libpfm_arm_neoverse_n1.3,
	  src/libpfm4/include/perfmon/pfmlib.h, src/libpfm4/lib/Makefile,
	  src/libpfm4/lib/events/amd64_events_fam17h_zen1.h,
	  src/libpfm4/lib/events/arm_neoverse_n1_events.h,
	  src/libpfm4/lib/pfmlib_arm_armv8.c,
	  src/libpfm4/lib/pfmlib_common.c, src/libpfm4/lib/pfmlib_priv.h,
	  src/libpfm4/tests/validate_arm64.c: Update libpfm4, to be current
	  with the following commits:
	  --------------------------------------------------------------
	  commit 2c3d94eb306e52a48fe881c8c5d68fd8849bccc0  clean INC_ARM in
	  lib Makefile  Had duplicated INC_ARM= definitions. Some includes
	  were missing from INC_ARM64.   commit
	  286bf87042469524098a3aa65485f2eef395c3d5  enable priv level
	  filtering on ARMv8  ARMv8 core PMU supports privilege level
	  filtering but this was missing from the definitions of all ARMv8
	  PMUs, therefore it was ignored during perf_events encoding. This
	  patch fixes the problem by initializing the .supported_plm field
	  properly.   commit 7fa9131274d450581aa98e6ee662a19f20ff3381  Enable
	  ARM Neoverse N1 core PMU  This patch enables ARM Neoverse N1 core
	  PMU support. Event table based on:
	  https://static.docs.arm.com/100616/0301/neoverse_n1_trm_100616_0301_01_en.pdf
	  commit
	  ea9752f3fee76798010093c2f35cbf719980997d  more updates to AMD
	  Fam17h Zen1 event table  Added: - DYNAMIC_INDIRECT_PREDICTIONS -
	  DECODER_OVERRIDES_PREDICTION  Reported-by: Emmanuel Oseret
	  <emmanuel.oseret@uvsq.fr>  commit
	  5a623727cf7111afd09df2cdb0ff4b294d31efa7  update AMD Fam17h Zen2
	  event table  Added: - FP_DISPATCH_FAULT -
	  DATA_CACHE_REFILLS_FROM_SYSTEM  Fixed typos in umask for
	  SOFTWARE_PREFETCH_DATA_CACHE_FILLS which are shared with
	  DATA_CACHE_REFILLS_FROM_SYSTEM.  Reported-by: Steve Kaufmann
	  <steven.kaufmann@hpe.com>

2020-07-24  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm/README.md: Added an extra HSA_TOOLS_LIB export
	  that is required to read counters.

2020-07-23  Frank Winkler <frankbook@Franks-MacBook-Air.local>

	* src/high-level/papi_hl.c: Revised previous push.  Only nvml events
	  are automatically saved as instantaneous values.
	* src/high-level/papi_hl.c: Some events like power, temperature or
	  all nvml events are always considered instantaneous.

2020-07-17  Anthony <adanalis@icl.utk.edu>

	* src/papi_events.csv: Separated the cache preset events of AMD Zen1
	  and Zen2 and added some more.

2020-07-17  Frank Winkler <frankbook@franks-air.localdomain>

	* src/configure, src/configure.in: Revised configure script.  1)
	  Changed "--with-tests" option. The user can now disable all tests
	  using "--with-tests=no" or "--without-tests". MPI tests are
	  included in "--with-tests".  2) Aligned help text for a better
	  output format.

2020-07-14  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm/tests/ROCM_SA_Makefile,
	  src/components/rocm/tests/rocm_failure_demo.cpp,
	  src/components/rocm/tests/rocm_standalone.cpp: Added two utilities
	  that perform event reading for AMD GPUs without any use of the PAPI
	  interface, to show that PAPI is not the problem when events are
	  not working correctly.

2020-07-03  Frank Winkler <frankbook@franks-air.localdomain>

	* src/components/cuda/README.md, src/components/nvml/README.md: Added
	  instructions on how to find the correct paths of all required
	  shared libraries at runtime.

2020-06-24  Steve Kaufmann <steven.kaufmann@hpe.com>

	* src/papi_events.csv: Added PAPI preset support for Fujitsu A64FX.

Sat Jun 13 00:39:58 2020 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/lib/events/amd64_events_fam17h_zen2.h: commit
	  5a623727cf7111afd09df2cdb0ff4b294d31efa7  update AMD Fam17h Zen2
	  event table  Added: - FP_DISPATCH_FAULT -
	  DATA_CACHE_REFILLS_FROM_SYSTEM  Fixed typos in umask for
	  SOFTWARE_PREFETCH_DATA_CACHE_FILLS which are shared with
	  DATA_CACHE_REFILLS_FROM_SYSTEM.  Reported-by: Steve Kaufmann
	  <steven.kaufmann@hpe.com>  commit
	  17e622e9539e1f8faf3c0c27889963a537e95537  add L2_PREFETCH_MISS_L3
	  for AMD Fam17h Zen2  Add missing L2_PREFETCH_MISS_L3 event for AMD
	  Fam17h Zen2.  Reported-by: Emmanuel Oseret
	  <emmanuel.oseret@uvsq.fr>

2020-06-23  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/components/sde/README.md: README.md edited online with
	  Bitbucket

2020-06-18  Heike Jagode <jagode@icl.utk.edu>

	* src/components/perf_event_uncore/README.md: README.md edited online
	  with Bitbucket

2020-06-18  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/components/perf_event_uncore/README.md: README.md edited online
	  with Bitbucket
	* src/components/perf_event_uncore/README.md: README.md edited online
	  with Bitbucket
	* src/components/perf_event_uncore/README.md: README.md edited online
	  with Bitbucket

2020-06-12  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm_smi/Rules.rocm_smi: Added an include directory
	  to the Rules file, to fix a coding error on including a new file,
	  kfd_ioctl.h. (The #include statement includes the directory
	  rocm_smi when it should not.)

2020-06-12  Frank Winkler <frankbook@Franks-MacBook-Air.local>

	* src/configure: Generated new configure file with autoconf 2.69 on
	  saturn.icl.utk.edu.
	* src/configure.in: Added rpath and runpath to find libpfm.so and
	  libpapi.so if not specified via LD_LIBRARY_PATH. The search path at
	  runtime can be overridden by LD_LIBRARY_PATH.

2020-06-12  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/components/perf_event_uncore/README.md: README.md edited online
	  with Bitbucket
	* src/components/perf_event_uncore/README.md: README.md edited online
	  with Bitbucket
	* src/components/perf_event_uncore/README.md: README.md edited online
	  with Bitbucket

Sat May 30 18:08:52 2020 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/lib/events/amd64_events_fam17h_zen1.h: Update libpfm4,
	  to be current with the following commit:
	  --------------------------------------------------------------
	  commit c99ed181402b21e74744d5f602aceb6a320c7ded  update AMD64
	  Fam17h Zen1 event table  Add a few missing events. Thanks to
	  Emmanuel for tracking them down.  Based on AMD Fam17h model 01,08h
	  B2 PPR version 3.03 Jun 14, 2019  Reported-by: Emmanuel Oseret
	  <emmanuel.oseret@uvsq.fr>  Tested functional on ICL Morphine; AMD64
	  Fam17h Zen1 machine.

2020-06-03  Heike Jagode <jagode@icl.utk.edu>

	* src/papi.h: Bug fix for architectures with more than 40 PMUs (e.g.
	  KNL has > 40 uncore PMUs).  PAPI_PMU_MAX and its static value were
	  introduced in 2014 (https://bitbucket.org/icl/papi/commits/2a1805ec
	  ebba1b1789853e0a36af9bd921ef1b9a).  The problem was not only that
	  papi_component_avail didn't list all PMUs, but even worse, that
	  papi_native_avail did, in fact, list all events; however, if a user
	  tried to monitor listed events from omitted PMUs, an error was
	  returned.

2020-05-29  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/components/perf_event_uncore/README.md: README.md edited online
	  with Bitbucket

2020-05-29  Frank Winkler <frankbook@Franks-MacBook-Air.local>

	* src/components/perf_event_uncore/README.md: Added FAQ entry for
	  component perf_event_uncore.
	* src/components/cuda/README.md, src/components/perf_event/README.md,
	  src/components/perf_event_uncore/README.md,
	  src/components/powercap/README, src/components/powercap/README.md,
	  src/components/powercap_ppc/README,
	  src/components/powercap_ppc/README.md, src/components/rapl/README,
	  src/components/rapl/README.md, src/components/sde/README,
	  src/components/sde/README.md, src/components/sensors_ppc/README,
	  src/components/sensors_ppc/README.md,
	  src/components/stealtime/README.md, src/components/vmware/README,
	  src/components/vmware/README.md: New readme files for the
	  components in markdown format (2/2).

2020-05-29  Frank Winkler <frank.winkler@tu-dresden.de>

	* src/components/pcp/README.md: README.md edited online with
	  Bitbucket

2020-05-28  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm/linux-rocm.c, src/components/rocm_smi/linux-
	  rocm-smi.c: Improved _init_component() code in rocm, rocm_smi to
	  populate the disabled reason if library failures are encountered
	  during initialization. Per Steve Kaufmann's request, 05/28/2020.

2020-05-28  Frank Winkler <frankbook@franks-air.localdomain>

	* src/components/appio/README, src/components/appio/README.md,
	  src/components/bgpm/README, src/components/bgpm/README.md,
	  src/components/coretemp/README.md,
	  src/components/coretemp_freebsd/README,
	  src/components/coretemp_freebsd/README.md,
	  src/components/emon/README, src/components/emon/README.md,
	  src/components/example/README.md,
	  src/components/host_micpower/README,
	  src/components/host_micpower/README.md,
	  src/components/infiniband/README,
	  src/components/infiniband/README.md, src/components/io/README,
	  src/components/io/README.md, src/components/libmsr/README.md,
	  src/components/lmsensors/README.md,
	  src/components/lustre/README.md, src/components/lustre/linux-
	  lustre.c, src/components/micpower/README,
	  src/components/micpower/README.md, src/components/mx/README.md,
	  src/components/net/README, src/components/net/README.md: New readme
	  files for 16 components in markdown format.

2020-05-22  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/nvml/tests/Makefile: Removed some leftover
	  development lines from nvml/tests/Makefile

2020-05-21  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm_smi/Rules.rocm_smi, src/components/rocm_smi
	  /linux-rocm-smi.c, .../rocm_smi/tests/rocm_command_line.cpp:
	  Changes to give rocm_smi its own PAPI_ROCMSMI_ROOT variable,
	  and to provide PAPI_ROCM_SMI_LIB as an override environment variable.
	* src/components/cuda/README, src/components/cuda/README.md,
	  src/components/nvml/README, src/components/nvml/README.md,
	  src/components/pcp/README, src/components/pcp/README.md,
	  src/components/rocm/README, src/components/rocm/README.md,
	  src/components/rocm_smi/README, src/components/rocm_smi/README.md:
	  Changed files to Markdown versions, matching templates, for five
	  components: cuda, nvml, pcp, rocm, rocm_smi.

Mon May 18 09:33:57 2020 -0700  Steve Kaufmann <steven.kaufmann@hpe.com>

	* src/libpfm4/README, src/libpfm4/docs/Makefile,
	  src/libpfm4/docs/man3/libpfm_arm_a64fx.3,
	  src/libpfm4/include/perfmon/pfmlib.h,
	  src/libpfm4/lib/events/arm_fujitsu_a64fx_events.h,
	  src/libpfm4/lib/events/intel_icl_events.h,
	  src/libpfm4/lib/pfmlib_arm_armv8.c,
	  src/libpfm4/lib/pfmlib_common.c, src/libpfm4/lib/pfmlib_priv.h,
	  src/libpfm4/tests/validate_arm64.c,
	  src/libpfm4/tests/validate_x86.c: Update libpfm4, to be current
	  with the following commit:
	  --------------------------------------------------------------
	  commit 0cfc35f73e0e39d54ba48c24e663bec93d164211  Enable support for
	  Fujitsu A64FX core PMU  This patch adds support for Fujitsu A64FX
	  core PMU. This includes ARMv8 generic core events and Fujitsu model
	  specific events.

2020-05-15  Frank Winkler <frankbook@franks-air.localdomain>

	* src/configure: Generated new configure file with autoconf (2.69) on
	  saturn.

2020-05-07  Anthony <adanalis@icl.utk.edu>

	* src/components/sde/sde.c: Avoid creating a variable for something
	  that is only used for a debug message. Otherwise we create compiler
	  warnings when debug is not enabled.

2020-05-07  Frank Winkler <frankbook@franks-air.localdomain>

	* src/configure.in, src/utils/papi_native_avail.c: Added CFLAG -DSDE.

2020-04-30  Frank Winkler <frankbook@Franks-MacBook-Air.local>

	* src/configure.in, src/ctests/Makefile.recipies,
	  src/ctests/Makefile.target.in, src/utils/Makefile,
	  src/utils/Makefile.target.in, src/utils/papi_native_avail.c: Fixed
	  static build. - SDE component is disabled - "ctest" shlib is
	  disabled

Thu Apr 16 15:12:05 2020 +0200  Thomas Richter <tmricht@linux.ibm.com>

	* src/libpfm4/lib/events/s390x_cpumf_events.h,
	  src/libpfm4/lib/pfmlib_s390x_cpumf.c, src/libpfm4/tests/validate.c:
	  Update libpfm4, to be current with the following commit:
	  --------------------------------------------------------------
	  commit 47f0845d81f851e8bee8745b8c4c7ad6f8e03122  s390: Update
	  counter definition  This patch updates the libpfm4 s390 counter
	  definitions to the latest documentation:  SA23-2260-06:The Load-
	  Program-Parameter and the CPU-Measurement Facilities, September
	  2019  https://www.ibm.com/support/pages/sites/default/files/inline-
	  files/117183_SA23-2260-06.pdf  SA23-2261-06:The CPU-Measurement
	  Facility Extended Counters Definition for z10, z196/z114,
	  zEC12/zBC12, z13/z13s, z14 and z15, January 2020
	  https://www.ibm.com/support/pages/sites/default/files/inline-
	  files/119190_SA23-2261-06.pdf  This includes updated counter
	  description for existing counters and the complete counter
	  definition for IBM z15.   commit
	  f1aedd4f189814b980763f9db2465a4a9c34bd6e  validate: Add flag p to
	  the getopt list of commandline files  The validate program supports
	  flag -p to test perf events. This option is not listed in the
	  getopt list.

2020-04-24  Frank Winkler <frankbook@Franks-MacBook-Air.local>

	* src/configure.in: Another test for "--with-static-tools".
	* src/configure.in: Fixed configure options for shared and static
	  builds.  1) --with-static-lib=no (force PAPI to build shared
	  libraries and tools) 2) --with-shlib-tools (use internal libpfm via
	  rpath-link)
	* release_procedure.txt: Modified instructions for release procedure.
	* release_procedure.txt, src/configure: Generated new configure file
	  with autoconf 2.69 on saturn.

2020-04-19  Frank Winkler <frankbook@Franks-MacBook-Air.local>

	* src/configure.in, src/ctests/Makefile.recipies,
	  src/ctests/Makefile.target.in: Fixed bug for MPI tests.  MPI tests
	  are disabled if - the user specifies "--with-shared-lib=no" - mpicc is
	  not using the current $CC compiler

2020-04-17  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Repaired error code and error
	  reporting on the check for compute capability >= 7.5. The cause was
	  an uninitialized variable.

2020-04-14  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm_smi/linux-rocm-smi.c,
	  .../rocm_smi/tests/power_monitor_rocm.cpp: Changed component to
	  handle mis-numbered sensors coming from driver. Cleaned up
	  commenting in power_monitor_rocm.cpp.

Wed Apr 8 01:02:22 2020 -0700  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/lib/events/intel_skl_events.h: Update libpfm4, to be
	  current with the following commit:
	  --------------------------------------------------------------
	  commit 34164d84bba9794c75b4ce643ad74aad1362e97a  fix encoding typos
	  for OFFCORE_RESPONSE on SKL/SKX/CLX  Some of the aliases encodings
	  were wrong. No impact because only the encoding of the actual umask
	  is used except when listing umasks.

2020-04-03  Frank Winkler <frankbook@Franks-MacBook-Air.local>

	* release_procedure.txt: Added text for bug fix release procedure.
	* src/papi.h: Fixed typo.

2020-04-02  Anthony <adanalis@icl.utk.edu>

	* .../tests/Created_Counter/Created_Counter_Driver.c,
	  .../Created_Counter/Lib_With_Created_Counter.c,
	  .../sde/tests/Created_Counter/Overflow_Driver.c,
	  src/components/sde/tests/Makefile: Added example that shows how to
	  implement Created Counters in a library and how to use
	  PAPI_overflow() to monitor an SDE.

2020-04-01  Frank Winkler <frankbook@Franks-MacBook-Air.local>

	* src/Makefile.inc: Fixed bug in install process. Create BINDIR
	  before copying the hl python script to BINDIR.

2020-03-30  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/cuda/linux-cuda.c: Changed to report a useful
	  disabled reason when devices with compute capability >=7.5 are
	  present; these no longer support the CUPTI interface.

2020-03-24  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* src/components/rocm_smi/tests/ROCM_SMI_Makefile,
	  .../rocm_smi/tests/power_monitor_rocm.cpp,
	  src/components/rocm_smi/tests/rocmcap_plot.cpp: New code for power
	  monitoring, replaces rocmcap_plot.cpp. Extensive new command line
	  options.

Fri Mar 6 17:32:45 2020 -0800  Stephane Eranian <eranian@gmail.com>

	* src/libpfm4/README, src/libpfm4/docs/Makefile,
	  src/libpfm4/docs/man3/libpfm_intel_icl.3,
	  src/libpfm4/docs/man3/libpfm_intel_tmt.3,
	  src/libpfm4/docs/man3/libpfm_perf_event_raw.3,
	  src/libpfm4/examples/showevtinfo.c,
	  src/libpfm4/include/perfmon/pfmlib.h, src/libpfm4/lib/Makefile,
	  src/libpfm4/lib/events/amd64_events_fam17h_zen2.h,
	  src/libpfm4/lib/events/intel_icl_events.h,
	  src/libpfm4/lib/events/intel_tmt_events.h,
	  src/libpfm4/lib/pfmlib_common.c,
	  src/libpfm4/lib/pfmlib_intel_icl.c,
	  src/libpfm4/lib/pfmlib_intel_rapl.c,
	  src/libpfm4/lib/pfmlib_intel_skl.c,
	  src/libpfm4/lib/pfmlib_intel_tmt.c,
	  src/libpfm4/lib/pfmlib_intel_x86.c,
	  src/libpfm4/lib/pfmlib_intel_x86_perf_event.c,
	  src/libpfm4/lib/pfmlib_intel_x86_priv.h,
	  src/libpfm4/lib/pfmlib_perf_event.c,
	  src/libpfm4/lib/pfmlib_perf_event_priv.h,
	  src/libpfm4/lib/pfmlib_priv.h, src/libpfm4/tests/validate_x86.c:
	  Update libpfm4, to be current with the following commit:
	  --------------------------------------------------------------
	  NOTE: Intel Tremont and IceLake changes have not been tested, due
	  to lack of hardware at this time.  commit
	  647d1160b6fdd902b2bfe3138522cc09e2d57387  add Intel Icelake core
	  PMU support  This patch adds Intel Icelake core PMU support for all
	  published SKUs. It is based on the official event table published
	  at download.01.org version 1.04.   commit
	  5847026aa516dd4c220a5d04ab9e6128eefc19fd  add hw_smpl support to
	  x86 perf_events code  Enables support for new hw_smpl attribute to
	  perf_events x86 code.   commit
	  67e238ef03bcdccd017c1bfc2a0c4d8fe545c442  add perf_events hw_smpl
	  attribute  This patch adds a new attribute to perf_events OS
	  support. The attribute is called hw_smpl. It enables hardware
	  sampling on an event. Hardware sampling is CPU specific and
	  therefore requires CPU specific code. hw_smpl is a variation of
	  precise sampling. It provides hardware assistance to sample but
	  does not guarantee precise attribution of samples to code. With
	  perf_events this is equivalent to setting attr.precise_ip = 1 which
	  is what this attribute does.  This patch only modifies the generic
	  perf_event code.   commit 2ba296e3b1254f2bbaa0c7a3505721f395b53bf8
	  enable ExtendedPEBS attribute support for Intel X86  This patch
	  introduces a new Intel X86 specific PMU flag,
	  INTEL_X86_PMU_FL_EXTPEBS, to indicate that the PMU supports
	  Extended PEBS. ExtendedPEBS provides the flexibility of the PEBS
	  hardware assist sampling without guaranteeing the precision of the
	  sample instruction address.   commit
	  2a6c6b60c4f65f63a300be52382af283a6a537c8  add support_hw_smpl
	  attribute  This patch adds a new event and umask attribute visible
	  at the API level called support_hw_smpl. This is a boolean
	  attribute. If set, it means the event or the umask supports
	  hardware buffer sampling. In other words, the event can be sampled
	  using hardware-assist buffer instead of basic interrupt-based
	  sampling. This usually brings lower cost of sampling by amortizing
	  the PMU interrupt over multiple samples.  Hardware-assist sampling
	  does not mean there is no sampling skid. There may be some skid.
	  Only events supporting precise sampling can be sampled without
	  skid. Note that oftentimes, precise sampling is achieved by having
	  a precise event sampled using a hardware-assist buffer. In other
	  words, event/umask marked as precise usually also have
	  support_hw_smpl set to true but this is not a requirement.  This
	  patch adds the new attributes in the generic code, the man page,
	  and the showevtinfo program. Arch-specific enablement is provided
	  by separate patches.   commit
	  70f9c2d13ee7088be788a399e23f69a5f0524cb4  fix handling of FETHR on
	  Intel X86  This patch fixes several issues with the handling of the
	  Precise FRONTEND_RETIRED event on Intel X86 processors which
	  support it (Skylake and later). First, the FE latency field is not
	  3 bits but 12. Second, the code was missing locked down capability
	  for the fe_thres modifier. Some events have the fe_thres hardcoded,
	  and therefore attempts to force a value should be rejected. Third,
	  when a umask does not hardcode a fe_thres then the user can pass
	  one. Note that not all umasks of the event use the fe_thres.
	  commit 42c1857c7694cec1a4750a340381d49dd84ca8ff  add
	  RETIRED_SSE_AVX_FLOPS event for AMD64 Fam17h Zen2  Was missing from
	  initial commit. Added as PPR rev 0.54.  Note that this event by
	  itself does not count correctly. It needs large increment support,
	  which means merging of two consecutive counters. This is handled by
	  the Linux kernel starting with 5.6-rc4. The library simply encodes
	  the event as if it was like any other normal event.   commit
	  210b2ef95f33eccb671f2a88a979de5364c94465  fix Intel Tremont OCR
	  event code  In Tremont, the second OCR event has encoding 0x02b7
	  and not 0x01bb.   commit a2909cdfbea45524931ca13035293555a645d2e5
	  add Intel Tremont core PMU support  This patch adds support for the
	  Intel Tremont core PMU events. Based on Intel snowridgex_v1.06.json
	  event information released on download.01.org/perfmon/SNR.   commit
	  0f6a3c3308f29699b4f698b5e0983af322d44bdb  update RAPL processor
	  support  Added  Goldmont, Cannonlake, CometLake, Icelake support.
	  Based on Linux kernel support.   commit
	  a291613f3cd2d3e3355627674af264210c3fcbe1  enable Intel CometLake
	  support  Identical to Intel Skylake client support.

2020-03-18  Frank Winkler <frankbook@Franks-MacBook-Air.local>

	* src/configure, src/configure.in: Some modifications.

2020-03-17  Frank Winkler <frankbook@Franks-MacBook-Air.local>

	* src/configure: Generated configure via autoconf 2.69.
	* src/configure.in: Put paranoid check message at the end of
	  configure.

2020-03-16  Frank Winkler <frankbook@m016.zih.tu-dresden.de>

	* src/configure, src/configure.in: Replaced paranoid check error with
	  warning.

2020-03-14  Frank Winkler <frankbook@franks-air.localdomain>

	* src/configure, src/configure.in: Added paranoid check at
	  configuration.
	* src/threads.c: Removed linux condition.
	* src/papi_internal.c, src/papi_internal.h, src/threads.c,
	  src/threads.h: Added several thread identification functions for
	  PAPI_thread_init (2).

2020-03-13  Frank Winkler <frankbook@m016.zih.tu-dresden.de>

	* src/high-level/papi_hl.c, src/papi.c, src/papi_internal.c,
	  src/papi_internal.h: Added several thread identification functions
	  for PAPI_thread_init.

2020-03-05  Anthony Castaldo <TonyCastaldo@icl.utk.edu>

	* release_procedure.txt: Further clarifications in
	  release_procedure.txt
