Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No run result from certain benchmarks #132

Open
DonggeLiu opened this issue Feb 29, 2024 · 10 comments
Open

No run result from certain benchmarks #132

DonggeLiu opened this issue Feb 29, 2024 · 10 comments
Assignees
Labels

Comments

@DonggeLiu
Copy link
Collaborator

https://llm-exp.oss-fuzz.com/Result-reports/scheduled/2024-02-29-weekly-all/sample/output-wxwidgets-_z15wxgetfreememoryv/01

@cjx10
Copy link
Collaborator

cjx10 commented Mar 5, 2024

Incorrect target name and target path.
For wxwidgets, runner.cpp was not compiled, only zip.cpp

@DonggeLiu
Copy link
Collaborator Author

FI's daily report contains the info about fuzz target, but 1) the fuzz target path is not consistent, and 2) it does not store the fuz target binary name yet.

Inconsistency:
Some projects' reports contain the full path (which is what we need):
https://storage.googleapis.com/oss-fuzz-introspector/abseil-cpp/inspector-report/20240305/summary.json
But others seem to use binary name:
https://storage.googleapis.com/oss-fuzz-introspector/wxwidgets/inspector-report/20240305/summary.json

Ideally, we need two info from the FI report:

  1. Fuzz target path. In particular, some projects move the fuzz target during compilation. We need the fuzz target path before compilation, but FI usually provides the one after compilation.
  2. Binary name, which can be different from fuzz target basename.

@DavidKorczynski
Copy link
Collaborator

Ideally, we need two info from the FI report:

  1. Fuzz target path.

Just to clarify, this refers to the source code path?

Inconsistency:
Some projects' reports contain the full path (which is what we need):
https://storage.googleapis.com/oss-fuzz-introspector/abseil-cpp/inspector-report/20240305/summary.json
But others seem to use binary name:
https://storage.googleapis.com/oss-fuzz-introspector/wxwidgets/inspector-report/20240305/summary.json

Is the issue with wxwidgets or abseil-cpp, or both? wxwidgets has both the fuzzer binary name and the source code path: https://storage.googleapis.com/oss-fuzz-introspector/wxwidgets/inspector-report/20240318/fuzz_report.html -- abseil-cpp only has the source code path. To my understanding wxwidgets is good but abseil-cpp has issues, is this correctly understood?

@DonggeLiu
Copy link
Collaborator Author

Hi @DavidKorczynski!

  1. Fuzz target path.
    Just to clarify, this refers to the source code path?

Yes! The framework needs the source code path before compilation to locate the file and then replace its content with the LLM-generated fuzz target.
I recall that you explained that some projects move the files around while building the fuzz target. Hence, the path before compilation is required.

Is the issue with wxwidgets or abseil-cpp, or both? wxwidgets has both the fuzzer binary name and the source code path: https://storage.googleapis.com/oss-fuzz-introspector/wxwidgets/inspector-report/20240318/fuzz_report.html -- abseil-cpp only has the source code path. To my understanding wxwidgets is good but abseil-cpp has issues, is this correctly understood?

Yes! You are correct.
wxwidgets is good because it provides both fuzz target binary name and fuzz target source code path, which is needed by its benchmark YAML. In this particular case, our benchmark YAML is wrong because runner.cpp was never compiled. FI correctly provides zip and /src/wxwidgets/tests/fuzz/zip.cpp, precisely what we need.

Since wxwidgets's summary.json does not have this info, what's the best way to retrieve this information? (E.g., other files in bucket, APIs, or libs? I presume parsing the fuzz_report.html is not a scalable idea.)

@DavidKorczynski
Copy link
Collaborator

Can you confirm abseil-cpp is looking good now? This is what I get when I generate a benchmark:

"functions":
- "name": "_ZN4absl19str_format_internal13FormatArgImpl8DispatchIxEEbNS1_4DataENS0_24FormatConversionSpecImplEPv"
  "params":
  - "name": "arg"
    "type": "char *"
  - "name": "spec"
    "type": "size_t"
  - "name": "out"
    "type": "int"
  - "name": ""
    "type": "char *"
  "return_type": "bool"
  "signature": "bool absl::str_format_internal::FormatArgImpl::Dispatch<absl::str_format_internal::VoidPtr>(DW_TAG_union_typeData, FormatConversionSpecImpl, void *)"
- "name": "_ZN4absl19str_format_internal13FormatArgImpl8DispatchINS_6int128EEEbNS1_4DataENS0_24FormatConversionSpecImplEPv"
  "params":
  - "name": "arg"
    "type": "char *"
  - "name": "spec"
    "type": "size_t"
  - "name": "out"
    "type": "int"
  - "name": ""
    "type": "char *"
  "return_type": "bool"
  "signature": "bool absl::str_format_internal::FormatArgImpl::Dispatch<absl::Dispatch<absl::str_format_internal::VoidPtr>(DW_TAG_union_typeData, FormatConversionSpecImpl, void *)"
- "name": "_ZN4absl19str_format_internal17FormatConvertImplEwNS0_24FormatConversionSpecImplEPNS0_14FormatSinkImplE"
  "params":
  - "name": "v"
    "type": "int"
  - "name": "conv"
    "type": "size_t"
  - "name": "sink"
    "type": "int"
  - "name": ""
    "type": "absl::str_format_internal::FormatSinkImpl *"
  "return_type": "char"
  "signature": "CharConvertResult absl::str_format_internal::FormatConvertImpl(wchar_t, const FormatConversionSpecImpl, FormatSinkImpl *)"
"language": "c++"
"project": "abseil-cpp"
"target_name": "string_escape_fuzzer"
"target_path": "/src/string_escape_fuzzer.cc"

@DonggeLiu
Copy link
Collaborator Author

Thanks @DavidKorczynski! May I ask if this is generated via python -m data_prep.introspector abseil-cpp -m 5 or a new lib/API of FI?

I can confirm that data_prep.introspector generates the same result on my side, but I thought previously the binary target_name and source code target_path were the same, too? E.g., https://github.com/google/oss-fuzz-gen/pull/162/files

Previously, the target_name and target_path were generated via building local docker images, which takes a long time and is error-prone (e.g., using runner.cpp from wxwidgets, which was never built).
Given FI has the correct target_name and target_path in some project reports, it would be great if a user could use them directly. What would be the best way to access that info?

@DavidKorczynski
Copy link
Collaborator

Thanks @DavidKorczynski! May I ask if this is generated via python -m data_prep.introspector abseil-cpp -m 5 or a new lib/API of FI?

Yes, it was

Previously, the target_name and target_path were generated via building local docker images, which takes a long time and is error-prone (e.g., using runner.cpp from wxwidgets, which was never built). Given FI has the correct target_name and target_path in some project reports, it would be great if a user could use them directly. What would be the best way to access that info?

Ah, I misunderstood @DonggeLiu -- I thought that the goal was to fix project_src.search_source if it was broken but the plan is to substitute it in full to avoid having that step?

@DavidKorczynski
Copy link
Collaborator

In google/oss-fuzz#11718 I pushed out a fix that allows FI to see the binary for bazel projects. For example, abseil-cpp used to be:

Screenshot 2024-04-03 111941

And now it is:

Screenshot 2024-04-03 111843

The only missing piece there for FI to provide all the info is giving the source at the start of the Dockerfile creation, and not the bazel sandbox path

@DavidKorczynski
Copy link
Collaborator

DavidKorczynski commented Apr 3, 2024

Notice there are some projects with limitations. For example, picotls creates analysis for three harnesses. However, picotls in fact only has two harnesses in OSS-Fuzz, but, they build three harnesses and only copy two of them out -- i.e. one of the build fuzzers does not actually run on OSS-Fuzz so there is no binary to map to. They build 3 fuzzers but only copy 2 to oss-fuzz's project: https://github.com/google/oss-fuzz/blob/153df7202e2838a3239f2a631686ad49eff62347/projects/picotls/build.sh#L21-L22

Consequently, there will be one erroneous match here.

@DonggeLiu
Copy link
Collaborator Author

Ah, I misunderstood @DonggeLiu -- I thought that the goal was to fix project_src.search_source if it was broken but the plan is to substitute it in full to avoid having that step?

Yep, sorry for not being clearer earlier!
That step is redundant and time-consuming, because FI already has the correct binary target_name and fuzz target source code target_path for most projects, and they are almost always constant over time. Replacing that step can avoid building docker images and retrieving them from containers whenever we update benchmarks.

They build 3 fuzzers but only copy 2 to oss-fuzz's project:

That's an interesting case, thanks for noticing it.
Since we only need one pair of binary target_name and source code target_path, is there a general way to automatically select the one actually used in fuzzing? (e.g., by checking the past coverage or other cleverer ways)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants