Add sample Python auto generation #205

DavidKorczynski · 2024-04-08T20:47:48Z

Sample for auto-generating an OSS-Fuzz project for a given Python project.

This differs a bit from the existing setup. The approach in this PR clones the Python repository inside an OSS-Fuzz base-builder image. Fuzz Introspector is also cloned within that image, and a purely static Fuzz Introspector analysis is run to extract details about the Python library under analysis.

It's a bit raw at this stage and can likely be better integrated into OSS-Fuzz-gen. However, I'm not 100% sure what the smartest next steps are, so I'm sharing it in case there are opinions. It may be smarter to work on two tracks in parallel and merge them later, once it's better known what works and what doesn't.

Signed-off-by: David Korczynski <david@adalogics.com>

oliverchang

nice! just initial comments.

I wonder if there's a way to refactor this to make all of it a bit neater in the future. Right now the concept of a benchmark is quite ingrained in these classes, so we need to untie that, or alternatively create a synthetic "benchmark" based on the Python project we're generating targets for.
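A rough sketch of the synthetic "benchmark" idea, for discussion only. `SyntheticBenchmark`, its fields, and `synthetic_benchmark_for` are hypothetical names; they are not the existing benchmark class in OSS-Fuzz-gen, whose fields are not shown in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticBenchmark:
  """Hypothetical stand-in for a benchmark; not the real OSS-Fuzz-gen class."""
  project: str
  language: str
  target_name: str


def synthetic_benchmark_for(github_repo: str,
                            target_name: str) -> SyntheticBenchmark:
  """Derives a benchmark-like object from a repository URL."""
  # Use the last path component of the URL as the project name.
  project = github_repo.rstrip('/').rsplit('/', 1)[-1]
  if project.endswith('.git'):
    project = project[:-len('.git')]
  return SyntheticBenchmark(project=project,
                            language='python',
                            target_name=target_name)
```

With something like this, the existing classes could keep reading from a benchmark-shaped object while the Python flow supplies its own values.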

oliverchang · 2024-04-11T09:04:05Z

python_fuzzgen/README.md

@@ -0,0 +1,9 @@
+# Python auto-gen


nit: can we place this in /languages/python instead?

oliverchang · 2024-04-11T09:06:32Z

python_fuzzgen/build.py

+
+def create_sample_harness(github_repo: str, func_elem):
+
+  prompt_template = """Hi, I'm looking for your help to write a Python fuzzing harness for the %s Python project. The project is located at %s and I would like you to write a harness targeting this module. You should use the Python Atheris framework for writing the fuzzer. Could you please show me the source code for this harness?


Can we put this into a file?
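One possible shape for this, as a minimal sketch. The helper names and the template file location are assumptions, and the sketch assumes the template contains exactly two `%s` placeholders (project name, then repository URL); the full template in the PR is truncated in this view and may differ.

```python
from pathlib import Path


def load_prompt_template(path: Path) -> str:
  """Reads the prompt template from a file instead of a string literal."""
  return path.read_text(encoding='utf-8')


def render_prompt(template: str, project_name: str, github_repo: str) -> str:
  """Fills in the template's placeholders."""
  # Assumption: exactly two %s placeholders, in this order.
  return template % (project_name, github_repo)
```

Keeping the text in its own file also makes it easier to review prompt changes separately from code changes.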

oliverchang · 2024-04-11T09:07:49Z

experiment/builder_runner.py

@@ -119,6 +119,13 @@ def _pre_build_check(self, target_path: str,
      return False
    return True

+  def build_and_run_python(self, generated_project: str, target_path: str):


Can we avoid adding language specific methods here? We should ideally make this class language agnostic.

oliverchang · 2024-04-11T09:09:25Z

experiment/builder_runner.py

@@ -145,6 +152,36 @@ def build_and_run(self, generated_project: str, target_path: str,
        generated_project, benchmark_target_name))
    return build_result, run_result

+  def run_target_local_python(self, generated_project: str, target_name: str,


This is very similar to run_target_local. Is the only difference that we're not passing things from self.benchmark?

We should find some better way to refactor things so there's less duplication here. One simple way I can think of now is to perhaps factor out a general run_oss_fuzz_helper(...) which is called by both run_target_local and python_fuzzgen.
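Something along these lines, as a sketch only. The signature is an assumption (the comment above leaves it as `...`), and the `run` parameter is added purely so the helper can be exercised without Docker; `infra/helper.py` and its `build_fuzzers` subcommand are taken from the diff in this PR.

```python
import subprocess
from typing import List


def oss_fuzz_helper_command(subcommand: str, project: str,
                            *extra_args: str) -> List[str]:
  """Builds the argv for OSS-Fuzz's infra/helper.py."""
  return ['python3', 'infra/helper.py', subcommand, project, *extra_args]


def run_oss_fuzz_helper(oss_fuzz_dir: str,
                        subcommand: str,
                        project: str,
                        *extra_args: str,
                        run=subprocess.run) -> bool:
  """Runs a helper.py subcommand from the OSS-Fuzz checkout.

  Returns True if the command exited with status 0.
  """
  command = oss_fuzz_helper_command(subcommand, project, *extra_args)
  # Pass an argv list (no shell) and let the caller decide how to handle failure.
  result = run(command, cwd=oss_fuzz_dir, check=False)
  return result.returncode == 0
```

Both run_target_local and the Python flow could then call this with their own project name and arguments, with the benchmark-specific values resolved by the caller.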

oliverchang · 2024-04-11T09:09:43Z

experiment/builder_runner.py

@@ -199,6 +237,14 @@ def build_target_local(self,
        print(f'Failed to build image for {generated_project}')
        return False

+    if language == 'python':
+      command = 'python3 infra/helper.py build_fuzzers %s' % (generated_project)


Can you use f-strings here to be consistent?
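For reference, the two spellings produce the same string. The project name below is a placeholder value, not one from the PR.

```python
generated_project = 'example-project'  # placeholder value for illustration

# %-formatting, as in the diff.
old_style = 'python3 infra/helper.py build_fuzzers %s' % (generated_project)

# Equivalent f-string, matching the style used elsewhere in the file.
new_style = f'python3 infra/helper.py build_fuzzers {generated_project}'
```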

oliverchang · 2024-04-11T09:11:27Z

experiment/builder_runner.py

@@ -199,6 +237,14 @@ def build_target_local(self,
        print(f'Failed to build image for {generated_project}')
        return False

+    if language == 'python':


What breaks if we let this run through the existing code from line 248 instead? Is there a way to make this work by changing the env vars being set there?

DavidKorczynski added 2 commits April 8, 2024 13:51

Add sample Python auto generation

46c4152

Signed-off-by: David Korczynski <david@adalogics.com>

nit

46a53a5

Signed-off-by: David Korczynski <david@adalogics.com>

oliverchang reviewed Apr 11, 2024
