Use error messages from jcc err.log in experiments #230

cjx10 · 2024-04-26T03:32:58Z

Before merging:

Remove custom jcc

This PR adds support for extracting error messages from jcc's err.log.
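The commit messages in this PR mention stripping color codes and matching clang and lld error patterns. A minimal line filter in that spirit might look like the sketch below. It assumes err.log holds raw compiler/linker stderr with ANSI color codes. The regexes and the names strip_color() and extract_error_lines() are illustrative assumptions, not the PR's get_jcc_errstr() or jcc's actual log format.

```python
import re

# Assumption: ANSI SGR color sequences such as "\x1b[0;1;31m" appear in err.log.
_ANSI_COLOR = re.compile(r'\x1b\[[0-9;]*m')
# Assumption: clang diagnostics look like "file:line:col: error: message".
_CLANG_ERROR = re.compile(r'^\S+:\d+:\d+: (?:fatal )?error: ')
# Assumption: linker failures look like "ld.lld: error: ..." or GNU ld's
# "... undefined reference to ...".
_LINKER_ERROR = re.compile(
    r'^(?:\S*/)?ld(?:\.lld)?: error: |undefined reference to')


def strip_color(line: str) -> str:
  """Removes ANSI color escape sequences from a single log line."""
  return _ANSI_COLOR.sub('', line)


def extract_error_lines(errlog_text: str) -> list[str]:
  """Returns the color-stripped compiler and linker error lines, in order."""
  error_lines = []
  for raw_line in errlog_text.splitlines():
    line = strip_color(raw_line)
    if _CLANG_ERROR.match(line) or _LINKER_ERROR.search(line):
      error_lines.append(line)
  return error_lines
```

This only keeps the first line of each diagnostic; grouping the notes, source snippets and caret lines that follow into blocks would need extra state.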

experiment/evaluator.py

cjx10 · 2024-05-03T04:43:48Z

Special cases where get_jcc_errstr() cannot find the error message even though it should:

proftpd: cp tests/fuzzing/json_fuzzer.c /src/fuzzer.c

In all other projects, either get_jcc_errstr() returns the desired error messages (linker errors in particular), or the build fails because of an issue in the project source. (In other words, the fuzz target is never touched by jcc, and error messages from the project source are not returned by mistake.)

DonggeLiu · 2024-05-04T01:25:31Z


Thanks for documenting this.
I reckon if this is the only case, then it's probably not too important.

However, could you please try building the original fuzz target with JCC and see if that fails?
E.g., build its image in OSS-Fuzz, set JCC in the docker container, and compile to build the binaries.
This will confirm if the problem is from JCC, given proftpd can build without it:
https://oss-fuzz-build-logs.storage.googleapis.com/index.html#proftpd

cjx10 · 2024-05-06T06:34:28Z

try building the original fuzz target with JCC

To clarify, proftpd did build in oss-fuzz-gen, but we lost track of the fuzz target in the logs because its file name changed.

For testing, in oss-fuzz the following 2 lines were added to projects/{project_name}/Dockerfile:

ENV CC=/usr/local/bin/clang-jcc
ENV CXX=/usr/local/bin/clang++-jcc

Then the project shell was opened with python3 infra/helper.py shell {project_name}, and compile was run manually inside the Docker container.
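The Dockerfile change described above could be scripted with a small helper like the following sketch. The function names and the oss_fuzz_dir argument are hypothetical. Only the two ENV lines and the projects/{project_name}/Dockerfile path come from this comment.

```python
from pathlib import Path

# The two ENV lines quoted in the comment above.
JCC_ENV_LINES = (
    'ENV CC=/usr/local/bin/clang-jcc',
    'ENV CXX=/usr/local/bin/clang++-jcc',
)


def add_jcc_env(dockerfile_text: str) -> str:
  """Appends any JCC ENV lines that are missing from the Dockerfile text."""
  existing = set(dockerfile_text.splitlines())
  missing = [line for line in JCC_ENV_LINES if line not in existing]
  if not missing:
    return dockerfile_text
  # Make sure the appended lines start on a fresh line.
  if dockerfile_text and not dockerfile_text.endswith('\n'):
    dockerfile_text += '\n'
  return dockerfile_text + '\n'.join(missing) + '\n'


def patch_project_dockerfile(oss_fuzz_dir: str, project_name: str) -> None:
  """Rewrites projects/{project_name}/Dockerfile in an oss-fuzz checkout."""
  dockerfile = Path(oss_fuzz_dir) / 'projects' / project_name / 'Dockerfile'
  dockerfile.write_text(add_jcc_env(dockerfile.read_text()))
```

Skipping lines that are already present makes the helper safe to rerun on the same project before opening the shell again.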

Below are the projects that failed to build in oss-fuzz-gen, and whether each one builds with jcc (7c67544) in oss-fuzz:

firestore: build
libidn2: fail
libpsl: fail
libtasn1: fail
mdbtools: fail
mosh: fail
myanmar-tools: fail
oatpp: build
openvswitch: build
piex: build
proj4: fail
protobuf-c: fail
qpid-proton: build
tarantool: fail

experiment/builder_runner.py

llm_toolkit/code_fixer.py

DonggeLiu · 2024-05-14T03:34:22Z

llm_toolkit/code_fixer.py

+      # Assume the default output name.
+      return 'a.out'
+    return os.path.basename(output_name)
+  return ''


1. Exclude the simple cases first to avoid nested conditions, e.g.,

    if not target_found:
      return ''

2. Log a warning when the target is not found.

3. Please simplify this function, e.g.,

    for i, arg in enumerate(compile_args):
      if arg in ['-o', '--output'] and i < len(compile_args) - 1:
        output_name = compile_args[i + 1]
      elif arg.startswith('--output='):
        output_name = arg.removeprefix('--output=')
      elif not arg.startswith('-') and os.path.basename(arg) in target_names:
        target_found = os.path.basename(arg)

    if not target_found:
      return ''

    if output_name:
      return os.path.basename(output_name)

    if '-c' in compile_args:
      return f'{os.path.splitext(target_found)[0]}.o'

    logging.warning(
        'Output file not specified in [%s], but fuzz target found',
        ' '.join(compile_args))
    return 'a.out'

Thanks for the suggestion.
Re 2.
It's probably better not to log a warning when the target is not found. We expect many such cases where the command is not compiling the fuzz target; they just indicate that the log lines that follow are not the error lines we want.
Renaming the variable instead: target_found -> fuzz_target_found.

llm_toolkit/code_fixer.py

cjx10

Thanks so much for the detailed review, Dongge :)

llm_toolkit/code_fixer.py

cjx10 · 2024-05-14T06:56:25Z


DonggeLiu · 2024-05-15T00:31:33Z

experiment/builder_runner.py

    except FileNotFoundError as e:
      logging.error('Cannot get err.log for %s: %s', generated_project, e)
+      # Touch err.log in results folder to avoid FileNotFoundError when
+      # extracting errors.
+      open(jcc_errlog_path, 'x')


Is there a better solution than this?
I guess this is meant to make parsing easier, but it may mislead us into thinking JCC created an empty file. We would then have to search the gcloud logs to distinguish these two cases.

Could we check os.path.isfile() at the beginning of extract_error_message() instead?

Maybe add a magic string in err.log when err.log does not exist?

This solution is for local experiments. Creating an empty file is consistent with how we currently fetch the build log and run log. It might also help avoid adding complexity to the build_and_run() workflow?

Maybe add a magic string in err.log when err.log does not exist?

This can work but not preferred.
Intuitively, if err.log is not generated, then it should not exist.

creating an empty file is currently consistent with the behaviour of getting the build log and run log

Where do we create empty build log and run logs when they do not exist?

Might help prevent increasing the complexity of build_and_run() workflow?

Not sure if this relates, but I suppose this will only add two lines in code_fixer?

Could we check os.path.isfile() at the beginning of extract_error_message() instead?

Where do we create empty build log and run logs?

It's in the cloud builder: we open the local file object before checking whether the file blob exists in the cloud:
https://github.com/google/oss-fuzz-gen/blob/main/experiment/builder_runner.py#L645

Ah, I see; let's keep it, then.
Thanks!
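For comparison, the two options discussed in this thread can be sketched as follows, assuming err.log is a plain text file. touch_empty_errlog() mirrors the diff's open(jcc_errlog_path, 'x') but closes the handle through a context manager. read_errlog_lines() is a hypothetical version of the os.path.isfile() guard suggested for extract_error_message(), which was not adopted.

```python
import logging
import os
from typing import Optional


def touch_empty_errlog(jcc_errlog_path: str) -> None:
  """Creates an empty err.log placeholder (the approach kept in the PR).

  Mode 'x' raises FileExistsError if the file already exists, and the
  context manager closes the handle right away instead of leaking it.
  """
  with open(jcc_errlog_path, 'x'):
    pass


def read_errlog_lines(jcc_errlog_path: str) -> Optional[list[str]]:
  """Hypothetical os.path.isfile() guard for an error extractor.

  Returns None when err.log is missing, so callers can tell "JCC wrote
  nothing" (empty list) apart from "err.log was never fetched" (None).
  """
  if not os.path.isfile(jcc_errlog_path):
    logging.warning('err.log does not exist: %s', jcc_errlog_path)
    return None
  with open(jcc_errlog_path) as errlog:
    return errlog.read().splitlines()
```

Once the placeholder has been touched, the guard returns an empty list. At that point the two cases become indistinguishable, which is the trade-off raised above.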

DonggeLiu · 2024-05-15T00:42:22Z

llm_toolkit/code_fixer.py

+    elif arg.startswith('-o'):
+      output_name = arg.removeprefix('-o')
+    elif (not arg.startswith('-') and not arg == output_name and
+          os.path.basename(arg) in target_names):


A silly question:
Why do we need not arg == output_name?

Also, would it be a good idea to log a warning if arg in ['-o', '--output'] but i + 1 >= len(compile_args)?
This is unlikely to happen now but may occur later when we automate the build script.

Why do we need not arg == output_name?

Since we assigned output_name = compile_args[i + 1] in the previous iteration, we want to skip that argument now.
I'm thinking of the situation where we have clang src.c -o src.o and have added src.o to target_names for the search.
If it is then compiled again with clang++ src.cpp -o src.o, we don't want to assign src.o to fuzz_target_found.
That said, this should be OK, because we won't use it later on.

log a warning if i + 1 >= len(compile_args)

Sure, good point.
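Putting the review points together, the output-name logic might look like the sketch below. The function name get_output_name and its signature are hypothetical. The body follows the suggestions in this thread: early return when no fuzz target is found, the fuzz_target_found rename, ignoring an argument equal to output_name, handling -oNAME and --output=NAME, and warning when -o or --output is the last argument.

```python
import logging
import os


def get_output_name(compile_args: list[str], target_names: set[str]) -> str:
  """Returns the output file name if compile_args build a fuzz target, else ''."""
  output_name = ''
  fuzz_target_found = ''
  for i, arg in enumerate(compile_args):
    if arg in ('-o', '--output'):
      if i + 1 < len(compile_args):
        output_name = compile_args[i + 1]
      else:
        # Unlikely today, but possible once build scripts are automated.
        logging.warning('No output name after %s in [%s]', arg,
                        ' '.join(compile_args))
    elif arg.startswith('--output='):
      output_name = arg.removeprefix('--output=')
    elif arg.startswith('-o'):
      output_name = arg.removeprefix('-o')
    elif (not arg.startswith('-') and arg != output_name and
          os.path.basename(arg) in target_names):
      fuzz_target_found = os.path.basename(arg)

  # Not compiling the fuzz target is common, so no warning here.
  if not fuzz_target_found:
    return ''
  if output_name:
    return os.path.basename(output_name)
  if '-c' in compile_args:
    return f'{os.path.splitext(fuzz_target_found)[0]}.o'
  logging.warning('Output file not specified in [%s], but fuzz target found',
                  ' '.join(compile_args))
  return 'a.out'
```

The exact-match branch for -o and --output comes before the startswith('-o') branch, so a bare -o is never treated as -o with an empty name.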

llm_toolkit/code_fixer.py

DonggeLiu · 2024-05-16T05:44:12Z

Thanks for addressing the comments, I have no more suggestions now.
Could you please run a PR experiment and triple-check the report to make sure everything is intact?
Thanks

cjx10 · 2024-05-16T05:47:27Z

Could you please run a PR experiment

Many thanks, Dongge :)
Will do after I double-check that the parsing is correct, since the code structure has changed quite a bit.

cjx10 · 2024-05-16T05:56:47Z

/gcbrun request_pr_exp.py -n jim -f

DonggeLiu · 2024-05-16T05:59:37Z

-n jim

nit: Could you please add the branch ID to the name in the future?
This helps us identify the job/bucket_dir/report.
Thanks :)

cjx10 requested a review from DonggeLiu April 26, 2024 03:32

cjx10 commented Apr 26, 2024


experiment/evaluator.py (outdated, resolved)

cjx10 force-pushed the err_parsing branch from 876b49c to 3b67c50 April 26, 2024 07:21

cjx10 force-pushed the get_err_from_errlog branch from 7de4eb5 to 844aef5 April 26, 2024 07:21

DonggeLiu force-pushed the err_parsing branch from 3b67c50 to 13fc238 April 26, 2024 23:20

cjx10 force-pushed the get_err_from_errlog branch from 844aef5 to caf19de April 28, 2024 23:47

cjx10 marked this pull request as draft April 29, 2024 06:54

DonggeLiu force-pushed the err_parsing branch from 13fc238 to b8a4a69 May 7, 2024 03:23

cjx10 force-pushed the get_err_from_errlog branch from 9d061b5 to 7109c21 May 7, 2024 04:39

cjx10 marked this pull request as ready for review May 7, 2024 22:34

DonggeLiu force-pushed the err_parsing branch from b8a4a69 to 6bc3046 May 8, 2024 05:01

cjx10 force-pushed the get_err_from_errlog branch from 6dbb72c to 828cee2 May 8, 2024 05:18

cjx10 changed the title from "Check errlog in experiments" to "Use error messages from jcc err.log in experiments" May 8, 2024

cjx10 force-pushed the get_err_from_errlog branch from 828cee2 to 5586639 May 8, 2024 05:32

Base automatically changed from err_parsing to main May 10, 2024 04:10

cjx10 force-pushed the get_err_from_errlog branch 3 times, most recently from 105bac4 to e9d5283 May 13, 2024 04:27

DonggeLiu requested changes May 14, 2024

cjx10 force-pushed the get_err_from_errlog branch from e9d5283 to 312a029 May 14, 2024 08:05

cjx10 commented May 14, 2024

DonggeLiu reviewed May 15, 2024

cjx10 force-pushed the get_err_from_errlog branch from 312a029 to 4d7b6e5 May 16, 2024 05:17

cjx10 added 29 commits May 23, 2024 13:24

a951227 fix err.log path
9c2a01d fix binary name
b6a133e add in all ld error lines
8c72362 fix jcc error message extraction logic
6442609 check end line as start line
24ce372 use err.log errors
3ddd9b8 give detail err.log path
02d08d0 remove color code from lines
f907c6d extract lld errors
3a32c0d refine err.log parsing logic
7d748b0 use build log errors when extracting from err.log failed
89e8c67 correspond to changes made in jcc
fb83573 linker pattern updated for clang-18
4b92773 add clang output flag pattern
dcab0e2 add target names to look for in command args
d42f94d extend clang error pattern to match with all compiler errors
2092fa2 only parse output when needed
cbc7ce9 consider only the last block produced by target but is unexpected error
20d8a62 clang error pattern in grouping too
cb4bee8 include end line for consistency
fca048c strip color code before checking lines
e028792 linker error
05af518 comment for extracting linker error from build log
c02d7e9 update references to jcc.go
df1091b comments
1656b4b move linker error parsing to new branch
be2ce1e rename
fd5868c nits and refactor
8f4e841 more restructure

cjx10 force-pushed the get_err_from_errlog branch from 4d7b6e5 to 8f4e841 May 23, 2024 23:43
