Support Chinese characters in prompt generation stage #1168

Tu-Zhenzhao · 2024-05-20T02:58:26Z

System Info

pandasai == 2.0.43
python == 3.11

🐛 Describe the bug

I am trying to use the Field Descriptions feature to help LLMs understand my dataset. I wrote a data description function that builds a dictionary of information about the dataset, then I pass it to pandasai through Field Descriptions like this:

data = preview_data(df)
# define a connector
connector = PandasConnector({"original_df": df}, name='My Connector', field_descriptions=data)

Part of my data looks like this:

{'时间': 'The 时间 column contains string values. The unique values are: 2023-6-14, 2022-4-22, 2022-11-5.'}

As you can see, there are some Chinese characters, but in the prompt_generation stage the Chinese characters are left as escape sequences instead of readable text, so the prompt looks like this:

dfs[0]:
  name: My Connector
  description: null
  type: pd.DataFrame
  rows: 28
  columns: 18
  schema:
    fields:
    - name: "\u65F6\u95F4"
      type: object
      samples:
      - 2022-4-22
      - 2022-11-5
      - 2023-6-14

The escaped name "\u65F6\u95F4" makes the LLM much more confused than the original column name would.
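
For reference, the escaped string is the same column name written as Unicode code points. A quick check with only the Python standard library (nothing here is pandasai-specific, and the variable names are illustrative) shows the round trip in both directions:

```python
# The escaped form exactly as it appears in the generated prompt (raw string, so
# the backslashes are literal characters rather than Python escapes).
escaped = r"\u65F6\u95F4"

# Decode the literal \uXXXX sequences back into the characters they stand for.
decoded = escaped.encode("ascii").decode("unicode_escape")

# Re-create the escaped form from the original column name, one code point at a time.
column_name = "时间"
re_escaped = "".join(f"\\u{ord(ch):04X}" for ch in column_name)

print(decoded)
print(re_escaped)
```

So no data is lost; the name is only rendered in a form that is harder for a model (or a person) to read.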

Is there any way to solve this problem? Any suggestions would be appreciated!
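
The prompt above looks like YAML, and the uppercase "\uXXXX" escapes match what PyYAML emits by default for non-ASCII text. The sketch below assumes the prompt is produced with PyYAML's `yaml.dump`; that is an inference from the output format, not something confirmed in this thread, and the `schema` dictionary is a hypothetical stand-in for what pandasai serializes. Under that assumption, passing `allow_unicode=True` keeps the characters readable:

```python
import yaml  # PyYAML, a third-party package (pip install pyyaml)

# Hypothetical stand-in for the schema that ends up in the prompt.
schema = {
    "fields": [
        {"name": "时间", "type": "object", "samples": ["2022-4-22", "2022-11-5"]},
    ]
}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = yaml.dump(schema, sort_keys=False)

# With allow_unicode=True the characters are written as-is.
readable = yaml.dump(schema, sort_keys=False, allow_unicode=True)

print(escaped)
print(readable)
```

If the serialization call inside pandasai cannot be changed from user code, a possible workaround is to rename the columns to ASCII names before building the connector and mention the original names in the field descriptions.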

dosubot · 2024-05-20T02:58:55Z

To address the issue with Chinese characters not being properly decoded during the prompt generation stage in pandasai, it's crucial to understand the encoding of your original data source and ensure it's UTF-8 or compatible. Additionally, knowing the Python version you're using and whether this issue occurs with other non-ASCII characters can help pinpoint the problem. A minimal, reproducible example of the code causing this issue would also be invaluable for finding a solution. This information will enable a more targeted approach to solving the decoding issue with Chinese characters.

dosubot bot added the bug Something isn't working label May 20, 2024
